[GitHub] spark pull request: [yarn]The method has a never used parameter
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1761#issuecomment-51020215 In general, I'm wary of merging this type of very-small-scale code cleanup; I don't think this makes the code any easier to understand and it may cause merge conflicts down the road, creating maintenance hassles for us (see my comments on #1728). Other committers may disagree with me, so feel free to chime in. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2638 MapOutputTracker concurrency improv...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1542#issuecomment-51020347 Since this same locking pattern occurs at several places in the code, I think it might make sense to abstract it behind a function or macro, which would give us a centralized place to experiment with different synchronization / locking strategies.
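The suggestion of hiding a repeated locking pattern behind one function can be illustrated with a minimal Python sketch. It is not taken from the MapOutputTracker code: `LockedRegion`, `register`, and `statuses` are hypothetical names, and the choice of `threading.RLock` is an assumption made only so that the strategy is defined in a single place.

```python
import threading


class LockedRegion:
    """Hypothetical helper: every caller goes through run(), so the
    locking strategy can be changed in exactly one place."""

    def __init__(self, lock=None):
        # Assumption: a re-entrant lock is the default strategy.
        self._lock = lock if lock is not None else threading.RLock()

    def run(self, fn, *args, **kwargs):
        # Acquire the lock, call fn, and release even if fn raises.
        with self._lock:
            return fn(*args, **kwargs)


statuses = {}
region = LockedRegion()


def register(shuffle_id, value):
    # Mutation that must not race with other writers.
    statuses[shuffle_id] = value
    return len(statuses)


count = region.run(register, 0, "map-output")
```

Swapping in a different strategy (for example a plain `threading.Lock`) then only requires passing another lock object to `LockedRegion`.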
[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-51020440 QA results for PR 1719:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Word2Vec(

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17842/consoleFull
[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-51020596 This looks okay, but I still wonder whether there's a simpler approach. Have you looked at how [dill](https://github.com/uqfoundation/dill) handles namedtuples?
[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1758#issuecomment-51020658 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-51020851 It's easy to extend pickle to support namedtuple; cloudpickle and dill have done it this way, but they are slow. We want to use cPickle for datasets, since it should be fast by default. I have not found a way to extend cPickle; do you have any ideas?
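One general way to make namedtuple instances picklable by value is to attach a `__reduce__` method that records the type name, field names, and values. The sketch below is an illustration of that idea only and is not necessarily what the pull request does; `picklable_namedtuple` and `_restore` are hypothetical names, and the code is written for Python 3, where the C-accelerated pickler is used by `pickle` automatically.

```python
import pickle
from collections import namedtuple


def _restore(name, fields, values):
    # Rebuild the class on the receiving side, then the instance.
    cls = namedtuple(name, fields)
    return cls(*values)


def picklable_namedtuple(name, fields):
    """Hypothetical factory: a namedtuple class whose instances pickle
    by value instead of by reference to a module-level class."""
    cls = namedtuple(name, fields)

    def __reduce__(self):
        # (callable, args) tells pickle how to recreate the object.
        return (_restore, (name, self._fields, tuple(self)))

    cls.__reduce__ = __reduce__
    return cls


Person = picklable_namedtuple("Person", "id firstName lastName")
jon = Person(1, "Jon", "Doe")
clone = pickle.loads(pickle.dumps(jon))
```

The restored object belongs to a freshly created class, so it compares equal to the original as a tuple but `type(clone) is Person` does not hold.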
[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1758#issuecomment-51020991 QA tests have started for PR 1758. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17845/consoleFull
[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-51021254 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-2817] add show create table support
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/1760#issuecomment-51021277 @chenghao-intel are these files all right?
[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-51021399 Here's another (contrived) example that breaks:
```python
from collections import namedtuple as nt
from pyspark import SparkContext
from pyspark.serializers import PickleSerializer

sc = SparkContext("local")
p = PickleSerializer()
# namedtuple is imported under the alias `nt`, which the patch does not expect
Person = nt("Person", 'id firstName lastName')
jon = Person(1, "Jon", "Doe")
sc.textFile("/usr/share/dict/words").map(lambda x: jon).first()
```
It looks like the problem here is that line 306 assumes that old references will be named `namedtuple`, which isn't true if I import it under a different name.
[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-51021526 QA tests have started for PR 1719. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17846/consoleFull
[GitHub] spark pull request: [SPARK-1812] Enable cross build for scala 2.11...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/996#discussion_r15741984
--- Diff: assembly/pom.xml ---
@@ -26,7 +26,7 @@
   </parent>
   <groupId>org.apache.spark</groupId>
-  <artifactId>spark-assembly_2.10</artifactId>
+  <artifactId>spark-assembly_${scala.binary.version}</artifactId>
--- End diff --
Maintaining a separate branch is difficult and sometimes requires nontrivial merges to keep it in sync. I don't really know what an archetype build that generates pom(s) for different Scala versions would mean. I feel we can have some release scripts that take care of releasing Maven poms with the expression in the artifact replaced by its corresponding constant. This is the sort of thing sbt does natively. I think this is much better than complicating the build further, because once we have an archetype we will have to adjust sbt a great deal to accommodate it.
[GitHub] spark pull request: SPARK-2638 MapOutputTracker concurrency improv...
Github user javadba commented on the pull request: https://github.com/apache/spark/pull/1542#issuecomment-51022369 Thanks for commenting Josh. I will see about putting together something on this including solid testcases. ETA later in the coming week.
[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-51022848 Yes, it's easy to break it. Having a solution that works in 99% of cases is better than having no solution, or a much slower solution that works in 100% of cases.
[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-51023016 QA results for PR 1719:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Word2Vec(

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17843/consoleFull
[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple
Github user davies commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-51023029 This feature is not a blocker, because we prefer to use Row() instead of namedtuple to do inferSchema(). If users really want to use a namedtuple or a customized class in __main__, they could use cloudpickle.
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1484#issuecomment-51023586 @mengxr Could you review or comment on this? Thanks!
[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-51023600 LGTM. Merged into both master and branch-1.1. @Ishiihara Thanks a lot for implementing word2vec! Please help improve its performance during the QA period. One task left is Java support. If you want to spend some time on it, there are some examples in `HashingTF.scala` and `JavaTfIdfSuite.java`.
[GitHub] spark pull request: JIRA issue: [SPARK-1405] Gibbs sampling based ...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/476#discussion_r15742427
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -233,4 +235,60 @@ object MLUtils {
     }
     sqDist
   }
+
+  /**
+   * Load corpus from a given path. Terms and documents will be translated into integers, with a
+   * term-integer map and a document-integer map.
+   *
+   * @param dir The path of corpus.
+   * @param dirStopWords The path of stop words.
+   * @return (RDD[Document], Term-integer map, doc-integer map)
+   */
+  def loadCorpus(
+      sc: SparkContext,
+      dir: String,
+      minSplits: Int,
+      dirStopWords: String = ""):
+    (RDD[Document], Index[String], Index[String]) = {
+
+    // Containers and indexers for terms and documents
+    val termMap = Index[String]()
+    val docMap = Index[String]()
+
+    val stopWords =
+      if (dirStopWords == "") {
+        Set.empty[String]
+      }
+      else {
+        sc.textFile(dirStopWords, minSplits).
+          map(x => x.replaceAll("""(?m)\s+$""", "")).distinct().collect().toSet
+      }
+    val broadcastStopWord = sc.broadcast(stopWords)
+
+    // Tokenize and filter terms
+    val almostData = sc.wholeTextFiles(dir, minSplits).map { case (fileName, content) =>
+      val tokens = JavaWordTokenizer(content)
--- End diff --
We should allow users to customize the tokenizer here. We can add a parameter `tokenizer: (String) => Iterable[String]` to loadCorpus, and then `dirStopWords` is not required.
[GitHub] spark pull request: Remove support for waiting for executors in st...
GitHub user kayousterhout opened a pull request: https://github.com/apache/spark/pull/1762

Remove support for waiting for executors in standalone mode.

Current code waits until some minimum fraction of expected executors have registered before beginning scheduling. The current code in standalone mode suffers from a race condition (SPARK-2635). This race condition could be fixed, but this functionality is easily achieved by the user (they can use the storage status to determine how many executors are up, as described by @pwendell in #1462) so adding the extra complexity to the scheduler code may not be worthwhile. This commit removes the functionality in standalone mode but not for YARN -- where it is more necessary and the number of expected executors is well-defined. This PR is a POC; if the powers-that-be determine that this is what we should do, I will file a JIRA. This should be backported into 1.1 if committed.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/kayousterhout/spark-1 remove_executor_wait
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1762.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1762

commit fa746ed65a8ac685edd33c79159521398d99aa69
Author: Kay Ousterhout kayousterh...@gmail.com
Date: 2014-08-04T06:59:05Z
Remove support for waiting for executors in standalone mode. Current code waits until some minimum fraction of expected executors have registered before beginning scheduling. The current code in standalone mode suffers from a race condition (SPARK-2635). This race condition could be fixed, but this functionality is easily achieved by the user (they can use the storage status to determine how many executors are up, as described by @pwendell in #1462) so adding the extra complexity to the scheduler code is not worthwhile.
[GitHub] spark pull request: [SPARK-2555] Support configuration spark.sched...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/1462#issuecomment-51024346 @pwendell I created https://github.com/apache/spark/pull/1762 for your judgment of what the right thing to do here is!
[GitHub] spark pull request: Remove support for waiting for executors in st...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1762#discussion_r15742607
--- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import akka.actor.ActorSystem
+
+import org.apache.spark.SparkContext
+import org.apache.spark.scheduler.TaskSchedulerImpl
+
+/**
+ * Subclass of CoarseGrainedSchedulerBackend that handles waiting until sufficient resources
+ * are registered before beginning to schedule tasks (needed by both Yarn scheduler backends).
+ */
+private[spark] class YarnSchedulerBackend(
+    scheduler: TaskSchedulerImpl,
+    actorSystem: ActorSystem)
+  extends CoarseGrainedSchedulerBackend(scheduler, actorSystem) {
+
+  // Submit tasks only after (registered executors / total expected executors)
+  // is equal to at least this value (expected to be a double between 0 and 1, inclusive).
+  var minRegisteredRatio = conf.getDouble("spark.scheduler.minRegisteredExecutorsRatio", 0.8)
+  if (minRegisteredRatio > 1) minRegisteredRatio = 1
+  // Regardless of whether the required number of executors have registered, return true from
+  // isReady() after this amount of time has elapsed.
+  val maxRegisteredWaitingTime =
+    conf.getInt("spark.scheduler.maxRegisteredExecutorsWaitingTime", 3)
+  private val createTime = System.currentTimeMillis()
+
+  var totalExpectedExecutors: Int = _
+
+  override def isReady(): Boolean = {
--- End diff --
Realized I should fix this to only print the log message once -- will do this if we decide to use this PR
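The comments in the diff above describe a readiness rule: start scheduling once the fraction of registered executors reaches a minimum ratio, or once a maximum waiting time has elapsed. The body of `isReady()` is not shown in the message, so the following Python sketch is only an assumed reading of those comments; `is_ready` and all of its parameter names are hypothetical.

```python
import time


def is_ready(registered, expected, min_ratio, created_at_ms, max_wait_ms, now_ms=None):
    """Hypothetical readiness check modelled on the comments in the diff."""
    if now_ms is None:
        now_ms = time.time() * 1000
    # The ratio is capped at 1, mirroring the clamp in the diff.
    min_ratio = min(min_ratio, 1.0)
    # Ready once enough of the expected executors have registered.
    if registered >= expected * min_ratio:
        return True
    # Otherwise ready only after the maximum waiting time has elapsed.
    return now_ms - created_at_ms >= max_wait_ms


early = is_ready(2, 10, 0.8, created_at_ms=0, max_wait_ms=1000, now_ms=100)
enough = is_ready(9, 10, 0.8, created_at_ms=0, max_wait_ms=1000, now_ms=100)
timed_out = is_ready(2, 10, 0.8, created_at_ms=0, max_wait_ms=1000, now_ms=1000)
```

The `max_wait_ms=1000` value is chosen for the example only and says nothing about the actual default of `spark.scheduler.maxRegisteredExecutorsWaitingTime`.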
[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1719
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1484#issuecomment-51024568 Sure. We had some transformers implemented under `mllib.feature`, similar to sk-learn's approach. For feature selection, we can follow the same approach if we view feature selection as transformation: 1) fit a dataset and select a subset of features, 2) transform a dataset by picking out selected features. So for the API, I suggest the following
~~~
class ChiSquaredFeatureSelector(numFeatures: Int) extends Serializable {
  def fit(dataset: RDD[LabeledPoint]): this.type
  def transform(dataset: RDD[LabeledPoint]): RDD[LabeledPoint]
}
~~~
and we can hide the implementation from public interfaces. Please let me know whether this sounds good to you.
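The two-step fit/transform pattern proposed above can be sketched in plain Python on in-memory lists. This is not MLlib code and does not compute a chi-squared statistic: `TopKFeatureSelector` and the `label_agreement` score are hypothetical stand-ins, used only to show how `fit` records the selected feature indices and `transform` projects each point onto them.

```python
class TopKFeatureSelector:
    """Hypothetical selector: fit() ranks features with a scoring function,
    transform() keeps only the selected columns."""

    def __init__(self, num_features, score):
        self.num_features = num_features
        self.score = score      # stand-in for a chi-squared statistic
        self.selected = None

    def fit(self, dataset):
        n = len(dataset[0][1])
        # Score each feature column against the labels.
        scores = [self.score([(label, feats[j]) for label, feats in dataset])
                  for j in range(n)]
        ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
        self.selected = sorted(ranked[:self.num_features])
        return self             # mirrors `this.type` in the proposed API

    def transform(self, dataset):
        return [(label, [feats[j] for j in self.selected])
                for label, feats in dataset]


def label_agreement(column):
    # Toy score: how often the feature value equals the label.
    return sum(1 for label, value in column if label == value)


data = [(1, [1, 0, 1]), (0, [0, 0, 1]), (1, [1, 1, 0])]
selector = TopKFeatureSelector(1, label_agreement).fit(data)
reduced = selector.transform(data)
```

Because the selection state lives only in `selected`, the scoring implementation can stay hidden behind the two public methods, which is the point made in the message.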
[GitHub] spark pull request: fix GraphX EdgeRDD zipPartitions
GitHub user luluorta opened a pull request: https://github.com/apache/spark/pull/1763

fix GraphX EdgeRDD zipPartitions

If users set "spark.default.parallelism" and the value is different from the EdgeRDD partition number, GraphX jobs will throw: java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions

You can merge this pull request into a Git repository by running: $ git pull https://github.com/luluorta/spark fix-graph-zip
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1763.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1763

commit 83389614959fb2c84b947362af1e0babbfe767d5
Author: luluorta luluo...@gmail.com
Date: 2014-08-04T07:03:17Z
fix GraphX EdgeRDD zipPartitions
[GitHub] spark pull request: [SPARK-2823]fix GraphX EdgeRDD zipPartitions
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1763#issuecomment-51024735 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2678][Core] Added -- to prevent spark...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1715#issuecomment-51024718 @liancheng why not just have the jars listed on the classpath in the order they are given to us? This is also how classpaths work in general: when I run a java command, I don't give a special flag for the first element in the classpath, I just put it first.
[GitHub] spark pull request: [SPARK-2678][Core] Added -- to prevent spark...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1715#issuecomment-51024798 @andrewor14 I still don't understand how this is different. Basically, the JVM works such that you put a set of jars in order (indicating precedence) and then you can refer to a class defined in any of the jars. Why should we differ from the normal JVM semantics and have a special name for the first jar in the class?
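The ordering argument above (jars are searched in the order listed, so earlier entries take precedence) can be modelled with a small Python sketch. It is a simplified illustration rather than JVM code: `resolve`, the jar names, and the class names are all hypothetical.

```python
def resolve(class_name, classpath):
    """Return the first jar, in classpath order, that defines class_name."""
    for jar_name, classes in classpath:
        if class_name in classes:
            return jar_name
    return None  # roughly analogous to a ClassNotFoundException


# Hypothetical classpath: position alone determines precedence.
classpath = [
    ("app.jar", {"com.example.Main", "com.example.Util"}),
    ("lib.jar", {"com.example.Util", "com.example.Helper"}),
]

util_source = resolve("com.example.Util", classpath)
helper_source = resolve("com.example.Helper", classpath)
missing_source = resolve("com.example.Missing", classpath)
```

In this model no entry needs a special name or flag; putting a jar first is enough to give it priority for any class it shares with later jars.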
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1481#discussion_r15742765
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -29,7 +29,7 @@ import akka.actor.{ActorSystem, Cancellable, Props}
 import sun.nio.ch.DirectBuffer
 import org.apache.spark._
-import org.apache.spark.executor.{DataReadMethod, InputMetrics}
+import org.apache.spark.executor.{ShuffleWriteMetrics, DataReadMethod, InputMetrics}
--- End diff --
alphabetization (here and elsewhere as well)
[GitHub] spark pull request: [SPARK-1687] [PySpark] pickable namedtuple
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1623#issuecomment-51024991 I found another technique that may be more robust to `namedtuple` being accessible under different names. We can replace `namedtuple`'s code object at runtime in order to interpose on calls to it:
```python
# Python 2 syntax (func_code, print statement)
import types

def copy_func(f, name=None):
    # See http://stackoverflow.com/a/6528148/590203
    return types.FunctionType(f.func_code, f.func_globals, name or f.func_name,
                              f.func_defaults, f.func_closure)

from collections import namedtuple
namedtuple._old_namedtuple = copy_func(namedtuple)

def wrapped(*args, **kwargs):
    print "Called the wrapped function!"
    return namedtuple._old_namedtuple(*args, **kwargs)

# Swap the code object in place so every existing reference sees the wrapper
namedtuple.func_code = wrapped.func_code

print namedtuple("Person", "name age")
```
This prints
```
Called the wrapped function!
<class 'collections.Person'>
```
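The code-object swap shown above uses Python 2 attribute names. A rough Python 3 rendering of the same idea is sketched below under some assumptions: it patches a hypothetical local function `make_record` rather than `collections.namedtuple`, and it is meant to run at module level so that `wrapped` has no closure variables (assigning `__code__` requires both code objects to have the same number of free variables).

```python
import types


def copy_func(f, name=None):
    """Return an independent copy of f (Python 3 attribute names)."""
    g = types.FunctionType(f.__code__, f.__globals__, name or f.__name__,
                           f.__defaults__, f.__closure__)
    g.__kwdefaults__ = f.__kwdefaults__
    return g


calls = []


def make_record(typename, fields):
    # Hypothetical stand-in for the factory being interposed on.
    return (typename, tuple(fields.split()))


alias = make_record  # an "old reference" held under a different name

# Keep the original behaviour reachable before swapping the code object.
make_record._original = copy_func(make_record)


def wrapped(*args, **kwargs):
    calls.append(args)
    return make_record._original(*args, **kwargs)


# Same function object, new code: every existing reference is affected.
make_record.__code__ = wrapped.__code__

result = alias("Person", "name age")
```

Because the function object itself is unchanged, the call through `alias` is intercepted even though the alias was created before the swap, which is the robustness property the message is after.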
[GitHub] spark pull request: [MLlib] [SPARK-2510]Word2Vec: Distributed Repr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1719#issuecomment-51025157
QA results for PR 1719:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class Word2Vec(

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17846/consoleFull
[GitHub] spark pull request: [SPARK-2817] add show create table support
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1760#issuecomment-51025307 Jenkins, test this please.
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51025499 Updated patch addresses @pwendell and @kayousterhout 's comments and adds tests.
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51025769
QA tests have started for PR 1481. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17848/consoleFull
[GitHub] spark pull request: [SPARK-1812] Enable cross build for scala 2.11...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/996#discussion_r15743162
--- Diff: assembly/pom.xml ---
@@ -26,7 +26,7 @@
 </parent>
 <groupId>org.apache.spark</groupId>
- <artifactId>spark-assembly_2.10</artifactId>
+ <artifactId>spark-assembly_${scala.binary.version}</artifactId>
--- End diff --
I won't claim to be a maven-archetype-plugin expert, or to have figured out everything that we would need to do to start using archetyped builds (e.g. I haven't got sub-project builds with archetypes figured out yet), but the basics of archetypes are pretty simple to use and do get us a significant part of the way to cross-building Spark. To see how the rudiments work, you should be able to quickly and easily do the following:
1) Clone the spark repo
2) cd to spark/core
3) mvn archetype:create-from-project
4) cd into the new target/generated-sources/archetype
5) mvn install
6) make a tmp dir someplace and cd into it
7) mvn archetype:generate -DarchetypeCatalog=local -DgroupId=org.apache.spark -DartifactId=spark-core_2.11 -Dversion=2.0.0-SNAPSHOT
8) select '1', the only archetype that you now have installed locally
9) observe that you now have a spark-core tree ready to build spark-core_2.11-2.0.0-SNAPSHOT (except for the spark-parent reference -- I told you, I haven't got sub-projects figured out yet.)
[GitHub] spark pull request: [SPARK-2555] Support configuration spark.sched...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1462#issuecomment-51026034 Okay let me run it by some more people tomorrow and figure it out.
[GitHub] spark pull request: [SPARK-2824/2825][SQL] Work towards separating...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1764#discussion_r15743214
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -266,7 +266,8 @@ package object dsl {
 object plans { // scalastyle:ignore
 implicit class DslLogicalPlan(val logicalPlan: LogicalPlan) extends LogicalPlanFunctions {
- def writeToFile(path: String) = WriteToFile(path, logicalPlan)
+// Writing to files doesn't make sense without Hadoop in scope (unless we assume local fs)
+// def writeToFile(path: String) = WriteToFile(path, classOf[??], logicalPlan)
--- End diff --
Not sure what to do with this, it doesn't seem to make a lot of sense in the catalyst package unless catalyst wants to depend on Hadoop.
[GitHub] spark pull request: [SPARK-2824/2825][SQL] Work towards separating...
GitHub user aarondav opened a pull request: https://github.com/apache/spark/pull/1764
[SPARK-2824/2825][SQL] Work towards separating data location from format
Currently, there is a fundamental assumption in SparkSQL that a Parquet table is stored at a certain Hadoop path and that a Metastore table is stored within the Hive warehouse. However, the fact that a table is Parquet or serialized as an object file is independent of where the data is actually located. This patch attempts to work towards creating a cleaner separation between where the data is located and the format the data is in by introducing two concepts: a TableFormat and a TableLocation. This abstraction enables code like the following:
```scala
val myTable = // ...
myTable.saveAsTable("myTable", classOf[ParquetFormat])
hql("SELECT * FROM myTable").collect // reads from Parquet!

// Also allows expansion of file-writing later:
myTable.saveAsFile("/my/file", classOf[ParquetFormat])
```
Additionally, this allows us to trivially support external tables with arbitrary formats. However, this PR doesn't attempt to make any radical changes. Parquet files still only support being written to a single Hadoop directory, but this can be part of a Hive table or a normal directory. The MetastoreRelation still requires living within the Metastore because it relies heavily on the metadata there. The hope of this patch is that it enables the two linked features ([SPARK-2824](https://issues.apache.org/jira/browse/SPARK-2824) and [SPARK-2825](https://issues.apache.org/jira/browse/SPARK-2825)) while adding a useful abstraction for the future.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aarondav/spark hive
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1764.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1764
commit b18a4ae3c9bbe8f917ccc7e9aeb1ece25d54bc46
Author: Aaron Davidson aa...@databricks.com
Date: 2014-08-02T19:55:41Z
[SPARK-2824/2825][SQL] Work towards separating data location from format
[GitHub] spark pull request: [SPARK-2824/2825][SQL] Work towards separating...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1764#discussion_r15743237
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/TableFormat.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.physical
--- End diff --
Not certain what the right place for this class is...
[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51026321 Cool - thanks Sandy. Let's see if tests pass. Can likely merge this tomorrow and fix any remaining issues (if they exist).
[GitHub] spark pull request: [SPARK-2824/2825][SQL] Work towards separating...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1764#discussion_r15743336
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
@@ -353,15 +356,14 @@ private[parquet] object ParquetTypesConverter extends Logging {
 * in the parent directory. If so, this is used. Else we read the actual footer at the given
 * location.
 * @param origPath The path at which we expect one (or more) Parquet files.
- * @param configuration The Hadoop configuration to use.
+ * @param conf The Hadoop configuration to use.
 * @return The `ParquetMetadata` containing among other things the schema.
 */
- def readMetaData(origPath: Path, configuration: Option[Configuration]): ParquetMetadata = {
+ def readMetaData(origPath: Path, conf: Configuration): ParquetMetadata = {
 if (origPath == null) {
 throw new IllegalArgumentException("Unable to read Parquet metadata: path is null")
 }
 val job = new Job()
-val conf = configuration.getOrElse(ContextUtil.getConfiguration(job))
--- End diff --
I wanted to get rid of the optional configuration, but perhaps I should put this back. Making a new Job and then asking for a Configuration doesn't seem like it'd be more useful than just constructing a new Configuration, though, at least in terms of the properties set.
[GitHub] spark pull request: [SPARK-2824/2825][SQL] Work towards separating...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1764#issuecomment-51026425
QA tests have started for PR 1764. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17849/consoleFull
[GitHub] spark pull request: [SPARK-2824/2825][SQL] Work towards separating...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1764#issuecomment-51026471
QA results for PR 1764:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class ParquetFormat(sqlContext: SQLContext, conf: Configuration) extends TableFormat {
class MetastoreFormat(hiveContext: HiveContext) extends TableFormat {
case class HiveTableLocation(

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17849/consoleFull
[GitHub] spark pull request: [SPARK-2824/2825][SQL] Work towards separating...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1764#issuecomment-51026733
QA tests have started for PR 1764. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17850/consoleFull
[GitHub] spark pull request: [SPARK-2824/2825][SQL] Work towards separating...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1764#issuecomment-51026778
QA results for PR 1764:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class ParquetFormat(sqlContext: SQLContext, conf: Configuration) extends TableFormat {
class MetastoreFormat(hiveContext: HiveContext) extends TableFormat {
case class HiveTableLocation(

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17850/consoleFull
[GitHub] spark pull request: [SPARK-2815]: Compilation failed upon the hado...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1754#issuecomment-51027051
I think this is an intermediate YARN version that is different from both the yarn-alpha and yarn-stable APIs. @witgo what if you apply the patch here - does it work? https://github.com/apache/spark/pull/151/files
In terms of merging this - I'm not sure we want to support 3 different YARN APIs in the build, and I think CDH itself said that YARN was not stable/supported here, so I'm not sure if we want to merge that patch. I am curious though whether it fixes it for you.
[GitHub] spark pull request: [SPARK-2815]: Compilation failed upon the hado...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1754#issuecomment-51028168
This PR makes sbt's behavior consistent with the description in [building-with-maven.md](https://github.com/apache/spark/blob/master/docs/building-with-maven.md).

| YARN version | Profile required |
| --- | --- |
| 0.23.x to 2.1.x | yarn-alpha |
| 2.2.x and later | yarn |
[GitHub] spark pull request: [SPARK-1779] add warning when memoryFraction i...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/714#issuecomment-51029448 Hey @andrewor14, I cannot see the FAILED unit tests info, so I do not know how to resolve it. Can you help me?
[GitHub] spark pull request: Optionally parallelize the Spark build.
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1752#issuecomment-51030127 @pwendell Ah, so that's what's causing it. Yes, fix forward by all means, but can this be disabled until that time? It looks like about half or more of all test runs are failing spuriously, and that just means they have to be run 2-3 times. It's now slower to get to a passed test suite when they really pass. In a way, parallelizing single builds is less prone to this conflict.
[GitHub] spark pull request: [SQL] [SPARK-2826] Reduce the memory copy whil...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/1765
[SQL] [SPARK-2826] Reduce the memory copy while building the hashmap for HashOuterJoin
This is a follow-up to #1147. This PR improves performance by about 10% - 15% in my local tests.
```
Before:
LeftOuterJoin: took 16750 ms ([300] records)
LeftOuterJoin: took 15179 ms ([300] records)
RightOuterJoin: took 15515 ms ([300] records)
RightOuterJoin: took 15276 ms ([300] records)
FullOuterJoin: took 19150 ms ([600] records)
FullOuterJoin: took 18935 ms ([600] records)

After:
LeftOuterJoin: took 15218 ms ([300] records)
LeftOuterJoin: took 13503 ms ([300] records)
RightOuterJoin: took 13663 ms ([300] records)
RightOuterJoin: took 14025 ms ([300] records)
FullOuterJoin: took 16624 ms ([600] records)
FullOuterJoin: took 16578 ms ([600] records)
```
Besides the performance improvement, I also did some cleanup as suggested in #1147.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chenghao-intel/spark hash_outer_join_fixing
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1765.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1765
commit ab1f9e0cce75b875f2bacefe3972c7fbe6fc898c
Author: Cheng Hao hao.ch...@intel.com
Date: 2014-08-04T07:51:04Z
Reduce the memory copy while building the hashmap
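As a rough check on the "10% - 15%" figure quoted for PR 1765, the timings from the benchmark output can be averaged per join type and compared. This is only arithmetic on the twelve numbers shown above; the helper names `mean` and `reduction_percent` are illustrative, and two runs per configuration is too small a sample to say much about variance.

```python
# Timings in milliseconds, copied from the benchmark output in the PR description.
before = {
    "LeftOuterJoin": [16750, 15179],
    "RightOuterJoin": [15515, 15276],
    "FullOuterJoin": [19150, 18935],
}
after = {
    "LeftOuterJoin": [15218, 13503],
    "RightOuterJoin": [13663, 14025],
    "FullOuterJoin": [16624, 16578],
}


def mean(values):
    return sum(values) / len(values)


def reduction_percent(old, new):
    """Percentage drop in mean run time going from `old` to `new`."""
    return 100.0 * (mean(old) - mean(new)) / mean(old)


reductions = {name: reduction_percent(before[name], after[name]) for name in before}

for name, pct in reductions.items():
    print("%s: %.1f%% less time" % (name, pct))
```

On these numbers the mean run time drops by roughly 10% for the left and right outer joins and by roughly 13% for the full outer join, which is consistent with the lower end of the quoted range.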
[GitHub] spark pull request: [SQL] [SPARK-2826] Reduce the memory copy whil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1765#issuecomment-51030910
QA tests have started for PR 1765. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17851/consoleFull
[GitHub] spark pull request: [SPARK-2815]: Compilation failed upon the hado...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1754#discussion_r15745620
--- Diff: project/SparkBuild.scala ---
@@ -71,7 +71,7 @@ object SparkBuild extends PomBuild {
 }
 Properties.envOrNone("SPARK_HADOOP_VERSION") match {
 case Some(v) =>
-if (v.matches("0.23.*")) isAlphaYarn = true
+if ("^2\\.[2-9]+".r.findFirstIn(v) == None) isAlphaYarn = true
--- End diff --
If there is ever a Hadoop 2.10, this pattern would consider it a YARN alpha version. Better to match against `2\\.[01]\\.[0-9]+` or something.
This won't address the actual issue in the OP. It looks like a bit of cleanup, but in a deprecated code path. It would let Hadoop 2.0 / 2.1 be supported but they aren't actually.
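The Hadoop 2.10 concern in the review comment above can be checked directly. The sketch below is in Python rather than Scala, on the assumption that Python's `re` treats these particular patterns the same way as the JVM regex engine behind Scala's `.r`; the function names are illustrative, and anchoring the suggested pattern at the start of the string is also an assumption, since the comment does not say how it would be applied. Hadoop 0.23.x is outside the suggested pattern and would need separate handling.

```python
import re

# Pattern from the diff: anything NOT matching it is treated as YARN alpha.
PATCH_PATTERN = re.compile(r"^2\.[2-9]+")

# Pattern suggested in the review: matching it means YARN alpha.
SUGGESTED_ALPHA_PATTERN = re.compile(r"2\.[01]\.[0-9]+")


def is_alpha_yarn_patch(version):
    """Mirror of the patched check: alpha unless the version looks like 2.2 to 2.9."""
    return PATCH_PATTERN.search(version) is None


def is_alpha_yarn_suggested(version):
    """Suggested check, anchored at the start of the string (an assumption)."""
    return SUGGESTED_ALPHA_PATTERN.match(version) is not None


versions = ["2.0.0-cdh4.5.0", "2.1.0", "2.2.0", "2.10.0"]
patch_result = {v: is_alpha_yarn_patch(v) for v in versions}
suggested_result = {v: is_alpha_yarn_suggested(v) for v in versions}
```

With the patched pattern, `2.10.0` is classified as alpha because `1` is not in `[2-9]`; with the suggested pattern it is not, while `2.0.x` and `2.1.x` are still classified as alpha.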
[GitHub] spark pull request: [SPARK-1986][GraphX]move lib.Analytics to org....
GitHub user larryxiao opened a pull request: https://github.com/apache/spark/pull/1766
[SPARK-1986][GraphX]move lib.Analytics to org.apache.spark.examples
to support ~/spark/bin/run-example GraphXAnalytics triangles /soc-LiveJournal1.txt --numEPart=256
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/larryxiao/spark 1986
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1766.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1766
commit c6ac09211b886908fbb665e5763fc01f8fa140f3
Author: Larry Xiao xia...@sjtu.edu.cn
Date: 2014-08-04T08:57:34Z
[SPARK-1986][GraphX]move lib.Analytics to org.apache.spark.examples to support ~/spark/bin/run-example GraphXAnalytics triangles /soc-LiveJournal1.txt --numEPart=256
[GitHub] spark pull request: [SPARK-1986][GraphX]move lib.Analytics to org....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1766#issuecomment-51032872 Can one of the admins verify this patch?
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1484#issuecomment-51033011
@mengxr
1. Do I understand correctly that you propose that `fit(dataset: RDD[LabeledPoint])` should compute feature scores according to the feature selection algorithm and `transform(dataset: RDD[LabeledPoint])` should return the filtered dataset?
2. It seems that such an interface allows misuse when someone calls `transform` before `fit`. In some sense it is similar to calling `predict` before actually learning the model. This is avoided in the MLlib classification models by means of the `ClassificationModel` interface, which has `predict` only. Each individual classifier has a companion object that returns its instance (and does the training as well). I like this approach more because it is less error-prone from the user's perspective, but it is a little bit implicit from the developer's perspective (you need to know that you need to implement a factory). Long story short, why not seal `fit` inside the constructor or inside the object?
```
trait FeatureSelector extends Serializable {
  def transform(dataset: RDD[LabeledPoint]): RDD[LabeledPoint]
}

// EITHER
class ChiSquaredFeatureSelector(dataset: RDD[LabeledPoint], numFeatures: Int) extends FeatureSelector {
  // perform chi squared computations...
  // implement transform
  override def transform(dataset: RDD[LabeledPoint]): RDD[LabeledPoint] = ???
}

// OR (like in classification models):
class ChiSquaredFeatureSelector extends FeatureSelector {
  private def fit(dataset: RDD[LabeledPoint]): Unit = ???
  // implement transform
  override def transform(dataset: RDD[LabeledPoint]): RDD[LabeledPoint] = ???
}

object ChiSquaredFeatureSelector {
  def fit(dataset: RDD[LabeledPoint], numFeatures: Int): ChiSquaredFeatureSelector = {
    val chi = new ChiSquaredFeatureSelector
    chi.fit(dataset)
    chi
  }
}
```
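The second option in the comment above (training happens only inside a factory, so a selector never exists in an unfitted state) can be sketched without Spark. The example below is a plain-Python illustration under loud assumptions: `TopVarianceSelector` is a hypothetical class, rows are plain lists rather than `RDD[LabeledPoint]`, and features are ranked by variance as a simple stand-in, not by the chi-squared statistic the pull request is about.

```python
class TopVarianceSelector:
    """Hypothetical selector that keeps the k columns with the largest variance."""

    def __init__(self, kept_indices):
        # Instances are only created by fit(), so they are always in a fitted state.
        self._kept = list(kept_indices)

    @classmethod
    def fit(cls, rows, num_features):
        """Factory: score every column, keep the best num_features, return a selector."""
        columns = list(zip(*rows))

        def variance(column):
            m = sum(column) / len(column)
            return sum((x - m) ** 2 for x in column) / len(column)

        ranked = sorted(range(len(columns)),
                        key=lambda i: variance(columns[i]),
                        reverse=True)
        return cls(sorted(ranked[:num_features]))

    def transform(self, rows):
        """Project each row onto the columns chosen during fitting."""
        return [[row[i] for i in self._kept] for row in rows]


rows = [[1.0, 5.0, 0.0],
        [1.0, 9.0, 2.0],
        [1.0, 1.0, 4.0]]
selector = TopVarianceSelector.fit(rows, num_features=2)
filtered = selector.transform(rows)
```

Since the only way to obtain a selector is through `fit`, there is no code path on which `transform` runs before the scores exist, which is the misuse the comment wants to rule out.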
[GitHub] spark pull request: [SPARK-2827][GraphX]Add degree distribution op...
GitHub user luluorta opened a pull request: https://github.com/apache/spark/pull/1767
[SPARK-2827][GraphX]Add degree distribution operators in GraphOps for GraphX
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/luluorta/spark graphx-degree
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1767.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1767
commit d1e053f3c7a95272676edfad485a31f69290effd
Author: luluorta luluo...@gmail.com
Date: 2014-08-04T08:42:28Z
add max/min degree
commit 1c35298bfd3bea5b8eeba6bb4804b3fe74ff7fd9
Author: luluorta luluo...@gmail.com
Date: 2014-08-04T08:56:29Z
add degree distribution
[GitHub] spark pull request: [SPARK-2827][GraphX]Add degree distribution op...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1767#issuecomment-51033669 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2815]: Compilation failed upon the hado...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1754#issuecomment-51033818
@pwendell With #151, compilation fails. There seems to be an infinite loop:
`SPARK_HADOOP_VERSION=2.0.0-cdh4.5.0 SPARK_YARN=true SPARK_HIVE=true sbt/sbt clean assembly`
```
java.lang.StackOverflowError
    at scala.reflect.internal.HasFlags$class.isSynthetic(HasFlags.scala:115)
    at scala.reflect.internal.Symbols$Symbol.isSynthetic(Symbols.scala:112)
    at xsbt.ExtractUsedNames.eligibleAsUsedName(ExtractUsedNames.scala:121)
    at xsbt.ExtractUsedNames.handleClassicTreeNode$1(ExtractUsedNames.scala:87)
    at xsbt.ExtractUsedNames.xsbt$ExtractUsedNames$$handleTreeNode$1(ExtractUsedNames.scala:97)
    at xsbt.ExtractUsedNames$$anonfun$handleMacroExpansion$1$2.apply(ExtractUsedNames.scala:64)
    at xsbt.ExtractUsedNames$$anonfun$handleMacroExpansion$1$2.apply(ExtractUsedNames.scala:64)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1489)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1487)
    at scala.reflect.internal.Trees$class.itraverse(Trees.scala:1174)
    at scala.reflect.internal.SymbolTable.itraverse(SymbolTable.scala:13)
    at scala.reflect.internal.SymbolTable.itraverse(SymbolTable.scala:13)
    at scala.reflect.api.Trees$Traverser.traverse(Trees.scala:2825)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1490)
    at scala.reflect.internal.Trees$TreeContextApiImpl.foreach(Trees.scala:80)
    at xsbt.ExtractUsedNames.handleMacroExpansion$1(ExtractUsedNames.scala:64)
    at xsbt.ExtractUsedNames.xsbt$ExtractUsedNames$$handleTreeNode$1(ExtractUsedNames.scala:95)
    at xsbt.ExtractUsedNames$$anonfun$handleMacroExpansion$1$2.apply(ExtractUsedNames.scala:64)
    at xsbt.ExtractUsedNames$$anonfun$handleMacroExpansion$1$2.apply(ExtractUsedNames.scala:64)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1489)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1487)
    at scala.reflect.api.Trees$Traverser.traverseTrees(Trees.scala:2829)
    at scala.reflect.internal.Trees$class.itraverse(Trees.scala:1174)
    at scala.reflect.internal.SymbolTable.itraverse(SymbolTable.scala:13)
    at scala.reflect.internal.SymbolTable.itraverse(SymbolTable.scala:13)
    at scala.reflect.api.Trees$Traverser.traverse(Trees.scala:2825)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1490)
    at scala.reflect.internal.Trees$TreeContextApiImpl.foreach(Trees.scala:80)
    at xsbt.ExtractUsedNames.handleMacroExpansion$1(ExtractUsedNames.scala:64)
    at xsbt.ExtractUsedNames.xsbt$ExtractUsedNames$$handleTreeNode$1(ExtractUsedNames.scala:95)
    at xsbt.ExtractUsedNames$$anonfun$handleMacroExpansion$1$2.apply(ExtractUsedNames.scala:64)
    at xsbt.ExtractUsedNames$$anonfun$handleMacroExpansion$1$2.apply(ExtractUsedNames.scala:64)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1489)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1487)
    at scala.reflect.api.Trees$Traverser.traverseTrees(Trees.scala:2829)
    at scala.reflect.internal.Trees$class.itraverse(Trees.scala:1174)
    at scala.reflect.internal.SymbolTable.itraverse(SymbolTable.scala:13)
    at scala.reflect.internal.SymbolTable.itraverse(SymbolTable.scala:13)
    at scala.reflect.api.Trees$Traverser.traverse(Trees.scala:2825)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1490)
    at scala.reflect.internal.Trees$TreeContextApiImpl.foreach(Trees.scala:80)
    at xsbt.ExtractUsedNames.handleMacroExpansion$1(ExtractUsedNames.scala:64)
    at xsbt.ExtractUsedNames.xsbt$ExtractUsedNames$$handleTreeNode$1(ExtractUsedNames.scala:95)
    at xsbt.ExtractUsedNames$$anonfun$handleMacroExpansion$1$2.apply(ExtractUsedNames.scala:64)
    at xsbt.ExtractUsedNames$$anonfun$handleMacroExpansion$1$2.apply(ExtractUsedNames.scala:64)
    at scala.reflect.internal.Trees$ForeachTreeTraverser.traverse(Trees.scala:1489)
```
[GitHub] spark pull request: [SPARK-2815]: Compilation failed upon the hado...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1754#issuecomment-51034387 Do we need to explicitly point out that Spark does not support YARN versions `2.0.x` and `2.1.x`?
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-51037911 QA tests have started for PR 1616. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17852/consoleFull
[GitHub] spark pull request: [SQL] [SPARK-2826] Reduce the memory copy whil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1765#issuecomment-51039744 QA results for PR 1765:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17851/consoleFull
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-51044768 QA results for PR 1616:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17852/consoleFull
[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/1722#discussion_r15750212
--- Diff: core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala ---
@@ -35,16 +35,15 @@ private[spark] class JavaSerializationStream(out: OutputStream, counterReset: In
   /**
    * Calling reset to avoid memory leak:
    * http://stackoverflow.com/questions/1281549/memory-leak-traps-in-the-java-standard-api
-   * But only call it every 10,000th time to avoid bloated serialization streams (when
+   * But only call it every 100th time to avoid bloated serialization streams (when
    * the stream 'resets' object class descriptions have to be re-written)
    */
   def writeObject[T: ClassTag](t: T): SerializationStream = {
     objOut.writeObject(t)
+    counter += 1
     if (counterReset > 0 && counter >= counterReset) {
--- End diff --
This is the right behavior, but it is a slight change ... I don't think anyone is expecting the earlier behavior though!
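The counting behaviour discussed in the diff above can be illustrated with a minimal, Spark-free sketch. The class name `ResettingStream` and the `resets` field are hypothetical and exist only for illustration; only `java.io.ObjectOutputStream.reset()` is a real JDK call. The sketch assumes the counter is incremented on every write and the stream is reset once the counter reaches `counterReset`.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for the stream in the diff; not Spark's actual class.
class ResettingStream(counterReset: Int) {
  private val bytes = new ByteArrayOutputStream()
  private val objOut = new ObjectOutputStream(bytes)
  private var counter = 0
  // Exposed only so the reset behaviour can be observed from outside.
  var resets = 0

  def writeObject(t: AnyRef): ResettingStream = {
    objOut.writeObject(t)
    // Count every write, then reset once the threshold is reached.
    counter += 1
    if (counterReset > 0 && counter >= counterReset) {
      objOut.reset() // drops the stream's back-reference table
      counter = 0
      resets += 1
    }
    this
  }
}
```

With `counterReset = 3`, the stream in this sketch resets after the third and sixth writes; with `counterReset = 0` it never resets.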
[GitHub] spark pull request: SPARK-2792. Fix reading too much or too little...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/1722#issuecomment-51047651 LGTM !
[GitHub] spark pull request: [SPARK-2815]: Compilation failed upon the hado...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1754#issuecomment-51048750 @witgo I don't think #151 is to be committed, if I understand correctly. It's not 100% clear which versions of YARN 2.0.x actually work with `yarn-alpha`, and which if any work with `yarn`. If anything it's worth a note that the pre-stable YARN versions are not guaranteed to work, but might.
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user li-zhihui commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-51051752 @JoshRosen added comment.
[GitHub] spark pull request: [SPARK-2678][Core] Added -- to prevent spark...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1715#issuecomment-51055752 @pwendell OK, the `java -firstCpElement` example really convinced me :) I used to think asking users to care about the order of the jars is a little too much, but every sane programmer on the JVM should care about this anyway. I'll try to deliver a version as you described soon.
[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1750#issuecomment-51064528 QA tests have started for PR 1750. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17853/consoleFull
[GitHub] spark pull request: SPARK-2686 Add Length and OctetLen support to ...
Github user javadba commented on the pull request: https://github.com/apache/spark/pull/1586#issuecomment-51065193
Hi, For some reason the CORE module testing has ballooned in overall testing time: it took over 7.5 hours to run. There was one timeout error out of 736 tests, and it is quite unlikely to have anything to do with the code added in this PR. Here is the test that failed and then the overall results:
DriverSuite:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
- driver should exit after finishing *** FAILED ***
TestFailedDueToTimeoutException was thrown during property evaluation. (DriverSuite.scala:40)
Message: The code passed to failAfter did not complete within 60 seconds.
Location: (DriverSuite.scala:41)
Occurred at table row 0 (zero based, not counting headings), which had values ( master = local )
Tests: succeeded 723, failed 1, canceled 0, ignored 7, pending 0
*** 1 TEST FAILED ***
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [ 1.180 s]
[INFO] Spark Project Core . FAILURE [ 07:35 h]
So I am not presently in a position to run regression tests, given that the overall runtime will be double-digit hours. Would someone please run Jenkins on this code?
[GitHub] spark pull request: Fix postfixOps warnings in the test suite
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1323#issuecomment-51068459 QA tests have started for PR 1323. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17854/consoleFull
[GitHub] spark pull request: Fix postfixOps warnings in the test suite
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1323#issuecomment-51069490 Related work #1330
[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1330#issuecomment-51070559 QA tests have started for PR 1330. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17855/consoleFull
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1269#discussion_r15759585
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/topicmodeling/utils/serialization/TObjectIntHashMapSerializer.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering.topicmodeling.utils.serialization
+
+import com.esotericsoftware.kryo.io.{Input, Output}
+import com.esotericsoftware.kryo.{Kryo, Serializer}
+import gnu.trove.map.hash.TObjectIntHashMap
--- End diff --
This can be replaced with `breeze.util.Index`
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/1269#discussion_r15759818
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/clustering/topicmodeling/topicmodels/RobustPLSASuite.scala ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering.topicmodeling.topicmodels
+
+import java.util.Random
+
+import org.apache.spark.mllib.clustering.topicmodeling.topicmodels.regulaizers.{SymmetricDirichletDocumentOverTopicDistributionRegularizer, SymmetricDirichletTopicRegularizer}
+
+class RobustPLSASuite extends AbstractTopicModelSuite[RobustDocumentParameters,
+  RobustGlobalParameters] {
+  test("feasibility") {
+    val numberOfTopics = 2
+    val numberOfIterations = 10
+
+    val plsa = new RobustPLSA(sc,
+      numberOfTopics,
+      numberOfIterations,
+      new Random(),
+      new SymmetricDirichletDocumentOverTopicDistributionRegularizer(0.2f),
+      new SymmetricDirichletTopicRegularizer(0.2f))
+
+    testPLSA(plsa)
+  }
+
+}
--- End diff --
It seems that git needs a blank line at the end of the file.
[GitHub] spark pull request: [SPARK-2817] add show create table support
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/1760#issuecomment-51072215 what's wrong with jenkins?
[GitHub] spark pull request: [SPARK-2583] ConnectionManager cannot distingu...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/1490#issuecomment-51072948 Thanks for backing me up, @JoshRosen.
[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1750#issuecomment-51075115 QA results for PR 1750:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class Sqrt(child: Expression) extends UnaryExpression {

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17853/consoleFull
[GitHub] spark pull request: [SPARK-2817] add show create table support
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1760#issuecomment-51075309 Jenkins, test this please.
[GitHub] spark pull request: Fix postfixOps warnings in the test suite
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1323#issuecomment-51076281 QA results for PR 1323:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17854/consoleFull
[GitHub] spark pull request: SPARK-2482: Resolve sbt warnings during build
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1330#issuecomment-51078602 QA results for PR 1330:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17855/consoleFull
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-51081881 Thanks for commenting. I now realize that my concern about advisory locking was a little misguided, since only cooperating Spark processes will be coordinating through the lock file.
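For readers unfamiliar with advisory locking through a lock file, here is a minimal sketch using the JDK's `FileChannel.lock()`. The helper name `withFileLock` is hypothetical and this is not the code from the pull request; it only illustrates why such a lock protects cooperating processes and nothing else: a process that never asks for the lock is not stopped by it.

```scala
import java.io.{File, RandomAccessFile}

// Hypothetical helper, not Spark's code: run `body` while holding an
// exclusive advisory lock on `lockFile`.
def withFileLock[T](lockFile: File)(body: => T): T = {
  val raf = new RandomAccessFile(lockFile, "rw")
  try {
    // Blocks until no other process holds a lock on the file.
    val lock = raf.getChannel.lock()
    try body finally lock.release()
  } finally raf.close()
}
```

Note that these locks are held on behalf of the whole JVM, so a second thread in the same JVM that tries to lock the same file gets an `OverlappingFileLockException` rather than blocking; coordination between threads needs an ordinary in-process lock as well.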
[GitHub] spark pull request: [SPARK-2817] add show create table support
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1760#issuecomment-51083724 LGTM :)
[GitHub] spark pull request: [SPARK-2713] Executors of same application in ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1616#issuecomment-51085554 This seems like an alright fix and I'd like to get it into a release, but I'm concerned that this doesn't correctly handle every possible feature of `fetchFile`. For example, there's [some code](https://github.com/li-zhihui/spark/blob/cachefiles/core/src/main/scala/org/apache/spark/util/Utils.scala#L444) in `fetchFile` to automatically decompress `.tar.gz` files. I don't remember why this code was added (or whether it's actually correct, since it seems to assume that files are downloaded into the current working directory), but I'm not sure that `fetchCachedFile` will properly handle that case; it seems like it would only copy the `.tar.gz` file without decompressing it in the executor's directory. We could try to special-case fix this by moving the decompression logic into `fetchCachedFile`, but I'm worried that it will make `fetchFile` even harder to understand. I think that `fetchFile` might be due for a refactoring. Also, do you think we should just replace `fetchFile` with `fetchCachedFile` and keep the uncached version private?
[GitHub] spark pull request: [SPARK-2806] core - upgrade to json4s-jackson ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1702#issuecomment-51085913 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-2806] core - upgrade to json4s-jackson ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1702#issuecomment-51086273 QA tests have started for PR 1702. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17856/consoleFull
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1473: Feature selection fo...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1484#issuecomment-51086836
@avulanov I have the same concern about calling `transform` before `fit`. There are two options: 1) throw an error, 2) fit on the same dataset and then transform (fit_transform in sk-learn). But I don't have a strong preference for either one. I want to add another candidate to what you proposed:
~~~
class ChiSquaredFeatureSelection {
  def fit(dataset: RDD[LabeledPoint], numFeatures: Int): ChiSquaredFeatureSelector
}

class ChiSquaredFeatureSelector {
  def transform(dataset: RDD[LabeledPoint]): RDD[LabeledPoint]
}
~~~
We can discuss the class hierarchy later since it is not user-facing. A problem with all the candidates here is that we cannot apply the same transformation to `RDD[Vector]`, which is required for prediction. I'm thinking about something like the following:
~~~
class ChiSquaredFeatureSelection {
  def fit[T <: Vectorized with Labeled](dataset: RDD[T], numFeatures: Int): ChiSquaredFeatureSelector
}

class ChiSquaredFeatureSelector {
  def transform[T <: Vectorized](dataset: RDD[T]): RDD[T]
}
~~~
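The two-class shape proposed above, where `fit` lives on one class and returns a separate fitted object that exposes `transform`, can be sketched without Spark. Everything below is a hypothetical stand-in: a plain `Seq` replaces `RDD`, `Point` replaces `LabeledPoint`, and a per-feature variance score replaces the chi-squared statistic purely to keep the example short. The point is only that `transform` is unreachable until `fit` has produced a selector, and that the same selector can also be applied to unlabeled vectors.

```scala
// Hypothetical stand-in for LabeledPoint.
final case class Point(label: Double, features: Vector[Double])

// The fitted object: holds the chosen feature indices and nothing else.
final class VarianceFeatureSelector(val selected: Vector[Int]) {
  def transform(data: Seq[Point]): Seq[Point] =
    data.map(p => p.copy(features = selected.map(j => p.features(j))))

  // Unlabeled vectors reuse the same indices, e.g. at prediction time.
  def transformVectors(data: Seq[Vector[Double]]): Seq[Vector[Double]] =
    data.map(v => selected.map(j => v(j)))
}

// The unfitted side: its only job is to produce a selector.
final class VarianceFeatureSelection {
  def fit(data: Seq[Point], numFeatures: Int): VarianceFeatureSelector = {
    require(data.nonEmpty, "cannot fit on an empty dataset")
    val n = data.size.toDouble
    val dims = data.head.features.indices
    // Population variance of each feature column (stand-in score).
    val variances = dims.map { j =>
      val xs = data.map(_.features(j))
      val mean = xs.sum / n
      xs.map(x => (x - mean) * (x - mean)).sum / n
    }
    // Keep the highest-scoring features, in their original order.
    val top = dims.sortBy(j => -variances(j)).take(numFeatures).sorted.toVector
    new VarianceFeatureSelector(top)
  }
}
```

A caller would write `new VarianceFeatureSelection().fit(train, 2).transform(train)`; there is no way to call `transform` first, which addresses the misuse concern without needing a runtime error.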
[GitHub] spark pull request: [SPARK-2179][SQL] Public API for DataTypes and...
Github user chutium commented on a diff in the pull request: https://github.com/apache/spark/pull/1346#discussion_r15766362

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -89,6 +88,44 @@ class SQLContext(@transient val sparkContext: SparkContext)
     new SchemaRDD(this, SparkLogicalPlan(ExistingRdd.fromProductRdd(rdd))(self))

   /**
+   * :: DeveloperApi ::
+   * Creates a [[SchemaRDD]] from an [[RDD]] containing [[Row]]s by applying a schema to this RDD.
+   * It is important to make sure that the structure of every [[Row]] of the provided RDD matches
+   * the provided schema. Otherwise, there will be runtime exception.
+   * Example:
+   * {{{
+   *  import org.apache.spark.sql._
+   *  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
+   *
+   *  val schema =
+   *    StructType(
+   *      StructField("name", StringType, false) ::
+   *      StructField("age", IntegerType, true) :: Nil)
+   *
--- End diff --

Hi @yhuai, why do we need to define the schema as a StructType rather than directly as a Seq[StructField]? I tried to build a Seq[StructField] from JDBC metadata in #1612 https://github.com/apache/spark/pull/1612/files#diff-3 (it follows the code of your JsonRDD :) and it seems we do not need this StructType anywhere.
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51087364 @rxin @pwendell This PR is ready for review.
[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1758#issuecomment-51087933 Jenkins, retest this please. @JoshRosen it appears something timed out or failed during the tests.
[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1758#issuecomment-51088288 QA tests have started for PR 1758. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17857/consoleFull
[GitHub] spark pull request: [SPARK-2678][Core] Added -- to prevent spark...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1715#issuecomment-51089001 @pwendell How about for Python files? What if I have one.py and two.py that reference each other, and I want spark-submit to run the main method of one.py but not two.py? Since we don't specify a class, I'm not sure how we can distinguish between the two main methods unless we impose the requirement that the primary Python file must be the first one.
[GitHub] spark pull request: [SPARK-2179][SQL] Public API for DataTypes and...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1346#discussion_r15767384

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -89,6 +88,44 @@ class SQLContext(@transient val sparkContext: SparkContext)
     new SchemaRDD(this, SparkLogicalPlan(ExistingRdd.fromProductRdd(rdd))(self))

   /**
+   * :: DeveloperApi ::
+   * Creates a [[SchemaRDD]] from an [[RDD]] containing [[Row]]s by applying a schema to this RDD.
+   * It is important to make sure that the structure of every [[Row]] of the provided RDD matches
+   * the provided schema. Otherwise, there will be runtime exception.
+   * Example:
+   * {{{
+   *  import org.apache.spark.sql._
+   *  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
+   *
+   *  val schema =
+   *    StructType(
+   *      StructField("name", StringType, false) ::
+   *      StructField("age", IntegerType, true) :: Nil)
+   *
--- End diff --

For the completeness of our data types, we need `StructType` (`Seq[StructField]` is not a data type). For example, if the type of a field is a struct, we need a way to describe that the type of this field is a struct. Also, because a row is basically a struct value, it is natural to use `StructType` to represent a schema.
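The point about nesting can be made concrete with a small standalone model. The types below are local definitions written for this illustration; they borrow the names used in the diff but are not the Spark SQL classes, and the `address` field and `depth` helper are hypothetical. The sketch assumes only what the comment states: a struct must itself be a data type so that it can appear as the type of a field.

```scala
object SchemaSketch {
  // Local stand-ins for illustration; not the org.apache.spark.sql types.
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType

  final case class StructField(name: String, dataType: DataType, nullable: Boolean)

  // Because StructType extends DataType, it can be used as a field's type.
  // A bare Seq[StructField] could not be placed in the dataType position.
  final case class StructType(fields: Seq[StructField]) extends DataType

  val address: StructType = StructType(Seq(
    StructField("city", StringType, nullable = true),
    StructField("zip", StringType, nullable = true)))

  // A row schema is just a struct whose fields may themselves be structs.
  val person: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("age", IntegerType, nullable = true),
    StructField("address", address, nullable = true)))

  // Counts levels of struct nesting; atomic types have depth 0.
  def depth(t: DataType): Int = t match {
    case StructType(fs) => 1 + fs.map(f => depth(f.dataType)).foldLeft(0)((a, b) => math.max(a, b))
    case _              => 0
  }
}
```

Here `person` serves both as a complete row schema and as a value that could be nested inside another field, which is the uniformity the comment argues for.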
[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-51089309 QA tests have started for PR 1309. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17858/consoleFull
[GitHub] spark pull request: [SPARK-2678][Core] Added -- to prevent spark...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1715#issuecomment-51089761

@andrewor14 I believe Patrick means:

1. For Scala/Java applications, the primary jar should appear as the 1st entry of `--jars`
2. For Python applications, the primary Python file should appear as the 1st entry of `--py-files`
[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/1758#discussion_r15768094

--- Diff: core/src/main/scala/org/apache/spark/network/ConnectionManager.scala ---
@@ -41,16 +42,26 @@ import org.apache.spark.util.{SystemClock, Utils}
 private[spark] class ConnectionManager(port: Int, conf: SparkConf, securityManager: SecurityManager) extends Logging {

+  /**
+   * Used by sendMessageReliably to track messages being sent.
+   * @param message the message that was sent
+   * @param connectionManagerId the connection manager that sent this message
+   * @param completionHandler callback that's invoked when the send has completed or failed
+   */
   class MessageStatus(
       val message: Message,
       val connectionManagerId: ConnectionManagerId,
       completionHandler: MessageStatus => Unit) {

+    /** This is non-None if message has been ack'd */
     var ackMessage: Option[Message] = None
-    var attempted = false
--- End diff --

You'll notice that I removed a bunch of fields here. `attempted` was never read anywhere, and `acked` implied `ackMessage != None`.
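The redundancy described in that comment can be illustrated with a stripped-down stand-in. This is not the code from the PR: `Message` is reduced to a hypothetical case class, the other constructor parameters are omitted, and `acked` is shown as a derived method only to make the implication visible (the PR removes the field outright).

```scala
object AckSketch {
  // Hypothetical minimal message type for illustration.
  final case class Message(id: Int)

  class MessageStatus(val message: Message) {
    // Non-None once the message has been acknowledged.
    var ackMessage: Option[Message] = None

    // A separate `acked` flag would duplicate this state and could drift out of sync;
    // deriving it from the Option keeps a single source of truth.
    def acked: Boolean = ackMessage.isDefined
  }
}
```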
[GitHub] spark pull request: [SPARK-2678][Core] Added -- to prevent spark...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1715#issuecomment-51090902 Hm, I see. Even then we still need some kind of separator, right? I thought the whole point of handling primary resources differently here (either under `--primary` or `--jars` or `--py-files`) is to provide backwards compatibility in case the user application uses `--`. If we pick a Spark-specific enough separator, doesn't this issue go away?
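The separator question can be sketched in isolation. The helper below is a hypothetical illustration, not spark-submit's parser: it splits an argument list at the first occurrence of a separator token and passes everything after it through untouched. The token `--spark-application-args` in the usage note is invented for this example and is not a real spark-submit flag.

```scala
object ArgSplitSketch {
  // Splits args at the first occurrence of `separator`.
  // Returns (launcher arguments, application arguments); the separator itself is dropped.
  // If the separator is absent, all arguments are treated as launcher arguments.
  def splitArgs(args: Seq[String], separator: String): (Seq[String], Seq[String]) = {
    val (before, rest) = args.span(_ != separator)
    (before, rest.drop(1))
  }
}
```

With `--` as the separator, `Seq("--master", "local", "--", "a", "--", "b")` splits into `Seq("--master", "local")` and `Seq("a", "--", "b")`: only the first `--` is consumed, so an application that already relied on receiving a leading `--` would see different arguments. With a distinctive token such as the made-up `--spark-application-args`, any `--` in the application's arguments is passed through unchanged.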
[GitHub] spark pull request: [SPARK-2806] core - upgrade to json4s-jackson ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1702#issuecomment-51091836

QA results for PR 1702:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17856/consoleFull
[GitHub] spark pull request: [SPARK-2806] core - upgrade to json4s-jackson ...
Github user avati commented on the pull request: https://github.com/apache/spark/pull/1702#issuecomment-51092475 It is not clear how the failure is related to this patch..?
[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1309#issuecomment-51092478 QA tests have started for PR 1309. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17859/consoleFull
[GitHub] spark pull request: SPARK-2380: Support displaying accumulator val...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/1309#discussion_r15769145

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
@@ -42,6 +44,13 @@ class TaskInfo(
   var gettingResultTime: Long = 0

   /**
+   * Intermediate updates to accumulables during this task. Note that it is valid for the same
+   * accumulable to be updated multiple times in a single task or for two accumulables with the
+   * same name but different ID's to exist in a task.
--- End diff --

super nit: no apostrophe in IDs