[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-111766746 @JoshRosen Yes, that's fine. I'll ping @willb about listing silex on spark-packages.org. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson closed the pull request at: https://github.com/apache/spark/pull/1839
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/1839#discussion_r32375677
--- Diff: core/src/main/scala/org/apache/spark/rdd/DropRDDFunctions.scala ---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.rdd
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.{SparkContext, Logging, Partition, TaskContext}
+import org.apache.spark.{Dependency, NarrowDependency, OneToOneDependency}
+
+import org.apache.spark.SparkContext.rddToPromiseRDDFunctions
+
+
+private [spark]
+class FanInDep[T: ClassTag](rdd: RDD[T]) extends NarrowDependency[T](rdd) {
+  // Assuming parent RDD type having only one partition
+  override def getParents(pid: Int) = List(0)
+}
+
+
+/**
+ * Extra functions available on RDDs for providing the RDD analogs of Scala drop,
+ * dropRight and dropWhile, which return an RDD as a result
+ */
+class DropRDDFunctions[T : ClassTag](self: RDD[T]) extends Logging with Serializable {
+
+  /**
+   * Return a new RDD formed by dropping the first (n) elements of the input RDD
+   */
+  def drop(n: Int):RDD[T] = {
+    if (n <= 0) return self
+
+    // locate partition that includes the nth element
+    val locate = (partitions: Array[Partition], input: RDD[T], ctx: TaskContext) => {
+      var rem = n
+      var p = 0
+      var np = 0
+      while (rem > 0 && p < partitions.length) {
+        np = input.iterator(partitions(p), ctx).length
--- End diff --
Is it really lazy? I think computation will happen here.
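The reviewer's concern about `.length` can be illustrated without Spark. The following is a minimal sketch, assuming a plain Scala `Iterator` as a stand-in for `input.iterator(partitions(p), ctx)`; the names `countingIterator` and `touched` are hypothetical instrumentation and not part of the patch. It only shows that calling `length` on an iterator walks every element. When the enclosing `locate` closure itself runs is a separate question that this sketch does not answer.

```scala
object IteratorLengthIsEager {
  // Number of elements produced so far (hypothetical instrumentation).
  var touched = 0

  // Stand-in for one partition's iterator; counts every element it yields.
  def countingIterator(n: Int): Iterator[Int] =
    Iterator.range(0, n).map { x => touched += 1; x }

  def main(args: Array[String]): Unit = {
    val it = countingIterator(5)
    println(s"before length: touched=$touched") // map is lazy, nothing produced yet
    val np = it.length                          // walks the whole iterator
    println(s"after length: np=$np touched=$touched")
  }
}
```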
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-111742946 Hey @erikerlandson, since I don't think we're going to merge this functionality into core right now, do you mind closing this issue? BTW, it would be cool to list Silex on http://spark-packages.org, since that would put the library in front of a lot more users / eyeballs.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-109312395 @AlexNisnevich drop, dropRight and dropWhile are now available on the silex project: http://silex.freevariable.com/latest/api/#com.redhat.et.silex.rdd.drop.DropRDDFunctions
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user AlexNisnevich commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-95293840 Have any admins verified this patch? `drop` functionality in RDDs would be very useful to have.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-95299921 @erikerlandson What do you think about releasing this (and maybe #1909) as a library on Maven or http://spark-packages.org? I'm not sure that this is an API that we necessarily want to put in core yet, but if you publish it as a package then folks would be able to use it with their existing Spark deployments without having to upgrade. The interface for users could still be pretty nice: just add an implicit class / object or set of implicit conversions, then have users import that. Spark Packages has a helpful command line tool for creating a project template, which might be a timesaver if you decide to go this route: http://spark-packages.org/package/databricks/spark-package-cmd-tool.
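The "implicit class, then have users import that" interface suggested above could look roughly like the sketch below. It is an illustration only: `PartitionedData` is a hypothetical stand-in for an RDD so that the example runs without Spark, and `DropSyntax` and `DropOps` are made-up names, not the API of the patch or of any published package. The `drop` here is a simple eager version; it is present only to give the enrichment something to expose.

```scala
object DropSyntax {
  // Hypothetical stand-in for an RDD: an ordered sequence of partitions.
  final case class PartitionedData[T](partitions: Vector[Vector[T]])

  // Enrichment class: adds drop() without modifying PartitionedData itself.
  // It only becomes available after `import DropSyntax._`.
  implicit class DropOps[T](self: PartitionedData[T]) {
    def drop(n: Int): PartitionedData[T] = {
      var rem = math.max(n, 0) // elements still to be dropped
      PartitionedData(self.partitions.map { part =>
        val kept = part.drop(rem)
        rem -= (part.length - kept.length)
        kept
      })
    }
  }
}

object DropSyntaxDemo {
  import DropSyntax._ // opt-in: without this import, .drop does not compile

  def main(args: Array[String]): Unit = {
    val data = PartitionedData(Vector(Vector(1, 2), Vector(3, 4, 5)))
    println(data.drop(3).partitions)
  }
}
```

Keeping the implicit in its own object means the extra methods are visible only to code that imports them, which leaves the core type untouched.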
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-95311919 Hi @JoshRosen, publishing some of these odds and ends in some form has been on my to-do list for a while. If there's interest, I can bump it up in priority.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user AlexNisnevich commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-95328807 @JoshRosen @erikerlandson That would be great.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-61194329 [Test build #22578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22578/consoleFull) for PR 1839 at commit [`af73e1f`](https://github.com/apache/spark/commit/af73e1f3ffab0909acaebdca154889030f1187f7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-61201640
[Test build #22578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22578/consoleFull) for PR 1839 at commit [`af73e1f`](https://github.com/apache/spark/commit/af73e1f3ffab0909acaebdca154889030f1187f7).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class FanInDep[T: ClassTag](rdd: RDD[T]) extends NarrowDependency[T](rdd)`
  * `class DropRDDFunctions[T : ClassTag](self: RDD[T]) extends Logging with Serializable`
  * `class FanOutDep[T: ClassTag](rdd: RDD[T]) extends NarrowDependency[T](rdd)`
  * `class PromisePartition extends Partition`
  * `class PromiseRDD[V: ClassTag](expr: => (TaskContext => V),`
  * `class PromiseArgPartition(p: Partition, argv: Seq[PromiseRDD[_]]) extends Partition`
  * `class PromiseRDDFunctions[T : ClassTag](self: RDD[T]) extends Logging with Serializable`
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-61201646 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22578/ Test PASSed.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-58765937 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21648/consoleFull) for PR 1839 at commit [`af73e1f`](https://github.com/apache/spark/commit/af73e1f3ffab0909acaebdca154889030f1187f7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-58767530 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21648/ Test PASSed.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-58767527
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21648/consoleFull) for PR 1839 at commit [`af73e1f`](https://github.com/apache/spark/commit/af73e1f3ffab0909acaebdca154889030f1187f7).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class FanInDep[T: ClassTag](rdd: RDD[T]) extends NarrowDependency[T](rdd)`
  * `class DropRDDFunctions[T : ClassTag](self: RDD[T]) extends Logging with Serializable`
  * `class FanOutDep[T: ClassTag](rdd: RDD[T]) extends NarrowDependency[T](rdd)`
  * `class PromisePartition extends Partition`
  * `class PromiseRDD[V: ClassTag](expr: => (TaskContext => V),`
  * `class PromiseArgPartition(p: Partition, argv: Seq[PromiseRDD[_]]) extends Partition`
  * `class PromiseRDDFunctions[T : ClassTag](self: RDD[T]) extends Logging with Serializable`
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-54694497 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-51806430 Assuming this is correct, 'okay' is not the same as 'ok': "The following regex checks that: `.*ok\W+to\W+test.*` So I think you should be able to use it in a sentence or whatever." https://groups.google.com/forum/#!msg/quicksilver---development/Bn7RPYqAfTI/cQ-_u1BbMEQJ
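A quick check of the regex quoted above, as a sketch: it assumes the trigger is evaluated with ordinary Java regex semantics against a single-line comment, which the thread does not confirm. The helper name `triggers` is hypothetical. Because `\W+` requires at least one non-word character directly after `ok`, the `ay` in `okay` prevents a match.

```scala
object OkToTestRegex {
  // Pattern as quoted in the comment above.
  val pattern = ".*ok\\W+to\\W+test.*"

  // Hypothetical helper: would this comment text satisfy the pattern?
  def triggers(comment: String): Boolean = comment.matches(pattern)

  def main(args: Array[String]): Unit = {
    println(triggers("Jenkins, this is ok to test."))   // true
    println(triggers("Jenkins, this is okay to test.")) // false
  }
}
```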
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-51825586 Jenkins, this is ok to test. Jenkins, test this please.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-51720727 Jenkins is still not getting the memo. How strict is Jenkins with commands? Is 'okay' the same as 'ok'?
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson closed the pull request at: https://github.com/apache/spark/pull/1254
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-51496145 This is a reboot of: https://github.com/apache/spark/pull/1254
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-51496120 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user concretevitamin commented on the pull request: https://github.com/apache/spark/pull/1839#issuecomment-51520540 Jenkins, this is okay to test.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-51142701 Should I consider creating a fresh PR, or is there some better way to get Jenkins to test?
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-51142915 I'm not sure what's happening. Maybe Jenkins is lazy today. We can retry tomorrow, and if it doesn't work, create a new PR.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-50966402 Starting to worry I confused it by force-pushing the PR branch using '+'.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-50905932 jenkins appears to be awol
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-50906403 Let me give it a try: Jenkins, this is ok to test. Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-50759649 O Jenkins Where Art Thou?
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-50648091 Should Jenkins run an automatic build on PR update?
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-50673706 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-50554859 I updated this PR so that drop(), dropRight() and dropWhile() are now lazy transforms. A description of what I did is here: http://erikerlandson.github.io/blog/2014/07/29/deferring-spark-actions-to-lazy-transforms-with-the-promise-rdd/
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user jayunit100 commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-49602573 Adding the drop function to a contrib library of functions (which requires a manual import), as Erik suggests, seems like a really good option. I could see such a contrib library also being useful for other esoteric but nevertheless important tasks, like dealing with binary data formats, etc.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-47419953 Thanks - I can see why this might be useful, but there is a pretty high bar now for adding new APIs to the RDD interface, and we need to be very careful about APIs that might have very bad performance behaviors (dropping a large number of elements can be very slow, in particular if it crosses many partitions). For this reason, it might make more sense for this to be an example program or a blog post that's easily indexable so people can find it.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-47420008 BTW it is just my personal opinion. Feel free to debate or find support :)
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-47420288 My reasoning is that most use cases (or at least the ones I had in mind) are something like rdd.drop(n), where n is much smaller than rdd.count(), generally 1 or some other small number. FWIW, I implemented it via an implicit object, so it's not directly on the RDD class per se. Another way to look at it: these functions aren't worse than rdd.take(), as they use similar logic. However, it's true that if n is a large fraction of the size of the RDD, then it will invoke computation of a large fraction of the partitions.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-47420413 The thing is we must scan data twice to make sure this actually works (because we need to verify the number of partitions we checked is sufficient). Usually a user's specific use case can be solved with a very simple workaround despite the lack of RDD.drop (e.g. for CSV files with a header that you want to drop, you can just drop it from the first partition using a drop within a mapPartitions).
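The header-dropping workaround mentioned above can be sketched as follows. To keep the example runnable without a cluster, a `Vector` of `Vector`s is used as a hypothetical stand-in for the partitions of a text-file RDD. In Spark, a function of this shape is what one would pass to `RDD.mapPartitionsWithIndex`, since plain `mapPartitions` does not expose the partition index. The sketch assumes the header is a single line and sits at the start of partition 0.

```scala
object DropHeaderWorkaround {
  // Drop the first element of partition 0 only; other partitions pass through.
  // Shape matches the (Int, Iterator[T]) => Iterator[T] function that
  // RDD.mapPartitionsWithIndex expects.
  def dropHeader[T](index: Int, rows: Iterator[T]): Iterator[T] =
    if (index == 0) rows.drop(1) else rows

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for a CSV file split into two partitions.
    val partitions = Vector(
      Vector("name,age", "ann,31"),
      Vector("bob,45", "cy,27")
    )
    val rows = partitions.zipWithIndex.flatMap {
      case (part, i) => dropHeader(i, part.iterator)
    }
    rows.foreach(println)
  }
}
```

This avoids any counting pass, but it only handles the "drop a small fixed prefix that fits in the first partition" case, not a general `drop(n)`.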
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-47426628 It will scan one partition twice: the one containing the boundary between things dropped and not-dropped. Any partitions prior to that boundary are ignored by the resulting RDD (so they are scanned once), and any partitions after the boundary are not examined unless/until the result RDD is evaluated.
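The scan pattern described above can be modelled in plain Scala. This is a sketch under stated assumptions, not the code from the PR: `Partitions` is a hypothetical stand-in that counts how often each partition is iterated, and the loop is one plausible way to locate the boundary by counting elements per partition.

```scala
object LazyDropSketch {
  // Hypothetical stand-in for an RDD's partitions; scans(p) counts how
  // many times partition p has been iterated.
  final class Partitions[T](data: Vector[Vector[T]]) {
    val scans: Array[Int] = Array.fill(data.length)(0)
    def size: Int = data.length
    def iterator(p: Int): Iterator[T] = { scans(p) += 1; data(p).iterator }
  }

  // Returns one deferred iterator per output partition.
  def drop[T](parts: Partitions[T], n: Int): Vector[() => Iterator[T]] = {
    var rem = math.max(n, 0) // elements still to be dropped
    var p = 0                // candidate boundary partition
    var offset = 0           // elements to drop inside the boundary partition
    while (rem > 0 && p < parts.size) {
      val np = parts.iterator(p).length        // counting pass over partition p
      if (np > rem) { offset = rem; rem = 0 }  // boundary lies inside partition p
      else { rem -= np; p += 1 }               // partition p is dropped entirely
    }
    val boundary = p
    Vector.tabulate(parts.size) { i =>
      () =>
        if (i < boundary) Iterator.empty                         // fully dropped
        else if (i == boundary) parts.iterator(i).drop(offset)   // second scan
        else parts.iterator(i)                                   // untouched until evaluated
    }
  }

  def main(args: Array[String]): Unit = {
    val parts = new Partitions(Vector(Vector(1, 2, 3), Vector(4, 5), Vector(6)))
    val out = drop(parts, 4)
    println(parts.scans.toList)            // scans after locating the boundary
    println(out.flatMap(f => f()).toList)  // evaluate the result
    println(parts.scans.toList)            // scans after evaluation
  }
}
```

With `n = 4` on partitions of sizes 3, 2 and 1, the boundary falls in the second partition: the first partition is scanned once, the second twice (once to count, once to produce output), and the third only when the result is evaluated.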
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-47426817 Tangentially, one thing I noticed is that currently all the XxxRDDFunctions implicits are automatically defined in SparkContext, and so I held to that pattern in this PR. However, another option might be to not automatically define it, and a user would import DropRDDFunctions for themselves if they wanted to use drop methods. In fact, that seems like a good pattern generally for reducing unneeded imports; one might say the same thing for OrderedRDDFunctions, etc: import XxxRDDFunctions if you need it.
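The opt-in import pattern suggested above might look something like the following. This is a sketch of the general Scala idiom, not Spark's actual layout: MiniRDD, MiniDropFunctions, DropOps and OptInDemo are hypothetical names invented for illustration, and MiniRDD is a plain case class standing in for an RDD.

```scala
// Hypothetical stand-in for an RDD: just an ordered list of partitions.
final case class MiniRDD[T](partitions: Vector[Vector[T]])

// The drop method lives in its own object, so it is only visible
// to code that explicitly imports MiniDropFunctions._
object MiniDropFunctions {
  implicit class DropOps[T](val self: MiniRDD[T]) {
    // Drop the first n elements, counting across partitions in order.
    def drop(n: Int): MiniRDD[T] = {
      var remaining = math.max(n, 0)
      MiniRDD(self.partitions.map { part =>
        val kept = part.drop(remaining)
        remaining -= (part.length - kept.length)
        kept
      })
    }
  }
}

object OptInDemo {
  // Without this import, MiniRDD has no drop method and the call below
  // would not compile.
  import MiniDropFunctions._

  val result: MiniRDD[Int] = MiniRDD(Vector(Vector(1, 2), Vector(3, 4, 5))).drop(3)
}
```

The trade-off is the one raised in the comment: users who never need drop pay nothing in implicit scope, at the cost of one extra import for those who do.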
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-47428789 Note, in a typical case where one is invoking something like rdd.drop(1), or other small number, only one partition gets evaluated by drop - the first one.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-47429672 I also envision typical use cases as being either pre- or post-processing. That is, not something that would often appear inside a tight loop.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
GitHub user erikerlandson opened a pull request: https://github.com/apache/spark/pull/1254

[SPARK-2315] Implement drop, dropRight and dropWhile for RDDs

drop, dropRight and dropWhile methods for RDDs that return a new RDD as the result.

    // example: load in some text and skip header lines
    val txt = sc.textFile("data_with_header.txt")
    val data = txt.drop(3)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/erikerlandson/spark rdd_drop_master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1254.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1254

commit aa3c87984907d26b626dcc1e7c356d642147e840
Author: Erik Erlandson eerla...@redhat.com
Date: 2014-06-28T01:06:35Z

    [SPARK-2315] Implement drop, dropRight and dropWhile for RDDs, which take RDD as input and return new RDD with elements dropped.
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-47418701 Can one of the admins verify this patch?