There is also a lazy implementation: http://erikerlandson.github.io/blog/2014/07/29/deferring-spark-actions-to-lazy-transforms-with-the-promise-rdd/
I generated a PR for it -- there was also an alternate proposal for having it be a library on the new Spark Packages site: http://databricks.com/blog/2014/12/22/announcing-spark-packages.html

----- Original Message -----
> Hi,
> maybe the drop function is helpful for you (even though this is probably
> more than you need, still an interesting read):
> http://erikerlandson.github.io/blog/2014/07/27/some-implications-of-supporting-the-scala-drop-method-for-spark-rdds/
>
> Joerg
>
> On Tue, Dec 23, 2014 at 5:45 PM, Hao Ren <inv...@gmail.com> wrote:
>
> > Hi,
> >
> > I guess you would like to remove the header of a CSV file.
> >
> > You can play with partitions. =)
> >
> > // src is your RDD; drop the first element of partition 0 only
> > val noHeader = src.mapPartitionsWithIndex(
> >   (i, iterator) =>
> >     if (i == 0 && iterator.hasNext) {
> >       iterator.next()
> >       iterator
> >     } else iterator)
> >
> > This way, you don't need to filter the whole RDD. Good luck.
> >
> > Hao
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/removing-first-record-from-RDD-String-tp20834p20836.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
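
The quoted mapPartitionsWithIndex trick works because it skips one element only in partition 0, where the header lands. Outside of a Spark cluster, the same per-partition logic can be sketched with plain Scala iterators standing in for an RDD's partitions (the `dropHeader` helper and the sample CSV lines below are illustrative, not part of Spark's API):

```scala
// Simulate an RDD's partitions as a sequence of iterators.
// Partition 0 holds the CSV header followed by the first data rows.
val partitions = Seq(
  Iterator("id,name", "1,alice", "2,bob"), // partition 0: header first
  Iterator("3,carol", "4,dave")            // partition 1: data only
)

// Same logic as the quoted snippet: advance past one element,
// but only when we are looking at partition 0.
def dropHeader(i: Int, it: Iterator[String]): Iterator[String] =
  if (i == 0 && it.hasNext) { it.next(); it } else it

// zipWithIndex plays the role of mapPartitionsWithIndex here.
val noHeader = partitions.zipWithIndex
  .flatMap { case (it, i) => dropHeader(i, it) }
```

Only the first partition pays any cost; the other partitions pass through untouched, which is why this beats a filter over every record.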