There is also a lazy implementation: http://erikerlandson.github.io/blog/2014/07/29/deferring-spark-actions-to-lazy-transforms-with-the-promise-rdd/
I generated a PR for it -- there was also an alternate proposal for having it be a library on the new Spark Packages site: http://databricks.com/blog/2014/12/22/announcing-spark-packages.html

----- Original Message -----
> Hi,
> maybe the drop function is helpful for you (even though this is probably
> more than you need, still an interesting read):
> http://erikerlandson.github.io/blog/2014/07/27/some-implications-of-supporting-the-scala-drop-method-for-spark-rdds/
>
> Joerg
>
> On Tue, Dec 23, 2014 at 5:45 PM, Hao Ren <inv...@gmail.com> wrote:
>
> > Hi,
> >
> > I guess you would like to remove the header of a CSV file.
> >
> > You can play with partitions. =)
> >
> > // src is your RDD; drop the first element of partition 0 only
> > val noHeader = src.mapPartitionsWithIndex(
> >   (i, iterator) =>
> >     if (i == 0 && iterator.hasNext) {
> >       iterator.next()
> >       iterator
> >     } else iterator)
> >
> > This way, you don't need to filter the whole RDD. Good luck.
> >
> > Hao
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/removing-first-record-from-RDD-String-tp20834p20836.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
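
The quoted mapPartitionsWithIndex trick works because it skips one element only in partition 0, where the header lands. Outside of a Spark cluster, the same per-partition logic can be sketched with plain Scala iterators standing in for an RDD's partitions (the `dropHeader` helper and the sample CSV lines below are illustrative, not part of Spark's API):

```scala
// Simulate an RDD's partitions as a sequence of iterators.
// Partition 0 holds the CSV header followed by the first data rows.
val partitions = Seq(
  Iterator("id,name", "1,alice", "2,bob"), // partition 0: header first
  Iterator("3,carol", "4,dave")            // partition 1: data only
)

// Same logic as the quoted snippet: advance past one element,
// but only when we are looking at partition 0.
def dropHeader(i: Int, it: Iterator[String]): Iterator[String] =
  if (i == 0 && it.hasNext) { it.next(); it } else it

// zipWithIndex plays the role of mapPartitionsWithIndex here.
val noHeader = partitions.zipWithIndex
  .flatMap { case (it, i) => dropHeader(i, it) }
```

Only the first partition pays any cost; the other partitions pass through untouched, which is why this beats a filter over every record.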