removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
Hi dears!

Is there an efficient way to drop the first line of an RDD[String]?

Any suggestions?

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/removing-first-record-from-RDD-String-tp20834.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: removing first record from RDD[String]

2014-12-23 Thread Jörg Schad
Hi,
maybe the drop function is helpful for you (it is probably more than you
need, but still an interesting read):
http://erikerlandson.github.io/blog/2014/07/27/some-implications-of-supporting-the-scala-drop-method-for-spark-rdds/
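
As a rough illustration of what a general drop(n) involves beyond removing a single header line: once more than one partition can be affected, each partition has to know how many of its leading records to skip. The sketch below is not taken from the linked post; `skipsPerPartition` and the partition sizes are hypothetical names and data, and plain Scala collections stand in for RDD partitions.

```scala
object DropNSketch {
  // Given the record count of each partition (in partition order), return how
  // many leading records each partition must skip so that n are dropped overall.
  def skipsPerPartition(counts: Seq[Long], n: Long): Seq[Long] =
    counts.foldLeft((Vector.empty[Long], n)) { case ((skips, remaining), count) =>
      val skip = math.min(count, remaining)
      (skips :+ skip, remaining - skip)
    }._1

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for three RDD partitions.
    val partitions = Seq(Seq("a", "b"), Seq("c", "d", "e"), Seq("f"))
    val skips = skipsPerPartition(partitions.map(_.size.toLong), 3L)
    val kept = partitions.zip(skips).flatMap { case (part, k) => part.drop(k.toInt) }
    println(kept.mkString(",")) // prints "d,e,f"
  }
}
```

On a real RDD the per-partition counts would themselves have to be computed by a job, which is one reason dropping only a single header record from partition 0 is the cheaper special case.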

Joerg

On Tue, Dec 23, 2014 at 5:45 PM, Hao Ren inv...@gmail.com wrote:

 Hi,

 I guess you would like to remove the header of a CSV file.

 You can play with partitions. =)

 // src is your RDD
 val noHeader = src.mapPartitionsWithIndex(
   (i, iterator) =>
     if (i == 0 && iterator.hasNext) {
       iterator.next()  // skip the first record of partition 0
       iterator
     } else iterator)

 Thus, you don't need to filter on the whole RDD. Good luck.
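
 The same idea can be tried without a cluster. This is a minimal sketch, assuming the header is the first record of partition 0; `DropHeaderSketch`, `dropFirstOfPartitionZero` and `simulate` are hypothetical names, and nested Scala sequences stand in for the RDD's partitions.

```scala
object DropHeaderSketch {
  // Same shape as the function passed to mapPartitionsWithIndex:
  // (partition index, partition iterator) => new iterator.
  // Iterator.drop(1) is safe on an empty iterator, so no hasNext check is needed.
  def dropFirstOfPartitionZero[T](index: Int, records: Iterator[T]): Iterator[T] =
    if (index == 0) records.drop(1) else records

  // Hypothetical stand-in for an RDD: applies the function to each "partition".
  def simulate[T](partitions: Seq[Seq[T]]): Seq[T] =
    partitions.zipWithIndex.flatMap { case (part, i) =>
      dropFirstOfPartitionZero(i, part.iterator).toList
    }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq("name,age", "ann,31"), Seq("bob,27", "cy,45"))
    simulate(partitions).foreach(println) // prints ann,31 then bob,27 then cy,45
  }
}
```

 With a real RDD the call would look like `src.mapPartitionsWithIndex((i, it) => DropHeaderSketch.dropFirstOfPartitionZero(i, it))`. Note that if partition 0 happens to be empty, the first record lives in a later partition and is not removed by this approach.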

 Hao







Re: removing first record from RDD[String]

2014-12-23 Thread Erik Erlandson

There is also a lazy implementation:
http://erikerlandson.github.io/blog/2014/07/29/deferring-spark-actions-to-lazy-transforms-with-the-promise-rdd/

I generated a PR for it -- there was also an alternate proposal for having it 
be a library in the new Spark Packages site:
http://databricks.com/blog/2014/12/22/announcing-spark-packages.html






Re: removing first record from RDD[String]

2014-12-23 Thread Michael Quinlan
Hafiz,

You can probably use the RDD.mapPartitionsWithIndex method.

Mike


Re: removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
Yep Michael Quinlan, it's working as suggested by Hao Ren.

Thanks to you and Hao Ren.


