Re: removing first record from RDD[String]
Hi,

Maybe the drop function is helpful for you (this is probably more than you need, but it is still an interesting read):
http://erikerlandson.github.io/blog/2014/07/27/some-implications-of-supporting-the-scala-drop-method-for-spark-rdds/

Joerg

On Tue, Dec 23, 2014 at 5:45 PM, Hao Ren inv...@gmail.com wrote:

> Hi,
>
> I guess you would like to remove the header of a CSV file. You can play with partitions. =)
>
> // src is your RDD
> val noHeader = src.mapPartitionsWithIndex(
>   (i, iterator) =>
>     if (i == 0 && iterator.hasNext) {
>       iterator.next()  // skip the first record of partition 0
>       iterator
>     } else iterator)
>
> Thus, you don't need to filter on the whole RDD.
>
> Good luck.
>
> Hao
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/removing-first-record-from-RDD-String-tp20834p20836.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
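For anyone who wants to try the partition trick without a cluster, here is a minimal sketch of the same idea in plain Scala. The object name `DropHeaderSketch`, the helper `simulate`, and the sample rows are made up for illustration and are not from the thread; the nested `Seq` only stands in for an RDD's partitions. It also assumes the header really is the first record of partition 0, which should hold for a single text file but not when several files (each with its own header) are read into one RDD.

```scala
object DropHeaderSketch {
  // Per-partition function: skip the first record of partition 0 only.
  // Iterator.drop(1) is safe on an empty iterator, so no hasNext check is needed.
  def dropHeader(index: Int, rows: Iterator[String]): Iterator[String] =
    if (index == 0) rows.drop(1) else rows

  // Hypothetical stand-in for Spark: each inner Seq plays the role of one partition.
  def simulate(partitions: Seq[Seq[String]]): List[String] =
    partitions.zipWithIndex.toList.flatMap {
      case (rows, i) => dropHeader(i, rows.iterator).toList
    }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq("id,name", "1,alice"), // partition 0 starts with the header
      Seq("2,bob", "3,carol"),   // later partitions are passed through untouched
      Seq.empty[String]
    )
    simulate(partitions).foreach(println)
  }
}
```

With a real RDD[String] named src, the same function could be passed directly as `src.mapPartitionsWithIndex(DropHeaderSketch.dropHeader)`. Only partition 0 is modified, so the other partitions are streamed through without a per-record comparison.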
Re: removing first record from RDD[String]
There is also a lazy implementation:
http://erikerlandson.github.io/blog/2014/07/29/deferring-spark-actions-to-lazy-transforms-with-the-promise-rdd/

I generated a PR for it -- there was also an alternate proposal for having it be a library in the new Spark Packages site:
http://databricks.com/blog/2014/12/22/announcing-spark-packages.html
Re: removing first record from RDD[String]
Hafiz,

You can probably use the RDD.mapPartitionsWithIndex method.

Mike

On Tue, Dec 23, 2014 at 8:35 AM, Hafiz Mujadid [via Apache Spark User List] ml-node+s1001560n20834...@n3.nabble.com wrote:

> Hi dears!
> Is there some efficient way to drop the first line of an RDD[String]? Any suggestions?
> Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/removing-first-record-from-RDD-String-tp20834p20838.html
Re: removing first record from RDD[String]
Yep Michael Quinlan, it's working as suggested by Hao Ren. Thanks to you and Hao Ren.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/removing-first-record-from-RDD-String-tp20834p20840.html