Reading file header in Spark
Hi everyone!

I'm really new to Spark and I'm trying to figure out the proper way to do the following:

1.- Read a file header (a single line)
2.- Build a configuration object from it
3.- Use that object in a function that will be called by map()

I thought about using filter() after textFile(), but I don't want an RDD as the result because I'm expecting a single object.

Any help is much appreciated.

Thanks in advance,
Silvina
Re: Reading file header in Spark
You can call rdd.take(1) to get just the header line; it returns a one-element collection on the driver rather than an RDD. I think someone mentioned before that this is a good use case for having a tail method on RDDs too, to skip the header for subsequent processing. But you can ignore the header with a filter, or with logic in your map function.

In reply to Silvina Caíno Lores silvi.ca...@gmail.com (Wed, Jul 16, 2014 at 11:01 AM)
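A minimal PySpark sketch of this suggestion, assuming a comma-separated file whose first line names the columns. The helper names (`build_config`, `parse_line`, `load_records`), the delimiter, and the dict used as the configuration object are hypothetical choices for illustration; `sc` is assumed to be an existing SparkContext.

```python
def build_config(header):
    """Turn the header line into a small config object (here, a dict)."""
    columns = [name.strip() for name in header.split(",")]
    return {"columns": columns, "delimiter": ","}


def parse_line(line, config):
    """Map one data line to a dict keyed by the header's column names."""
    values = line.split(config["delimiter"])
    return dict(zip(config["columns"], values))


def load_records(sc, path):
    """Read the header on the driver, then use it inside map()."""
    lines = sc.textFile(path)
    header = lines.take(1)[0]       # take(1) returns a one-element list
    config = build_config(header)   # plain Python object built on the driver
    return (lines
            .filter(lambda line: line != header)          # drop the header line
            .map(lambda line: parse_line(line, config)))  # config travels in the closure
```

Note that the filter compares whole lines, so a data line identical to the header would be dropped as well, and `take(1)[0]` raises an IndexError on an empty file.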
Re: Reading file header in Spark
Thank you! This is what I needed. I've read that the first() method should work as well. It's a pity that the taken element cannot be removed from the RDD, though. Thanks again!

In reply to Sean Owen so...@cloudera.com (16 July 2014 12:09)
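On removing the header: one possible workaround, sketched here in PySpark, is to drop the first line of the first partition with `mapPartitionsWithIndex`. This assumes a single input file read with `textFile()`, so that the header is the first line of partition 0; with several input files, each carrying its own header, it would not be enough. The function names `drop_first_line` and `without_header` are hypothetical.

```python
from itertools import islice


def drop_first_line(partition_index, lines):
    """Skip one line in partition 0 only; pass other partitions through."""
    if partition_index == 0:
        return islice(lines, 1, None)
    return lines


def without_header(rdd):
    """Return a new RDD without the first line of the first partition."""
    return rdd.mapPartitionsWithIndex(drop_first_line)
```

The original RDD is left untouched, since RDDs are immutable; `without_header(lines)` returns a new RDD, while `lines.first()` still gives the header itself as a single value.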