Reading file header in Spark

2014-07-16 Thread Silvina Caíno Lores
Hi everyone!

I'm really new to Spark and I'm trying to figure out the proper way to do
the following:

1.- Read a file header (a single line)
2.- Build with it a configuration object
3.- Use that object in a function that will be called by map()

I thought about using filter() after textFile(), but I don't want an RDD as
the result, since I'm expecting a single object.

Any help is much appreciated.

Thanks in advance,
Silvina


Re: Reading file header in Spark

2014-07-16 Thread Sean Owen
You can rdd.take(1) to get just the header line.

I think someone mentioned before that this is a good use case for
having a tail method on RDDs too, to skip the header for subsequent
processing. But you can ignore it with a filter, or logic in your map
method.
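The three steps Silvina lists, combined with Sean's take(1)-then-filter suggestion, can be sketched in plain Python (the PySpark equivalents are noted in comments; the file contents and the `sep=` config format are made up for illustration):

```python
# Stand-in for the lines of sc.textFile(path).
lines = ["# sep=,", "a,1", "b,2"]

# 1. Read the header (a single line).
header = lines[0]                  # PySpark: rdd.first() or rdd.take(1)[0]

# 2. Build a configuration object from it (hypothetical "key=value" format).
config = {"sep": header.split("=")[1]}

# 3. Skip the header with a filter, then use the config inside map().
data = [ln for ln in lines if ln != header]       # rdd.filter(lambda ln: ln != header)
rows = [ln.split(config["sep"]) for ln in data]   # rdd.map(lambda ln: ln.split(config["sep"]))

print(rows)  # [['a', '1'], ['b', '2']]
```

Because take(1) and first() are actions, the config object is built on the driver and then captured by the closures passed to filter() and map().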



Re: Reading file header in Spark

2014-07-16 Thread Silvina Caíno Lores
Thank you! This is what I needed; I've read that the first() method should
work as well. It's a pity that the taken element cannot be removed from the
RDD, though.

Thanks again!
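The usual workaround for "the taken element cannot be removed" is to drop the first line of the first partition with mapPartitionsWithIndex (the header of a single text file lands in partition 0). A plain-Python sketch of that logic, with the assumed PySpark call in a comment:

```python
from itertools import islice

# Simulated partitions of an RDD; the PySpark equivalent would be:
#   rdd.mapPartitionsWithIndex(
#       lambda idx, it: islice(it, 1, None) if idx == 0 else it)
partitions = [["header", "a,1"], ["b,2", "c,3"]]

def drop_header(idx, it):
    # Skip the first element of partition 0 only; pass others through.
    return islice(it, 1, None) if idx == 0 else it

result = [row for idx, part in enumerate(partitions)
          for row in drop_header(idx, iter(part))]
print(result)  # ['a,1', 'b,2', 'c,3']
```

Compared with a plain filter, this avoids comparing every line against the header and works even when a data line happens to equal the header text.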

