Thanks Sean. that works and I started the join of this mappedRDD to another one
I have.I have to internalize the use of Map versus FlatMap. Thinking Map Reduce
Java Hadoop code often blinds me :-)
From: Sean Owen <[email protected]>
To: Sanjay Subramanian <[email protected]>
Cc: Cheng Lian <[email protected]>; Jorge Lopez-Malla
<[email protected]>; "[email protected]" <[email protected]>
Sent: Wednesday, January 28, 2015 11:44 AM
Subject: Re: MappedRDD signature
I think it's clear if you format your function reasonably:
mjpJobOrderRDD.map(line => {
val tokens = line.split("\t");
if (tokens.length == 164 && tokens(23) != null) {
(tokens(23),tokens(7))
}
})
In some cases the function returns nothing, in some cases a tuple. The
return type is therefore Any. If you just mean to output a result in
some cases and not others, you must use flatMap + Some + None:
mjpJobOrderRDD.flatMap { line =>
val tokens = line.split("\t")
if (tokens.length == 164 && tokens(23) != null) {
Some((tokens(23),tokens(7)))
} else {
None
}
}
On Wed, Jan 28, 2015 at 7:37 PM, Sanjay Subramanian
<[email protected]> wrote:
> hey guys
>
> I am not following why this happens
>
> DATASET
> =======
> Tab separated values (164 columns)
>
> Spark command 1
> ================
> val mjpJobOrderRDD = sc.textFile("/data/cdr/cdr_mjp_joborder_raw")
> val mjpJobOrderColsPairedRDD = mjpJobOrderRDD.map(line => { val tokens =
> line.split("\t");(tokens(23),tokens(7))})
> mjpJobOrderColsPairedRDD: org.apache.spark.rdd.RDD[(String, String)] =
> MappedRDD[18] at map at <console>:14
>
>
> Spark command 2
> ================
> val mjpJobOrderRDD = sc.textFile("/data/cdr/cdr_mjp_joborder_raw")
> scala> val mjpJobOrderColsPairedRDD = mjpJobOrderRDD.map(line => { val
> tokens = line.split("\t"); if (tokens.length == 164 && tokens(23) != null)
> {(tokens(23),tokens(7))} })
> mjpJobOrderColsPairedRDD: org.apache.spark.rdd.RDD[Any] = MappedRDD[19] at
> map at <console>:14
>
>
> In the second case above , why does it say org.apache.spark.rdd.RDD[Any] and
> not org.apache.spark.rdd.RDD[(String, String)]
>
>
> thanks
>
> sanjay
>