I think the cause becomes clear if you format your function reasonably:

mjpJobOrderRDD.map(line => {
  val tokens = line.split("\t")
  if (tokens.length == 164 && tokens(23) != null) {
    (tokens(23), tokens(7))
  }
  // no else branch: when the condition is false, this evaluates to () of type Unit
})

When the condition is false, an if with no else branch evaluates to (), of
type Unit. So the function returns a (String, String) tuple for some lines
and Unit for others, and the compiler infers the closest common supertype
of the two, which is Any.
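You can reproduce the same inference in plain Scala, with no Spark involved
(a minimal sketch; the values are arbitrary):

// an if without an else is sugar for: if (cond) expr else ()
val explicit: Any = if (true) (1, 2) else ()  // the desugared form
val inferred      = if (true) (1, 2)          // inferred type: Any

If you just mean to output a result for some lines and not others, use
flatMap with Some and None: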

mjpJobOrderRDD.flatMap { line =>
  val tokens = line.split("\t")
  if (tokens.length == 164 && tokens(23) != null) {
    Some((tokens(23), tokens(7)))
  } else {
    None
  }
}
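
Equivalently, a filter followed by a map does the same thing without
allocating an Option per line (a sketch, assuming the same 164-column
layout as your data):

mjpJobOrderRDD
  .map(_.split("\t"))
  .filter(tokens => tokens.length == 164 && tokens(23) != null)
  .map(tokens => (tokens(23), tokens(7)))

Either way the result is an RDD[(String, String)] rather than RDD[Any].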

On Wed, Jan 28, 2015 at 7:37 PM, Sanjay Subramanian
<sanjaysubraman...@yahoo.com.invalid> wrote:
> hey guys
>
> I am not following why this happens
>
> DATASET
> =======
> Tab separated values (164 columns)
>
> Spark command 1
> ================
> val mjpJobOrderRDD = sc.textFile("/data/cdr/cdr_mjp_joborder_raw")
> val mjpJobOrderColsPairedRDD = mjpJobOrderRDD.map(line => {
>   val tokens = line.split("\t"); (tokens(23), tokens(7))
> })
> mjpJobOrderColsPairedRDD: org.apache.spark.rdd.RDD[(String, String)] =
> MappedRDD[18] at map at <console>:14
>
>
> Spark command 2
> ================
> val mjpJobOrderRDD = sc.textFile("/data/cdr/cdr_mjp_joborder_raw")
> scala> val mjpJobOrderColsPairedRDD = mjpJobOrderRDD.map(line => {
>   val tokens = line.split("\t")
>   if (tokens.length == 164 && tokens(23) != null) { (tokens(23), tokens(7)) }
> })
> mjpJobOrderColsPairedRDD: org.apache.spark.rdd.RDD[Any] = MappedRDD[19] at
> map at <console>:14
>
>
> In the second case above, why does it say org.apache.spark.rdd.RDD[Any] and
> not org.apache.spark.rdd.RDD[(String, String)]?
>
>
> thanks
>
> sanjay
>
