Hi Roberto,

Ultimately, the info you need is set here:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L69
Being a Spark newbie, I extended the org.apache.spark.rdd.HadoopRDD class as HadoopRDDWithEnv, which takes an additional parameter (varName) in its constructor, and overrode the compute() function to return something like

    split.getPipeEnvVars.getOrElse(varName, "") + "|" + value.toString()

as the value. This is obviously less general and makes certain assumptions about the input data. You also need to write several wrappers in SparkContext so that you can do something like sc.textFileWithEnv("hdfs path", "mapreduce_map_input_file"). (There's a rough sketch of both pieces below, after the quoted message.)

What I was really hoping for was something like

    sc.textFile("hdfs_path").pipe("""/usr/bin/awk "{print\"${mapreduce_map_input_file}\",$0}" """)

but that gives me a weird Kryo buffer overflow exception... I haven't had a chance to look into the details yet. (A guess at a fix for that is also sketched below.)

-Simon

On Fri, Aug 1, 2014 at 7:38 AM, Roberto Torella <roberto.tore...@gmail.com> wrote:
> Hi Simon,
>
> I'm trying to do the same but I'm quite lost.
>
> How did you do that? (Too direct? :)
>
> Thanks and ciao,
> r-
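For reference, here is roughly what the subclass looks like, specialized to plain text input. Treat it as a simplified, untested sketch rather than my exact code: the HadoopRDD constructor signature varies across Spark versions, and HadoopPartition/getPipeEnvVars are private[spark], which is why this has to live inside the org.apache.spark.rdd package.

    package org.apache.spark.rdd

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
    import org.apache.spark.{InterruptibleIterator, Partition, SparkContext, TaskContext}

    // Text-only variant of HadoopRDD that prepends the value of one pipe env
    // var (e.g. mapreduce_map_input_file) to every record, separated by "|".
    class HadoopRDDWithEnv(
        sc: SparkContext,
        conf: JobConf,
        minPartitions: Int,
        varName: String)
      extends HadoopRDD[LongWritable, Text](
        sc, conf, classOf[TextInputFormat], classOf[LongWritable],
        classOf[Text], minPartitions) {

      override def compute(split: Partition, context: TaskContext)
          : InterruptibleIterator[(LongWritable, Text)] = {
        // HadoopPartition exposes the per-split env vars
        // (map_input_file, mapreduce_map_input_file, ...).
        val envVal = split.asInstanceOf[HadoopPartition]
          .getPipeEnvVars().getOrElse(varName, "")
        val iter = super.compute(split, context).map { case (k, v) =>
          (k, new Text(envVal + "|" + v.toString))
        }
        new InterruptibleIterator(context, iter)
      }
    }

Instead of patching SparkContext itself, the wrapper can be an implicit class (textFileWithEnv is my own name, not a Spark API):

    // Enables: import WithEnv._
    //          val rdd = sc.textFileWithEnv("hdfs path", "mapreduce_map_input_file")
    object WithEnv {
      implicit class SparkContextWithEnv(sc: SparkContext) {
        def textFileWithEnv(path: String, varName: String): RDD[String] = {
          val conf = new JobConf(sc.hadoopConfiguration)
          FileInputFormat.setInputPaths(conf, path)
          new HadoopRDDWithEnv(sc, conf, sc.defaultMinPartitions, varName)
            .map(_._2.toString)
        }
      }
    }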
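As for the pipe() route: from a quick read of PipedRDD, it copies the split's getPipeEnvVars into the child process's environment, so mapreduce_map_input_file should already be visible to awk. My guess is that the single-string form of pipe() gets tokenized on whitespace and mangles the awk program; passing the command as a Seq and letting awk read ENVIRON sidesteps shell quoting entirely. Untested, and it may have nothing to do with the Kryo error:

    // Guess, not verified: pass argv explicitly so nothing is re-tokenized,
    // and read the env var inside awk instead of relying on shell expansion.
    val withFileNames = sc.textFile("hdfs_path")
      .pipe(Seq("/usr/bin/awk", "{ print ENVIRON[\"mapreduce_map_input_file\"], $0 }"))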