Hi Missie,

In the Java API, you should consider:
1. RDD.map <https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#map(scala.Function1,%20scala.reflect.ClassTag)> to transform the text
2. RDD.sortBy <https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#sortBy(scala.Function1,%20boolean,%20int,%20scala.math.Ordering,%20scala.reflect.ClassTag)> to order by the LongWritable key
3. RDD.saveAsTextFile <https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#saveAsTextFile(java.lang.String)> to write to HDFS

On Tue, Jul 7, 2015 at 7:18 AM, 付雅丹 <[email protected]> wrote:
> Hi, everyone!
>
> I've got <key,value> pairs in the form <LongWritable, Text>, which I loaded
> with the following code:
>
> SparkConf conf = new SparkConf().setAppName("MapReduceFileInput");
> JavaSparkContext sc = new JavaSparkContext(conf);
> Configuration confHadoop = new Configuration();
>
> JavaPairRDD<LongWritable,Text> sourceFile = sc.newAPIHadoopFile(
>     "hdfs://cMaster:9000/wcinput/data.txt",
>     DataInputFormat.class, LongWritable.class, Text.class, confHadoop);
>
> Now I want to transform the JavaPairRDD from <LongWritable, Text> to
> another <LongWritable, Text>, where the Text content is different. After
> that, I want to write the Text to HDFS, ordered by the LongWritable value.
> But I don't know how to write the map/reduce functions in Spark using
> Java. Can someone help me?
>
>
> Sincerely,
> Missie.
>
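A minimal sketch of those three steps in Java, under a few assumptions: I load the file with the standard TextInputFormat rather than your custom DataInputFormat (so the sketch is self-contained), `transform` is a hypothetical stand-in for whatever change you want to make to the Text, and the output path is made up. Note that with a JavaPairRDD it is usually simpler to call `mapToPair` and `sortByKey` than the Scala-style `map`/`sortBy`; also, Hadoop Writables are not java.io.Serializable, so it's safest to convert them to plain Long/String before the sort shuffles data.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class MapSortSave {

    // Hypothetical stand-in for your real text transformation.
    static String transform(String text) {
        return text.toUpperCase();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MapReduceFileInput");
        JavaSparkContext sc = new JavaSparkContext(conf);
        Configuration confHadoop = new Configuration();

        // As in your snippet, but with TextInputFormat; substitute your
        // DataInputFormat here.
        JavaPairRDD<LongWritable, Text> sourceFile = sc.newAPIHadoopFile(
                "hdfs://cMaster:9000/wcinput/data.txt",
                TextInputFormat.class, LongWritable.class, Text.class, confHadoop);

        // 1. Transform: convert the Writables to Long/String (Writables are
        //    not Serializable, so they must not cross a shuffle) and rewrite
        //    the text.
        JavaPairRDD<Long, String> mapped = sourceFile.mapToPair(
                p -> new Tuple2<>(p._1().get(), transform(p._2().toString())));

        // 2. Sort ascending by the former LongWritable key.
        JavaPairRDD<Long, String> sorted = mapped.sortByKey(true);

        // 3. Write to HDFS; each record is written as Tuple2.toString(),
        //    i.e. "(key,value)". The output path is hypothetical.
        sorted.saveAsTextFile("hdfs://cMaster:9000/wcoutput");

        sc.stop();
    }
}
```

If you need each output line in a custom format instead of `(key,value)`, add a `.map(t -> t._1() + "\t" + t._2())` after the sort, before `saveAsTextFile`.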
