Hi Missie,

In the Java API, you should consider:
1. RDD.map <https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#map(scala.Function1,%20scala.reflect.ClassTag)> to transform the text
2. RDD.sortBy <https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#sortBy(scala.Function1,%20boolean,%20int,%20scala.math.Ordering,%20scala.reflect.ClassTag)> to order by the LongWritable key
3. RDD.saveAsTextFile <https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#saveAsTextFile(java.lang.String)> to write to HDFS

On Tue, Jul 7, 2015 at 7:18 AM, 付雅丹 <[email protected]> wrote:
> Hi, everyone!
>
> I've got <key,value> pairs in the form <LongWritable, Text>, which I loaded
> with the following code:
>
> SparkConf conf = new SparkConf().setAppName("MapReduceFileInput");
> JavaSparkContext sc = new JavaSparkContext(conf);
> Configuration confHadoop = new Configuration();
>
> JavaPairRDD<LongWritable,Text> sourceFile = sc.newAPIHadoopFile(
>     "hdfs://cMaster:9000/wcinput/data.txt",
>     DataInputFormat.class, LongWritable.class, Text.class, confHadoop);
>
> Now I want to transform the JavaPairRDD from <LongWritable, Text> to
> another <LongWritable, Text>, where the Text content is different. After
> that, I want to write the Text to HDFS, ordered by the LongWritable value.
> But I don't know how to write the map/reduce functions in Spark using
> Java. Can someone help me?
>
>
> Sincerely,
> Missie.
>
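A minimal sketch of those three steps in Java, under a few assumptions: I load the file with the standard TextInputFormat rather than your custom DataInputFormat (so the sketch is self-contained), `transform` is a hypothetical stand-in for whatever change you want to make to the Text, and the output path is made up. Note that with a JavaPairRDD it is usually simpler to call `mapToPair` and `sortByKey` than the Scala-style `map`/`sortBy`; also, Hadoop Writables are not java.io.Serializable, so it's safest to convert them to plain Long/String before the sort shuffles data.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class MapSortSave {

    // Hypothetical stand-in for your real text transformation.
    static String transform(String text) {
        return text.toUpperCase();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("MapReduceFileInput");
        JavaSparkContext sc = new JavaSparkContext(conf);
        Configuration confHadoop = new Configuration();

        // As in your snippet, but with TextInputFormat; substitute your
        // DataInputFormat here.
        JavaPairRDD<LongWritable, Text> sourceFile = sc.newAPIHadoopFile(
                "hdfs://cMaster:9000/wcinput/data.txt",
                TextInputFormat.class, LongWritable.class, Text.class, confHadoop);

        // 1. Transform: convert the Writables to Long/String (Writables are
        //    not Serializable, so they must not cross a shuffle) and rewrite
        //    the text.
        JavaPairRDD<Long, String> mapped = sourceFile.mapToPair(
                p -> new Tuple2<>(p._1().get(), transform(p._2().toString())));

        // 2. Sort ascending by the former LongWritable key.
        JavaPairRDD<Long, String> sorted = mapped.sortByKey(true);

        // 3. Write to HDFS; each record is written as Tuple2.toString(),
        //    i.e. "(key,value)". The output path is hypothetical.
        sorted.saveAsTextFile("hdfs://cMaster:9000/wcoutput");

        sc.stop();
    }
}
```

If you need each output line in a custom format instead of `(key,value)`, add a `.map(t -> t._1() + "\t" + t._2())` after the sort, before `saveAsTextFile`.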
