Did you happen to try this?

        JavaPairRDD<LongWritable, Text> hadoopFile = sc.newAPIHadoopFile(
            "/sigmoid", DataInputFormat.class, LongWritable.class,
            Text.class, new Configuration());
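
The main difference from both of your options is that newAPIHadoopFile wants
the Class objects plus a Hadoop Configuration as the last argument, not
instances and not bare class names. A rough sketch with your HDFS path,
assuming your DataInputFormat extends
org.apache.hadoop.mapreduce.InputFormat<LongWritable, Text> and sc is a
JavaSparkContext:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.spark.api.java.JavaPairRDD;

        // Any extra settings your InputFormat reads can go on this Configuration
        Configuration confHadoop = new Configuration();

        // Pass Class literals, not instances (new DataInputFormat()) and not
        // bare names (DataInputFormat)
        JavaPairRDD<LongWritable, Text> distFile = sc.newAPIHadoopFile(
            "hdfs://cMaster:9000/wcinput/data.txt",
            DataInputFormat.class, LongWritable.class, Text.class,
            confHadoop);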



Thanks
Best Regards

On Tue, Jun 23, 2015 at 6:58 AM, 付雅丹 <yadanfu1...@gmail.com> wrote:

> Hello, everyone! I'm new to Spark. I have already written programs for
> Hadoop 2.5.2, where I defined my own InputFormat and OutputFormat. Now I
> want to port that code to Spark using Java. The first problem I
> encountered is how to turn a big text file in local storage into an RDD
> that is compatible with the program I wrote for Hadoop. I found functions
> in SparkContext that may be helpful, but I don't know how to use them.
> E.g.:
>
> public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>>
>     RDD<scala.Tuple2<K,V>> newAPIHadoopFile(String path,
>                                             Class<F> fClass,
>                                             Class<K> kClass,
>                                             Class<V> vClass,
>                                             org.apache.hadoop.conf.Configuration conf)
> (http://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/rdd/RDD.html)
>
> Get an RDD for a given Hadoop file with an arbitrary new API InputFormat
> and extra configuration options to pass to the input format.
>
> Note: Because Hadoop's RecordReader class re-uses the same Writable
> object for each record, directly caching the returned RDD or directly
> passing it to an aggregation or shuffle operation will create many
> references to the same object. If you plan to directly cache, sort, or
> aggregate Hadoop writable objects, you should first copy them using a map
>  function.
> In Java, the following attempts don't work.
>
> /////option one
> Configuration confHadoop = new Configuration();
> JavaPairRDD<LongWritable,Text> distFile=sc.newAPIHadoopFile(
> "hdfs://cMaster:9000/wcinput/data.txt",
> DataInputFormat,LongWritable,Text,confHadoop);
>
> /////option two
> Configuration confHadoop = new Configuration();
> DataInputFormat input=new DataInputFormat();
> LongWritable longType=new LongWritable();
> Text text=new Text();
> JavaPairRDD<LongWritable,Text> distFile=sc.newAPIHadoopFile(
> "hdfs://cMaster:9000/wcinput/data.txt",
> input,longType,text,confHadoop);
>
> Can anyone help me? Thank you so much.
>
>
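
One more thing on the Note you quoted about Hadoop's RecordReader re-using
the same Writable object: if you plan to cache or shuffle the resulting RDD,
copy each record into plain Java values first. A minimal sketch (assumes
Java 8 lambdas and the distFile from the snippet above):

        import scala.Tuple2;

        // The RecordReader re-uses the same LongWritable/Text instances,
        // so copy each record into fresh values before caching
        JavaPairRDD<Long, String> copied = distFile
            .mapToPair(t -> new Tuple2<>(t._1().get(), t._2().toString()))
            .cache();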
