Hi Raghav,

Please refer to the following code:

SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("PersonApp");

// creating the Java Spark context

JavaSparkContext sc = new JavaSparkContext(sparkConf);

// reading the file from HDFS into a Spark RDD; the name node is localhost
JavaRDD<String> personStringRDD =
        sc.textFile("hdfs://localhost:9000/custom/inputPersonFile.txt");


// converting the String RDD to a Person RDD. This is just an example;
// you can replace the parsing with better exception-handled code, as
// sketched below.
JavaRDD<Person> personObjectRDD = personStringRDD.map(personRow -> {
    String[] personValues = personRow.split("\t");
    return new Person(Long.parseLong(personValues[0]), personValues[1],
            personValues[2], personValues[3]);
});

// finally, print the count of objects
System.out.println("Person count = " + personObjectRDD.count());
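Two quick caveats. First, Person should implement java.io.Serializable;
Spark needs that whenever the objects are shuffled, cached in serialized
form, or collected back to the driver. Second, if the header row (UUID,
FirstName, LastName, Zip) is actually present in the file, Long.parseLong
will throw on it. Below is a minimal sketch of the safer parsing I
mentioned; the field names are illustrative, and the null-then-filter
approach is just one way to skip malformed rows:

// Person must be serializable so Spark can ship it between JVMs
class Person implements java.io.Serializable {
    private long uuid;
    private String firstName;
    private String lastName;
    private String zip;

    public Person(long uuid, String firstName, String lastName, String zip) {
        this.uuid = uuid;
        this.firstName = firstName;
        this.lastName = lastName;
        this.zip = zip;
    }
}

// parse defensively: bad rows (including a header row, if present) become
// null and are filtered out instead of failing the whole job
JavaRDD<Person> safePersonRDD = personStringRDD
    .map(personRow -> {
        String[] v = personRow.split("\t");
        try {
            return new Person(Long.parseLong(v[0]), v[1], v[2], v[3]);
        } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
            return null; // malformed or header row
        }
    })
    .filter(p -> p != null);

System.out.println("Valid person count = " + safePersonRDD.count());

Silently dropping bad rows may not be what you want in production; counting
or logging them before filtering is a reasonable alternative.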


Regards
Mohit


On Tue, Nov 22, 2016 at 11:17 AM, Raghav <raghavas...@gmail.com> wrote:

> Sorry, I forgot to ask: how can I use the Spark context here? I have the
> HDFS directory path of the files, as well as the name node of the HDFS
> cluster.
>
> Thanks for your help.
>
> On Mon, Nov 21, 2016 at 9:45 PM, Raghav <raghavas...@gmail.com> wrote:
>
>> Hi
>>
>> I am extremely new to Spark. I have to read a file from HDFS and get it
>> into memory in RDD format.
>>
>> I have a Java class as follows:
>>
>> class Person {
>>     private long uuid;
>>     private String firstName;
>>     private String lastName;
>>     private String zip;
>>
>>    // public methods
>> }
>>
>> The file in HDFS is as follows:
>>
>> UUID    FirstName    LastName    Zip
>> 7462    John         Doll        06903
>> 5231    Brad         Finley      32820
>>
>>
>> Can someone show me how to get a JavaRDD<Person> object by reading the
>> file from HDFS?
>>
>> Thanks.
>>
>> --
>> Raghav
>>
>
>
>
> --
> Raghav
>
