Hi Raghav,

Please refer to the following code:
SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("PersonApp");

// creating the Java Spark context
JavaSparkContext sc = new JavaSparkContext(sparkConf);

// reading the file from HDFS into a Spark RDD; the name node is localhost
JavaRDD<String> personStringRDD = sc.textFile("hdfs://localhost:9000/custom/inputPersonFile.txt");

// converting the String RDD to a Person RDD. This is just an example --
// you may want to replace the parsing with proper exception handling.
JavaRDD<Person> personObjectRDD = personStringRDD.map(personRow -> {
    String[] personValues = personRow.split("\t");
    return new Person(Long.parseLong(personValues[0]),
            personValues[1], personValues[2], personValues[3]);
});

// finally, print the count of objects
System.out.println("Person count = " + personObjectRDD.count());

Regards,
Mohit

On Tue, Nov 22, 2016 at 11:17 AM, Raghav <raghavas...@gmail.com> wrote:
> Sorry, I forgot to ask: how can I use the Spark context here? I have the
> HDFS directory path of the files, as well as the name node of the HDFS
> cluster.
>
> Thanks for your help.
>
> On Mon, Nov 21, 2016 at 9:45 PM, Raghav <raghavas...@gmail.com> wrote:
>
>> Hi,
>>
>> I am extremely new to Spark. I have to read a file from HDFS and get it
>> into memory in RDD format.
>>
>> I have a Java class as follows:
>>
>> class Person {
>>     private long UUID;
>>     private String FirstName;
>>     private String LastName;
>>     private String zip;
>>
>>     // public methods
>> }
>>
>> The file in HDFS is as follows:
>>
>> UUID   FirstName   LastName   Zip
>> 7462   John        Doll       06903
>> 5231   Brad        Finley     32820
>>
>> Can someone point me to how to get a JavaRDD<Person> object by reading
>> the file in HDFS?
>>
>> Thanks.
>>
>> --
>> Raghav
>
>
>
> --
> Raghav
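P.S. Since the map above will throw a NumberFormatException (or an
ArrayIndexOutOfBoundsException) on any malformed row and fail the whole job,
here is a minimal sketch of what "better exception handled" parsing could look
like as a standalone helper. The `Person` fields follow the description in the
thread; the `PersonParser`/`parsePerson` names are just illustrative, not part
of any existing API.

```java
import java.util.Optional;

public class PersonParser {

    // Minimal Person class matching the fields described in the thread.
    static class Person {
        final long uuid;
        final String firstName;
        final String lastName;
        final String zip;

        Person(long uuid, String firstName, String lastName, String zip) {
            this.uuid = uuid;
            this.firstName = firstName;
            this.lastName = lastName;
            this.zip = zip;
        }
    }

    // Returns Optional.empty() for malformed rows (wrong column count,
    // non-numeric UUID) instead of throwing inside the Spark job.
    static Optional<Person> parsePerson(String row) {
        String[] values = row.split("\t");
        if (values.length != 4) {
            return Optional.empty();
        }
        try {
            long uuid = Long.parseLong(values[0].trim());
            return Optional.of(new Person(uuid, values[1], values[2], values[3]));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A well-formed row parses; a malformed one is skipped, not fatal.
        System.out.println(parsePerson("7462\tJohn\tDoll\t06903").isPresent());
        System.out.println(parsePerson("bad\trow").isPresent());
    }
}
```

With a helper like this you could use flatMap instead of map on the RDD, so
that rows that fail to parse are silently dropped rather than killing the job;
alternatively, log or count the bad rows before discarding them.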