I am trying to divide my HDFS file into two parts/files, 80% and 20%, for a classification algorithm (80% for modelling and 20% for prediction). Please suggest how to do this.

To write the 80% and the 20% into two separate files, we need to know the exact number of records in the data set, and that is only known after going through the data set once. So we need one MapReduce job just for counting the records, and a second MapReduce job for separating the 80% and 20% into two files using MultipleOutputs.
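For illustration, here is a minimal sketch in plain Python (not the Hadoop API) of the two-pass plan described above, together with a one-pass hash-based alternative that needs no record count. The function names `two_pass_split`, `hash_bucket` and `hash_split` are hypothetical. In a real job, `two_pass_split` corresponds to the counting job followed by the splitting job, and `hash_bucket` could be called per record inside a mapper (for example through Hadoop Streaming) to choose the output file. Two caveats, stated as assumptions about a distributed setting: a mapper only sees its own input split, so the positional cut would need per-split counts to work across many mappers; and taking the first 80% in file order is only a fair sample if the data is already shuffled.

```python
import hashlib
from typing import Iterable, List, Tuple


def two_pass_split(records: List[str], train_fraction: float = 0.8) -> Tuple[List[str], List[str]]:
    """Model of the two-job plan: pass 1 counts, pass 2 cuts at the 80% mark."""
    # Pass 1 (the counting job): total number of records.
    total = sum(1 for _ in records)
    # round() avoids float artifacts such as 28.999999999999996 being floored.
    cutoff = round(total * train_fraction)
    # Pass 2 (the splitting job): the first `cutoff` records go to training.
    train: List[str] = []
    test: List[str] = []
    for index, record in enumerate(records):
        (train if index < cutoff else test).append(record)
    return train, test


def hash_bucket(record: str, train_fraction: float = 0.8) -> str:
    """One-pass alternative: route a record by a stable hash, no count needed."""
    digest = hashlib.md5(record.encode("utf-8")).digest()
    # Map the first 4 bytes of the digest to a value in [0, 1).
    value = int.from_bytes(digest[:4], "big") / 2**32
    return "train" if value < train_fraction else "test"


def hash_split(records: Iterable[str], train_fraction: float = 0.8) -> Tuple[List[str], List[str]]:
    """Apply hash_bucket to every record, as a mapper would do line by line."""
    train: List[str] = []
    test: List[str] = []
    for record in records:
        bucket = hash_bucket(record, train_fraction)
        (train if bucket == "train" else test).append(record)
    return train, test


if __name__ == "__main__":
    data = [f"record-{i}" for i in range(10)]
    train, test = two_pass_split(data)
    print(len(train), len(test))
```

The hash split gives approximately, not exactly, 80/20, but it is deterministic (the same record always lands in the same file) and needs only one pass over the data. Checking the proportions then amounts to counting the records written to each output; in Hadoop, job counters are one way to report those counts.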
Am I on the right track, or is there an alternative? I also have a small doubt: how can I check whether the reducer has received 80% of the data?

-- 
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Centre for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
