Try Cascading multitool: http://docs.cascading.org/multitool/2.6/
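If an approximate 80/20 ratio is good enough, the counting job can be skipped because each record can be routed on its own. Below is a minimal single-pass sketch in plain Java, assuming approximate proportions are acceptable. RandomSplitSketch and isTraining are hypothetical names and are not part of Hadoop or Cascading. Hashing the record contents keeps the split deterministic, so re-running the job gives the same assignment. In a real MapReduce job the same check would go in the mapper, for example a map-only job writing through MultipleOutputs. That would also avoid the question of how full a reducer gets. For an exact 80/20 split you still need the record count first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: approximate 80/20 train/test split in a single pass.
// Assumption: an approximate ratio is acceptable, so no counting job is run.
public class RandomSplitSketch {

    // Returns true if the record belongs in the training output.
    public static boolean isTraining(String record, int trainPercent) {
        // Mix String.hashCode() bits (MurmurHash3 fmix32 finalizer) so that
        // similar records still land in well-spread buckets.
        int h = record.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        int bucket = Math.floorMod(h, 100); // bucket in 0..99
        return bucket < trainPercent;
    }

    public static void main(String[] args) {
        List<String> train = new ArrayList<>();
        List<String> test = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            String record = "record-" + i;
            if (isTraining(record, 80)) {
                train.add(record);
            } else {
                test.add(record);
            }
        }
        // Sizes come out close to, but not exactly, 80000 / 20000.
        System.out.println("train=" + train.size() + " test=" + test.size());
    }
}
```

The trade-off is precision for one fewer pass over the data: the ratio is only approximately 80/20, which is usually fine for a modelling/prediction split.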
- André

On Fri, Dec 12, 2014 at 10:30 AM, unmesha sreeveni <[email protected]> wrote:
> I am trying to divide my HDFS file into 2 parts/files, 80% and 20%, for a
> classification algorithm (80% for modelling and 20% for prediction).
> Please provide suggestions for the same.
> To put 80% and 20% into 2 separate files we need to know the exact number
> of records in the data set,
> and that is only known if we go through the data set once.
> So we need to write 1 MapReduce job just for counting the number of
> records, and
> a 2nd MapReduce job for separating 80% and 20% into 2 files using Multiple
> Inputs.
>
> Am I on the right track, or is there any alternative for the same?
> But again, a small confusion: how do I check whether the reducer gets
> filled with 80% of the data?
>
> --
> *Thanks & Regards*
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/

--
André Kelpe
[email protected]
http://concurrentinc.com
