If there are 1 million documents in an enterprise and we need to run a word count computation on all of them, what is the first step? Should we extract the text of all the documents into a single file and then put that file into HDFS, or put each document into HDFS separately? Thanks
