Maybe har (Hadoop Archives) is an option: http://hadoop.apache.org/docs/r1.1.2/hadoop_archives.html
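If you would rather follow the middle ground suggested in the quoted reply below (files that each hold many documents), a minimal Python sketch of the bundling step might look like this. The function names, the tab-separated one-document-per-line layout, and the output file naming are all assumptions made for illustration; none of them come from the thread or from Hadoop itself.

```python
from typing import Iterable, Iterator, List, Tuple


def bundle_documents(
    docs: Iterable[Tuple[str, str]], target_bytes: int
) -> Iterator[List[str]]:
    """Group (doc_id, text) pairs into bundles of roughly target_bytes each.

    Each document becomes one line, "doc_id<TAB>text", with runs of
    whitespace (including newlines) collapsed to single spaces, so a
    line-oriented word-count job can read the bundle directly.
    """
    bundle: List[str] = []
    size = 0
    for doc_id, text in docs:
        line = doc_id + "\t" + " ".join(text.split())
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the newline
        # Start a new bundle once adding this line would exceed the target.
        if bundle and size + line_bytes > target_bytes:
            yield bundle
            bundle, size = [], 0
        bundle.append(line)
        size += line_bytes
    if bundle:
        yield bundle


def write_bundles(docs, target_bytes, prefix="bundle"):
    """Write each bundle to a local file and return the file names written."""
    names = []
    for i, bundle in enumerate(bundle_documents(docs, target_bytes)):
        name = f"{prefix}-{i:05d}.txt"  # hypothetical naming scheme
        with open(name, "w", encoding="utf-8") as out:
            out.write("\n".join(bundle) + "\n")
        names.append(name)
    return names
```

With a target of a few gigabytes (for example `target_bytes=2 * 1024**3`), the resulting files could then be copied into HDFS with `hadoop fs -put`. A single document larger than the target still gets a bundle of its own, so the size is approximate.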
Ling kun

On Friday, March 29, 2013, Ted Dunning wrote:

> Putting each document into a separate file is not likely to be a great
> thing to do.
>
> On the other hand, putting them all into one file may not be what you want
> either.
>
> It is probably best to find a middle ground and create files each with
> many documents and each a few gigabytes in size.
>
> On Fri, Mar 29, 2013 at 1:15 PM, <[email protected]> wrote:
>
>> If there are 1 million docs in an enterprise and we need to perform a word
>> count computation on all the docs, what is the first step to be done? Is it
>> to extract all the text of all the docs into a single file and then put
>> it into hdfs, or to put each one separately in hdfs?
>> Thanks
>>
>> Sent from BlackBerry® on Airtel

--
http://www.lingcc.com
