This is not the place to discuss the merits or demerits of MapR, but small files behave 
very badly with MapR.
Small files get packed into one container (to fill up 256 MB, or whatever the container 
size is), and because of locality most
of the mappers end up on just three datanodes.

You should be looking into the SequenceFile format.
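
Something like the sketch below is the idea (untested; it assumes the Hadoop 2.x 
SequenceFile writer API, and the class name SmallFileMerger plus the choice of 
key = file name, value = raw bytes are just for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Packs a directory of small files into one block-compressed SequenceFile.
    // Key = original file name, value = raw file contents.
    public class SmallFileMerger {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path inputDir = new Path(args[0]);    // directory of small invoice files
            Path outputFile = new Path(args[1]);  // single SequenceFile to write

            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(outputFile),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class),
                    SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
                for (FileStatus status : fs.listStatus(inputDir)) {
                    if (status.isDirectory()) {
                        continue;               // skip subdirectories
                    }
                    byte[] contents = new byte[(int) status.getLen()];
                    try (FSDataInputStream in = fs.open(status.getPath())) {
                        in.readFully(contents); // files are tiny, so one read is fine
                    }
                    writer.append(new Text(status.getPath().getName()),
                                  new BytesWritable(contents));
                }
            }
        }
    }

Downstream jobs then read the merged file with SequenceFileInputFormat instead of 
opening millions of tiny files, which is where the mapper-count savings come from.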

Thanks,
Sam

From: "M. C. Srivas" <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Sunday, July 20, 2014 at 8:01 AM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: Merging small files

You should look at MapR .... a few hundred billion small files are 
absolutely no problem. (disclosure: I work for MapR)


On Sat, Jul 19, 2014 at 10:29 AM, Shashidhar Rao 
<[email protected]> wrote:
Hi,

Has anybody worked on a retail use case? My production Hadoop cluster block 
size is 256 MB, but if we have to process retail invoice data, each 
invoice is merely, let's say, 4 KB. Do we merge the invoice data to make 
one large file, say 1 GB? What is the best practice in this scenario?


Regards
Shashi
