You should take a look at MapR. MapR-FS keeps file metadata distributed across the cluster rather than in a single NameNode's heap, so a few hundred billion small files is no problem at all. (Disclosure: I work for MapR.)
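
If you do stay on stock HDFS, the usual workaround for the small-files problem is to pack the invoices into a container format such as a SequenceFile, keyed by the original file name. A minimal sketch, assuming Hadoop 2.x (the InvoicePacker class name and the command-line paths are illustrative, not from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class InvoicePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path inputDir = new Path(args[0]);   // directory of small invoice files
        Path outputFile = new Path(args[1]); // one large SequenceFile

        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(outputFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class));
        try {
            for (FileStatus status : fs.listStatus(inputDir)) {
                if (!status.isFile()) {
                    continue; // skip subdirectories
                }
                // Each invoice is tiny (~4 KB), so reading it whole into
                // memory is safe here.
                byte[] contents = new byte[(int) status.getLen()];
                FSDataInputStream in = fs.open(status.getPath());
                try {
                    in.readFully(contents);
                } finally {
                    in.close();
                }
                // key = original file name, value = raw invoice bytes
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(contents));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}

Anything that reads SequenceFiles (MapReduce, Hive, Pig) can then process the packed invoices with far fewer map tasks and far less NameNode memory pressure than millions of 4 KB files would cause.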
On Sat, Jul 19, 2014 at 10:29 AM, Shashidhar Rao <[email protected]> wrote:

> Hi,
>
> Has anybody worked on a retail use case? My production Hadoop cluster
> block size is 256 MB, but the retail invoice data we have to process is
> tiny: each invoice is only about 4 KB. Do we merge the invoices into one
> large file, say 1 GB? What is the best practice in this scenario?
>
> Regards
> Shashi
