Maybe you should just try HBase.
On Sat, Jul 19, 2014 at 10:57 AM, Shahab Yunus <[email protected]> wrote:

> It is not advisable to have many small files in HDFS. To highlight one
> major issue, it puts memory load on the NameNode, which maintains the
> metadata.
>
> Off the top of my head, some basic ideas: you can either combine invoices
> into a bigger text file containing a collection of records, where each
> record is an invoice, or follow the SequenceFile format, where the key
> could be the invoice id and the value/record the invoice details.
>
> Regards,
> Shahab
>
> On Jul 19, 2014 1:30 PM, "Shashidhar Rao" <[email protected]> wrote:
>
>> Hi,
>>
>> Has anybody worked on a retail use case? My production Hadoop cluster's
>> block size is 256 MB, but each retail invoice we have to process is
>> merely about 4 KB. Do we merge the invoice data to make one large file,
>> say 1 GB? What is the best practice in this scenario?
>>
>> Regards,
>> Shashi

--
best wishes.
Steven
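
A minimal sketch of the SequenceFile approach Shahab describes, assuming Text
keys (the invoice id) and Text values (the invoice payload); the output path,
class name, and sample records below are hypothetical placeholders, not
anything from the thread:

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class InvoicePacker {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Hypothetical output path for the packed file.
        Path out = new Path("/data/invoices/invoices-2014-07.seq");

        // Hypothetical sample data: invoice id -> invoice details
        // (in practice, e.g. the raw 4 KB invoice payload).
        Map<String, String> invoices = new TreeMap<String, String>();
        invoices.put("INV-0001", "...invoice body...");
        invoices.put("INV-0002", "...invoice body...");

        SequenceFile.Writer writer = null;
        try {
            writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(out),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(Text.class));
            Text key = new Text();
            Text value = new Text();
            for (Map.Entry<String, String> e : invoices.entrySet()) {
                key.set(e.getKey());       // invoice id as the key
                value.set(e.getValue());   // invoice details as the value
                writer.append(key, value); // one record per invoice
            }
        } finally {
            IOUtils.closeStream(writer);   // always close the writer
        }
    }
}

Each append adds one invoice record to a single large file, so millions of
4 KB invoices become one HDFS file (and one set of NameNode entries) instead
of millions of tiny files. Readers can iterate with SequenceFile.Reader, or
feed the file to a MapReduce job via SequenceFileInputFormat.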
