Maybe you should just try HBase.
On Sat, Jul 19, 2014 at 10:57 AM, Shahab Yunus <[email protected]> wrote:

> It is not advisable to have many small files in HDFS. To highlight one
> major issue, it puts memory load on the NameNode, which maintains the
> metadata.
>
> Off the top of my head, some basic ideas: you can either combine invoices
> into a bigger text file containing a collection of records, where each
> record is an invoice, or follow the SequenceFile format, where the key
> could be the invoice id and the value/record the invoice details.
>
> Regards,
> Shahab
>
> On Jul 19, 2014 1:30 PM, "Shashidhar Rao" <[email protected]> wrote:
>
>> Hi,
>>
>> Has anybody worked on a retail use case? My production Hadoop cluster's
>> block size is 256 MB, but each retail invoice we have to process is
>> merely about 4 KB. Do we merge the invoice data to make one large file,
>> say 1 GB? What is the best practice in this scenario?
>>
>> Regards,
>> Shashi

--
best wishes.
Steven
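
A minimal sketch of the SequenceFile approach Shahab describes, assuming Text
keys (the invoice id) and Text values (the invoice payload); the output path,
class name, and sample records below are hypothetical placeholders, not
anything from the thread:

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class InvoicePacker {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Hypothetical output path for the packed file.
        Path out = new Path("/data/invoices/invoices-2014-07.seq");

        // Hypothetical sample data: invoice id -> invoice details
        // (in practice, e.g. the raw 4 KB invoice payload).
        Map<String, String> invoices = new TreeMap<String, String>();
        invoices.put("INV-0001", "...invoice body...");
        invoices.put("INV-0002", "...invoice body...");

        SequenceFile.Writer writer = null;
        try {
            writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(out),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(Text.class));
            Text key = new Text();
            Text value = new Text();
            for (Map.Entry<String, String> e : invoices.entrySet()) {
                key.set(e.getKey());       // invoice id as the key
                value.set(e.getValue());   // invoice details as the value
                writer.append(key, value); // one record per invoice
            }
        } finally {
            IOUtils.closeStream(writer);   // always close the writer
        }
    }
}

Each append adds one invoice record to a single large file, so millions of
4 KB invoices become one HDFS file (and one set of NameNode entries) instead
of millions of tiny files. Readers can iterate with SequenceFile.Reader, or
feed the file to a MapReduce job via SequenceFileInputFormat.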
