It is not advisable to have many small files in HDFS: to highlight one major issue, the NameNode keeps metadata for every file in memory, so millions of small files put significant memory load on it.
Off the top of my head, some basic ideas: you can either combine invoices into a bigger text file containing a collection of records, where each record is one invoice, or use a SequenceFile, where the key is the invoice id and the value is the invoice details.

Regards,
Shahab

On Jul 19, 2014 1:30 PM, "Shashidhar Rao" <[email protected]> wrote:
> Hi,
>
> Has anybody worked on a retail use case? My production Hadoop cluster
> block size is 256 MB, but each retail invoice record is merely, say,
> 4 KB. Do we merge the invoice data to make one large file, say 1 GB?
> What is the best practice in this scenario?
>
> Regards
> Shashi
