Don't have time to read the thread, but in case it has not been mentioned....

Unleash filecrush!
https://github.com/edwardcapriolo/filecrush
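
For anyone skimming, the core idea discussed downthread (packing many tiny invoice files into one large file with an index, which is roughly what a Hadoop SequenceFile of (filename, bytes) records gives you) can be sketched like this. This is a minimal, hypothetical Python illustration, not actual Hadoop or filecrush code; real cluster code would use SequenceFile.Writer in Java:

```python
# Hypothetical sketch: pack many small invoice payloads into one large
# blob, keeping a filename -> (offset, length) index so each record can
# still be read back individually. File names and sizes are made up.
import io

def pack(small_files):
    """Merge a dict of {filename: bytes} into one blob plus an index."""
    buf = io.BytesIO()
    index = {}
    for name, data in small_files.items():
        index[name] = (buf.tell(), len(data))  # record where this file starts
        buf.write(data)
    return buf.getvalue(), index

def read_one(blob, index, name):
    """Recover a single small file's bytes from the packed blob."""
    off, length = index[name]
    return blob[off:off + length]

# Example: three ~20-byte "invoices" become one contiguous blob.
invoices = {("invoice_%d.xml" % i): (b"<invoice id='%d'/>" % i) for i in range(3)}
blob, idx = pack(invoices)
assert read_one(blob, idx, "invoice_1.xml") == b"<invoice id='1'/>"
```

The point is that the NameNode (or MapR container) then tracks one large object instead of millions of tiny ones, and mappers stream sequentially instead of seeking per file.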


On Sun, Jul 20, 2014 at 4:47 AM, Kilaru, Sambaiah <
[email protected]> wrote:

>  This is not the place to discuss the merits or demerits of MapR, but small
> files behave very badly with MapR: small files go into one container (to
> fill up 256 MB, or whatever the container size is), and with locality most
> of the mappers end up going to just three datanodes.
>
>  You should be looking into the SequenceFile format.
>
>  Thanks,
> Sam
>
>   From: "M. C. Srivas" <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Sunday, July 20, 2014 at 8:01 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: Merging small files
>
>   You should look at MapR .... a few hundred billion small files is
> absolutely no problem. (Disclosure: I work for MapR)
>
>
> On Sat, Jul 19, 2014 at 10:29 AM, Shashidhar Rao <
> [email protected]> wrote:
>
>>   Hi ,
>>
>>  Has anybody worked on a retail use case? My production Hadoop cluster's
>> block size is 256 MB, but retail invoice data is tiny: each invoice is,
>> let's say, merely 4 KB. Do we merge the invoice data to make one large
>> file, say 1 GB? What is the best practice in this scenario?
>>
>>
>>  Regards
>>  Shashi
>>
>
>
