Sent: [...] 2016 7:24 PM
To: Piyush Mukati <piyush.muk...@gmail.com>; user@hadoop.apache.org
Subject: Re: merging small files in HDFS
Hi,
if I understand your request correctly, you only need to merge some data [...]

[...] particular directory into a single file.
>>
>> hadoop fs -getmerge
>>
>> --Senthil
>> -Original Message-
>> From: Giovanni Mascari [mailto:giovanni.masc...@polito.it]
>> Sent: Thursday, November 03, 2016 7:24 PM
>> To: Piyush Mukati <piyush.muk...@gmail.com [...]
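For reference, the Hadoop 2.x FileSystem shell documents this command as `hadoop fs -getmerge [-nl] <src> <localdst>`: it concatenates the files under a source directory into a single file on the local filesystem (`-nl` appends a newline after each file), so the result has to be copied back with `hadoop fs -put` if it should live in HDFS. The Python below is only a local-filesystem sketch of that concatenation idea, not an HDFS client; the function name `getmerge_local` and the sorted-by-name ordering are assumptions of this sketch, not documented Hadoop behaviour.

```python
from pathlib import Path


def getmerge_local(src_dir: str, dst_file: str, add_newline: bool = False) -> int:
    """Concatenate every regular file directly under src_dir into dst_file.

    Local-filesystem sketch only (no HDFS access). Files are merged in sorted
    name order; dst_file should live outside src_dir so it is not merged into
    itself. Returns the number of files merged.
    """
    # List the inputs before opening the output, so a newly created dst_file is never picked up.
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    with open(dst_file, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks so large files are not read into memory at once.
                while chunk := src.read(1 << 20):
                    out.write(chunk)
            if add_newline:
                # Loosely mirrors the -nl option: one LF after each file.
                out.write(b"\n")
    return len(files)


# Usage (hypothetical local paths):
# getmerge_local("part-files", "merged.txt", add_newline=True)
```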
Hi,
You need to write a map method that just parses the input file and passes it to the
reducer. Use only one reducer, so that all map output goes to that one reducer
and a single file gets created, which is the merge of the input files.
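A minimal local simulation of the single-reducer idea, assuming identity map and reduce functions; the names `identity_map`, `streaming_key` and `single_reducer_merge` are hypothetical. One consequence worth knowing: map output passes through the shuffle, which sorts it by key before the one reducer sees it, so the merged file comes out ordered by key rather than in input-file order. The key rule here assumes Hadoop Streaming's default (the text before the first tab, or the whole line if there is none); in a Java job the order would depend on whatever key the mapper emits. Also note that a single reduce task means one task writes all of the data.

```python
from typing import Iterable, Iterator, List


def identity_map(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: pass every input record through unchanged."""
    yield from lines


def streaming_key(record: str) -> str:
    """Default Hadoop Streaming key: text before the first tab (whole line if none)."""
    return record.split("\t", 1)[0]


def single_reducer_merge(inputs: List[List[str]]) -> List[str]:
    """Simulate a job with exactly one reduce task over several input files.

    All map output goes to the one reducer; the shuffle sorts it by key first,
    and an identity reducer then writes it out as a single merged file.
    """
    map_output = [rec for lines in inputs for rec in identity_map(lines)]
    # Shuffle/sort: with one reducer every record lands in the same partition.
    map_output.sort(key=streaming_key)
    # Identity reducer: emit the sorted records as-is into the single output.
    return map_output


# Usage: two "input files" become one output, ordered by key.
# single_reducer_merge([["b\t1", "a\t2"], ["c\t3"]])  returns ['a\t2', 'b\t1', 'c\t3']
```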
On 03-Nov-2016 8:54 pm, "Piyush Mukati" wrote:
> Hi,
> I [...]
Subject: Re: merging small files in HDFS
Hi,
if I understand your request correctly, you only need to merge some data
resulting from an HDFS write operation.
In this case, I suppose that your best option is to use Hadoop Streaming
with the 'cat' command.
Take a look here:
https://hadoop.apache.org/docs/r1.2.1/streaming.html
Regards
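A sketch of what such a streaming job could look like, building the `hadoop jar` command line from Python so it can be inspected or run with `subprocess`. `-input`, `-output`, `-mapper`, `-reducer` and `-numReduceTasks` are documented Hadoop Streaming options; the streaming jar's location differs between Hadoop versions and installations, so `streaming_jar` and the HDFS paths below are placeholders, and the helper names are hypothetical. With `cat` as both mapper and reducer and a single reduce task, the job writes one output part file, but the records go through the shuffle, so they come out sorted rather than in their original order.

```python
import shlex
import subprocess
from typing import List


def streaming_cat_merge_cmd(streaming_jar: str, input_dir: str, output_dir: str) -> List[str]:
    """Build a Hadoop Streaming command that merges input_dir into one part file.

    `cat` is used as both mapper and reducer; forcing a single reduce task
    sends all records to one reducer, which writes a single output file.
    """
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_dir,
        "-output", output_dir,  # the job fails if this directory already exists
        "-mapper", "cat",
        "-reducer", "cat",
        "-numReduceTasks", "1",
    ]


def run_merge(streaming_jar: str, input_dir: str, output_dir: str) -> None:
    """Run the job; needs the `hadoop` CLI on PATH and a reachable cluster."""
    subprocess.run(streaming_cat_merge_cmd(streaming_jar, input_dir, output_dir), check=True)


if __name__ == "__main__":
    # Print the command with placeholder jar and paths instead of running it.
    print(shlex.join(streaming_cat_merge_cmd("hadoop-streaming.jar", "/data/small", "/data/merged")))
```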
Hi,
I want to merge multiple files in one HDFS dir into one file. I am planning
to write a map-only job using an input format that will create only one
inputSplit per dir.
This way my job doesn't need to do any shuffle/sort (it only reads and writes back
to disk).
Is there any such file format already [...]
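The planning step of that map-only design can be sketched as grouping input paths by parent directory, so each directory becomes one split and one mapper writes one merged file; with zero reducers there is no shuffle/sort, and each file's bytes stay in order. This is a plain-Python illustration under those assumptions, with hypothetical names (`one_split_per_dir`, `map_only_merge`) and an in-memory `contents` mapping standing in for file reads; it is not the API of any Hadoop input format.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Mapping


def one_split_per_dir(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by parent directory: one 'split' per directory.

    Files inside each split are sorted by path (a choice of this sketch).
    """
    splits: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        splits[str(PurePosixPath(p).parent)].append(p)
    return {d: sorted(files) for d, files in sorted(splits.items())}


def map_only_merge(splits: Mapping[str, List[str]], contents: Mapping[str, bytes]) -> Dict[str, bytes]:
    """One 'mapper' per split concatenates its files; no shuffle or sort happens.

    `contents` stands in for reading each file from storage.
    """
    return {d: b"".join(contents[f] for f in files) for d, files in splits.items()}
```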