Never mind! I figured it out by saving it with saveAsHadoopFile and passing
the codec to it. Thank you!
On Tuesday, May 10, 2016, Ajay Chander wrote:
Hi, I have a folder temp1 in HDFS which has files of multiple formats,
test1.txt and test2.avsc (Avro file), in it. Now I want to compress these
files together and store the result under a temp2 folder in HDFS, expecting
that the temp2 folder will have one file test_compress.gz which has
test1.txt and test2.avsc under it.
Hi Deepak,
Thanks for your response. If I understand correctly, you suggest reading all
of those files into an RDD on the cluster using wholeTextFiles, applying a
compression codec to it, and saving the RDD to the other Hadoop cluster?
Thank you,
Ajay
On Tuesday, May 10, 2016, Deepak Sharma wrote:
I will try that out. Thank you!
On Tuesday, May 10, 2016, Deepak Sharma wrote:
Yes that's what I intended to say.
Thanks
Deepak
On 10 May 2016 11:47 pm, "Ajay Chander" wrote:
Hi Ajay
You can look at the wholeTextFiles method, which gives an
RDD[(String, String)], and then save that RDD with saveAsTextFile,
passing a compression codec.
This will serve the purpose.
I don't think anything like distcp exists by default in Spark.
Thanks
Deepak
On 10 May 2016 11:27 pm, "Ajay Chander" wrote:
Hi Everyone,
We are planning to migrate data between 2 clusters, and I see distcp
doesn't support data compression. Is there an efficient way to compress
the data during the migration? Can I implement a Spark job to do this?
Thanks.