Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Never mind! I figured it out by saving it with saveAsHadoopFile and passing the
compression codec to it. Thank you!
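The fix above refers to saving with a Hadoop compression codec class (e.g. org.apache.hadoop.io.compress.GzipCodec). Running that needs a Spark cluster; as a minimal local illustration of what such a codec does to the written text, here is a gzip round-trip using only Python's stdlib (the part-file name is made up for the example):

```python
import gzip

# Write some text the way a gzip codec compresses each output part file.
data = "line1\nline2\nline3\n"
with gzip.open("part-00000.gz", "wt", encoding="utf-8") as f:
    f.write(data)

# Reading back decompresses transparently, just as Hadoop/Spark readers
# do for codec-compressed files.
with gzip.open("part-00000.gz", "rt", encoding="utf-8") as f:
    restored = f.read()

print(restored == data)  # True: the round-trip preserves the text
```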


Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Hi, I have a folder temp1 in HDFS which has files of multiple formats,
test1.txt and test2.avsc (Avro file), in it. Now I want to compress these
files together and store them under a temp2 folder in HDFS, expecting that
temp2 will contain one file, test_compress.gz, which holds test1.txt and
test2.avsc. Is there any possible/efficient way to achieve this?

Thanks,
Aj
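One wrinkle in the ask above: gzip on its own compresses a single stream, so bundling test1.txt and test2.avsc into one compressed file usually means a tar.gz archive. Setting HDFS aside, a local sketch with Python's stdlib (file names taken from the question, contents made up) might look like:

```python
import tarfile
from pathlib import Path

# Stand-ins for the files under temp1 (contents are made up).
Path("temp1").mkdir(exist_ok=True)
Path("temp1/test1.txt").write_text("some text data\n")
Path("temp1/test2.avsc").write_text('{"type": "record", "name": "t", "fields": []}\n')

# Bundle both files into a single gzip-compressed archive under temp2.
Path("temp2").mkdir(exist_ok=True)
with tarfile.open("temp2/test_compress.tar.gz", "w:gz") as tar:
    for path in sorted(Path("temp1").iterdir()):
        tar.add(path, arcname=path.name)

# The one archive now holds both members.
with tarfile.open("temp2/test_compress.tar.gz", "r:gz") as tar:
    members = tar.getnames()
print(members)  # ['test1.txt', 'test2.avsc']
```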



Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Hi Deepak,
   Thanks for your response. If I understand correctly, you suggest reading
all of those files into an RDD on the cluster using wholeTextFiles, then
applying a compression codec to it and saving the RDD to another Hadoop
cluster?

Thank you,
Ajay



Re: Cluster Migration

2016-05-10 Thread Ajay Chander
I will try that out. Thank you!



Re: Cluster Migration

2016-05-10 Thread Deepak Sharma
Yes that's what I intended to say.

Thanks
Deepak


Re: Cluster Migration

2016-05-10 Thread Deepak Sharma
Hi Ajay
You can look at the wholeTextFiles method, which gives an RDD[(String,
String)], and then map each RDD to saveAsTextFile.
This will serve the purpose.
I don't think anything built-in like distcp exists in Spark.

Thanks
Deepak
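On the Spark side, the suggestion above would be sc.wholeTextFiles(inputDir), which yields (path, content) pairs, followed by a save with a compression codec (PySpark's saveAsTextFile accepts a compressionCodecClass argument). Without a cluster handy, here is a plain-Python stand-in, assuming gzip as the codec and made-up directory names, that mirrors that read-compress-write flow:

```python
import gzip
from pathlib import Path

def migrate_compressed(src_dir: str, dst_dir: str) -> list[str]:
    """Read every file under src_dir (like wholeTextFiles yielding
    (path, content) pairs) and write each one gzip-compressed to dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        pair = (path.name, path.read_text())   # the (path, content) pair
        out = dst / (pair[0] + ".gz")          # the codec appends .gz
        with gzip.open(out, "wt", encoding="utf-8") as f:
            f.write(pair[1])
        written.append(out.name)
    return written

# Tiny demo with made-up input files.
Path("src_cluster").mkdir(exist_ok=True)
Path("src_cluster/a.txt").write_text("hello\n")
Path("src_cluster/b.txt").write_text("world\n")
result = migrate_compressed("src_cluster", "dst_cluster")
print(result)  # ['a.txt.gz', 'b.txt.gz']
```

In actual Spark the write would go through the cluster's HDFS client, and the codec would be chosen via the codec class rather than by hand.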


Cluster Migration

2016-05-10 Thread Ajay Chander
Hi Everyone,

We are planning to migrate data between 2 clusters, and I see that distcp
doesn't support data compression. Is there any efficient way to compress
the data during the migration? Can I implement a Spark job to do this?
Thanks.