I will try that out. Thank you!

On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
> Yes, that's what I intended to say.
>
> Thanks
> Deepak
>
> On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>
>> Hi Deepak,
>>
>> Thanks for your response. If I understand correctly, you suggest reading
>> all of those files into an RDD on the cluster using wholeTextFiles, then
>> applying a compression codec and saving the RDD to the other Hadoop
>> cluster?
>>
>> Thank you,
>> Ajay
>>
>> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>
>>> Hi Ajay,
>>>
>>> You can look at the wholeTextFiles method, which gives an
>>> RDD[String, String], and then write each RDD out with saveAsTextFile.
>>> This will serve the purpose.
>>> I don't think anything like distcp exists in Spark by default.
>>>
>>> Thanks
>>> Deepak
>>>
>>> On 10 May 2016 11:27 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> We are planning to migrate data between 2 clusters, and I see that
>>>> distcp doesn't support data compression. Is there an efficient way to
>>>> compress the data during the migration? Can I implement a Spark job to
>>>> do this? Thanks.
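For anyone following the thread, a minimal sketch of the approach Deepak describes. The cluster URLs and paths below are placeholders, and Gzip is just one choice; any Hadoop CompressionCodec (e.g. BZip2Codec, SnappyCodec) can be passed to saveAsTextFile:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.io.compress.GzipCodec

object CompressedMigration {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CompressedMigration")
    val sc = new SparkContext(conf)

    // Read every file under the source path as (path, contents) pairs.
    // Hypothetical namenode URLs -- substitute your own clusters' addresses.
    val files = sc.wholeTextFiles("hdfs://source-nn:8020/data/input")

    // Drop the path keys and write the contents gzip-compressed
    // to the destination cluster.
    files.values.saveAsTextFile(
      "hdfs://dest-nn:8020/data/output",
      classOf[GzipCodec])

    sc.stop()
  }
}
```

Note that wholeTextFiles loads each file entirely into memory on one executor, so this suits many small files rather than very large ones; for large splittable files, sc.textFile plus saveAsTextFile with a codec may be a better fit.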