Yes, makes sense. cp is serialized and simpler, and does not rely on the
JobTracker, whereas distcp only submits a job and waits for its
completion.
So distcp can fail if tasks start to fail or time out.
I have seen distcp fail and hang before, albeit not often.
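
To make the difference concrete, here is a rough local-filesystem analogy in Python. It is not Hadoop code and does not reflect how either tool is implemented: serial_copy, parallel_copy, and the workers parameter are made-up names for illustration, and a thread pool merely stands in for the map tasks that distcp would run.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def serial_copy(sources, dest_dir):
    """cp-style analogy: a single client copies each file in turn."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sources:
        # One file at a time; nothing else needs to be scheduled.
        copied.append(shutil.copyfile(src, dest_dir / Path(src).name))
    return copied


def parallel_copy(sources, dest_dir, workers=4):
    """distcp-style analogy: the file list is spread over several workers."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(shutil.copyfile, src, dest_dir / Path(src).name)
            for src in sources
        ]
        # Like waiting on a submitted job: a failure in any worker
        # surfaces here when its result is collected.
        return [f.result() for f in futures]
```

For a handful of small files the serial version has fewer moving parts; the parallel one only pays off when there is enough data to keep the workers busy and free capacity to run them.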

On Apr 10, 2013, at 10:37 PM, Alexander Pivovarov <[email protected]> wrote:

> If the cluster is busy with other jobs, distcp will wait for free map slots.
> Regular cp is more reliable and predictable, especially if you need to copy
> just several GB.
> 
> On Apr 10, 2013 6:31 PM, "Azuryy Yu" <[email protected]> wrote:
>> The cp command is not parallel. It just calls FileSystem, even though
>> DFSClient has multiple threads.
>> 
>> DistCp can work well on the same cluster.
>> 
>> 
>> On Thu, Apr 11, 2013 at 8:17 AM, KayVajj <[email protected]> wrote:
>>> The File System Copy utility copies files byte by byte, if I'm not wrong.
>>> Could the cp command instead work with blocks and move them, which could
>>> be significantly more efficient?
>>> 
>>> 
>>> Also, how does the cp command work if the file is distributed across
>>> different data nodes?
>>> 
>>> Thanks
>>> Kay
>>> 
>>> 
>>> On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas <[email protected]> wrote:
>>>> DistCp is a full-blown MapReduce job (mapper only, where the mappers do a
>>>> "fully" parallel copy to the destination).
>>>> 
>>>> CP appears (correct me if I'm wrong) to simply invoke the FileSystem and
>>>> issue a copy command for every source file.
>>>> 
>>>> I have an additional question: how is CP, which is internal to a cluster,
>>>> optimized (if at all)?
>>>> 
>>>> 
>>>> 
>>>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <[email protected]> wrote:
>>>>> Hi,
>>>>>  
>>>>> I think it's better to use Copy within the same cluster and DistCp
>>>>> between clusters. The cp command is a Hadoop-internal parallel process
>>>>> and will not copy files locally.
>>>>>  
>>>>> 麦树荣
>>>>>  
>>>>> From: KayVajj
>>>>> Date: 2013-04-11 06:20
>>>>> To: [email protected]
>>>>> Subject: Copy Vs DistCP
>>>>> I have a few questions regarding the usage of DistCp for copying files in
>>>>> the same cluster.
>>>>> 
>>>>> 
>>>>> 1) Which one is better within the same cluster, and what factors (like file
>>>>> size etc.) would influence the usage of one over the other?
>>>>> 
>>>>> 2) When we run a cp command like the one below from a client node of the
>>>>> cluster (not a data node), how does the cp command work?
>>>>>      i) like an MR job
>>>>>     ii) copy files locally and then copy them back to the new location.
>>>>> 
>>>>> Example of the copy command 
>>>>> 
>>>>> hdfs dfs -cp /<some_location>/file /<new_location>/
>>>>> 
>>>>> Thanks, your responses are appreciated.
>>>>> 
>>>>> -- Kay
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Jay Vyas
>>>> http://jayunit100.blogspot.com