That was a hidden shameless plug, Ted ;-)

The main disadvantage of fs -cp is that all data has to transit via the
machine you issue the command on; depending on the size of the data you want
to copy, that can be a killer. DistCp is distributed, as its name implies, so
there is no bottleneck of this kind.
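A toy model of that bottleneck, in Python, under simplifying assumptions that are not DistCp's actual split logic: fs -cp relays every byte through the one client host, while DistCp hands whole files to N map tasks on other hosts. The function names and the 20 x 10 GB input are hypothetical:

```python
# Toy model only: nothing here talks to Hadoop. It counts the bytes each
# machine must relay under two simplifying assumptions:
#   - fs -cp: every byte is read by, and written from, the single client host
#   - DistCp: whole files are spread over N map tasks running on other hosts


def bytes_through_client_cp(file_sizes):
    """All data passes through the one client that runs fs -cp."""
    return sum(file_sizes)


def bytes_per_mapper_distcp(file_sizes, num_mappers):
    """Greedily assign whole files (largest first) to the least-loaded mapper.

    Returns a list with the number of bytes each hypothetical map task copies.
    """
    loads = [0] * num_mappers
    for size in sorted(file_sizes, reverse=True):
        i = loads.index(min(loads))
        loads[i] += size
    return loads


if __name__ == "__main__":
    GB = 1024 ** 3
    files = [10 * GB] * 20  # hypothetical input: 20 files of 10 GB each
    print(bytes_through_client_cp(files) // GB)          # 200
    print(max(bytes_per_mapper_distcp(files, 8)) // GB)  # 30
```

Under these assumptions the client relays all 200 GB with fs -cp, whereas the busiest of 8 map tasks handles about 30 GB. Real throughput also depends on network, disks and free map slots, which the model ignores.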
On Apr 14, 2013 6:15 AM, "Ted Dunning" <[email protected]> wrote:

>
> Lance,
>
> Never say never.
>
> Linux programs can read from the right kind of Hadoop cluster without
> using FUSE.
>
>
>
>
> On Fri, Apr 12, 2013 at 10:15 AM, Lance Norskog <[email protected]> wrote:
>
>>  Shell 'cp' only works if you use FUSE, which makes the HDFS file
>> system visible as a mounted Unix file system. Otherwise, Unix programs
>> cannot read or write HDFS files.
>>
>> On 04/11/2013 09:52 AM, KayVajj wrote:
>>
>>    Summing up, what would be the recommendations for copying?
>>
>>  1) DistCp
>>  2) shell cp command
>>  3) Using the FileSystem API (FileUtil, to be precise) inside a Java
>> program
>>  4) An MR job with an identity mapper and no reducer (maybe this is what
>> DistCp does)
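Option 4 can be pictured with a local analogue, sketched here in Python under stated assumptions. Threads stand in for map tasks, local directories stand in for HDFS paths, and map_only_copy/map_task are hypothetical names. It only shows the shape of a map-only copy: split the file list, copy each split independently, and skip the reduce phase. It is not DistCp's implementation:

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Local analogue of a map-only copy job. Threads play the role of map tasks
# and local directories play the role of HDFS paths (both are assumptions).


def map_task(pairs):
    """Copy each (src, dst) pair assigned to this task; no reduce phase follows."""
    for src, dst in pairs:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copyfile(src, dst)
    return len(pairs)


def map_only_copy(src_dir, dst_dir, num_tasks=4):
    """Split the file list across num_tasks workers and copy the splits in parallel."""
    pairs = []
    for root, _dirs, names in os.walk(src_dir):
        for name in names:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            pairs.append((src, os.path.join(dst_dir, rel)))
    # Round-robin split: every file lands in exactly one task's input.
    splits = [pairs[i::num_tasks] for i in range(num_tasks)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        return sum(pool.map(map_task, splits))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = os.path.join(tmp, "src")
        os.makedirs(os.path.join(src_dir, "sub"))
        for name in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
            with open(os.path.join(src_dir, name), "w") as f:
                f.write(name)
        print(map_only_copy(src_dir, os.path.join(tmp, "dst"), num_tasks=2))  # 3
```

The round-robin split keeps the sketch short. A real job would rather balance splits by total bytes, as in the toy model of client vs. mapper load.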
>>
>>
>>  I did not run any comparisons, as my dev cluster is just a two-node
>> cluster, and I'm not sure how this would perform on a production cluster.
>>
>>  Kay
>>
>>
>> On Thu, Apr 11, 2013 at 5:44 AM, Jay Vyas <[email protected]> wrote:
>>
>>>  Yes, makes sense... cp is serialized and simpler, and does not rely on
>>> the JobTracker, whereas distcp actually just submits a job and waits for
>>> completion.
>>> So it can fail if tasks start to fail or time out.
>>>  I have seen distcp fail and hang before, albeit not often.
>>>
>>> On Apr 10, 2013, at 10:37 PM, Alexander Pivovarov <[email protected]>
>>> wrote:
>>>
>>>   If the cluster is busy with other jobs, distcp will wait for free map
>>> slots. Regular cp is more reliable and predictable, especially if you need
>>> to copy just several GB.
>>> On Apr 10, 2013 6:31 PM, "Azuryy Yu" <[email protected]> wrote:
>>>
>>>>  The cp command is not parallel; it just calls FileSystem, even though
>>>> DFSClient has multiple threads.
>>>>
>>>>  DistCp can work well on the same cluster.
>>>>
>>>>
>>>>  On Thu, Apr 11, 2013 at 8:17 AM, KayVajj <[email protected]> wrote:
>>>>
>>>>>  The FileSystem copy utility copies files byte by byte, if I'm not
>>>>> wrong. Could the cp command instead work with blocks and move them,
>>>>> which could be significantly more efficient?
>>>>>
>>>>>
>>>>>  Also, how does the cp command work if the file is distributed across
>>>>> different DataNodes?
>>>>>
>>>>>  Thanks
>>>>>  Kay
>>>>>
>>>>>
>>>>> On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas <[email protected]> wrote:
>>>>>
>>>>>>  DistCp is a full-blown MapReduce job (map-only, where the
>>>>>> mappers do a "fully" parallel copy to the destination).
>>>>>>
>>>>>> cp appears (correct me if I'm wrong) to simply invoke the FileSystem
>>>>>> and issue a copy command for every source file.
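A minimal Python sketch of that per-file, single-client behaviour. It assumes in-memory byte streams stand in for HDFS input and output streams and uses a hypothetical 4 KB buffer. stream_copy and sequential_copy are illustrative names, not Hadoop's API. The sketch shows bytes being read into the client and written back out, one file after another, rather than blocks being moved between DataNodes:

```python
import io

BUFFER_SIZE = 4096  # hypothetical client-side buffer size


def stream_copy(src, dst, buffer_size=BUFFER_SIZE):
    """Pump bytes from readable `src` to writable `dst` through a buffer.

    Returns the number of bytes copied. Every byte passes through the caller
    (the "client"); nothing is moved block-to-block behind its back.
    """
    total = 0
    while True:
        chunk = src.read(buffer_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def sequential_copy(files):
    """Copy each source stream into a fresh destination, one file at a time.

    `files` maps a name to its source bytes; returns name -> copied bytes.
    """
    copied = {}
    for name, data in files.items():
        src, dst = io.BytesIO(data), io.BytesIO()
        stream_copy(src, dst)
        copied[name] = dst.getvalue()
    return copied


if __name__ == "__main__":
    out = sequential_copy({"part-0": b"x" * 10000, "part-1": b"hello"})
    print({name: len(data) for name, data in out.items()})  # {'part-0': 10000, 'part-1': 5}
```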
>>>>>>
>>>>>>  I have an additional question: how is cp, which is internal to a
>>>>>> cluster, optimized (if at all)?
>>>>>>
>>>>>>
>>>>>>
>>>>>>  On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <[email protected]> wrote:
>>>>>>
>>>>>>>  Hi,
>>>>>>>
>>>>>>> I think it's better to use cp within the same cluster and DistCp
>>>>>>> between clusters. The cp command is a Hadoop-internal parallel
>>>>>>> process and will not copy files locally.
>>>>>>>
>>>>>>> ------------------------------
>>>>>>>  麦树荣
>>>>>>>
>>>>>>>  *From:* KayVajj <[email protected]>
>>>>>>> *Date:* 2013-04-11 06:20
>>>>>>> *To:* [email protected]
>>>>>>> *Subject:* Copy Vs DistCP
>>>>>>>        I have a few questions regarding the usage of DistCp for
>>>>>>> copying files within the same cluster.
>>>>>>>
>>>>>>>
>>>>>>> 1) Which one is better within the same cluster, and what factors (like
>>>>>>> file size, etc.) would influence the choice of one over the other?
>>>>>>>
>>>>>>>  2) When we run a cp command like the one below from a client node of
>>>>>>> the cluster (not a DataNode), how does the cp command work?
>>>>>>>       i) like an MR job, or
>>>>>>>      ii) does it copy the files locally and then copy them back to the
>>>>>>> new location?
>>>>>>>
>>>>>>>  Example of the copy command
>>>>>>>
>>>>>>>  hdfs dfs -cp /<some_location>/file /<new_location>/
>>>>>>>
>>>>>>>  Thanks, your responses are appreciated.
>>>>>>>
>>>>>>>  -- Kay
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>> Jay Vyas
>>>>>> http://jayunit100.blogspot.com
>>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>
