Yes, you are right.
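Here is a minimal sketch of the client-side copy Hemanth describes (quoted below). It uses plain `java.io` streams as stand-ins for the HDFS input and output streams. The class `ClientSideCopySketch` and its `copyBytes` method are hypothetical, and this is not Hadoop's actual cp code. The 4096-byte buffer matches the io.file.buffer.size default quoted in the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration, not Hadoop source: the copy is a plain
// read/write loop in the client, with no block-level shortcut.
class ClientSideCopySketch {
    // Default buffer size quoted in the thread for io.file.buffer.size.
    static final int DEFAULT_BUFFER_SIZE = 4096;

    // Reads from `in` into a fixed-size buffer and writes each chunk to `out`.
    // Returns the total number of bytes copied.
    static long copyBytes(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 once the source stream is exhausted
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyBytes(new ByteArrayInputStream(data), out, DEFAULT_BUFFER_SIZE);
        System.out.println("copied " + copied + " bytes"); // prints "copied 10000 bytes"
    }
}
```

If this reading is right, every byte passes through the client's buffer. Running `hdfs dfs -cp` from a client node would then stream the file to that client and back out to the DataNodes, however the source blocks are spread across the cluster. Hadoop's `org.apache.hadoop.io.IOUtils` class offers `copyBytes` helpers built around the same kind of loop.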
On Thu, Apr 11, 2013 at 3:40 PM, Hemanth Yamijala <[email protected]> wrote:

> AFAIK, the cp command works fully from the DFS client. It reads bytes from
> the InputStream created when the file is opened and writes the same bytes to
> the OutputStream of the destination file. It does not work at the level of
> data blocks. The configuration property io.file.buffer.size sets the size of
> the buffer used in the copy; it is 4096 by default.
>
> Thanks
> Hemanth
>
>
> On Thu, Apr 11, 2013 at 9:42 AM, KayVajj <[email protected]> wrote:
>
>> If the CP command is not parallel, how does it work for a file partitioned
>> across various data nodes?
>>
>>
>> On Wed, Apr 10, 2013 at 6:30 PM, Azuryy Yu <[email protected]> wrote:
>>
>>> The CP command is not parallel; it just calls FileSystem, even if
>>> DFSClient has multiple threads.
>>>
>>> DistCp can work well on the same cluster.
>>>
>>>
>>> On Thu, Apr 11, 2013 at 8:17 AM, KayVajj <[email protected]> wrote:
>>>
>>>> The File System Copy utility copies files byte by byte, if I'm not
>>>> wrong. Could the cp command instead work with blocks and move them,
>>>> which could be significantly more efficient?
>>>>
>>>>
>>>> Also, how does the cp command work if the file is distributed across
>>>> different data nodes?
>>>>
>>>> Thanks
>>>> Kay
>>>>
>>>>
>>>> On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas <[email protected]> wrote:
>>>>
>>>>> DistCP is a full-blown MapReduce job (mapper only, where the mappers
>>>>> do a "fully" parallel copy to the destination).
>>>>>
>>>>> CP appears (correct me if I'm wrong) to simply invoke the FileSystem
>>>>> and issue a copy command for every source file.
>>>>>
>>>>> I have an additional question: how is a CP internal to a cluster
>>>>> optimized (if at all)?
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think it's better to use Copy within the same cluster and distCP
>>>>>> between clusters. The cp command is a Hadoop-internal parallel process
>>>>>> and will not copy files locally.
>>>>>>
>>>>>> ------------------------------
>>>>>> 麦树荣
>>>>>>
>>>>>> *From:* KayVajj <[email protected]>
>>>>>> *Date:* 2013-04-11 06:20
>>>>>> *To:* [email protected]
>>>>>> *Subject:* Copy Vs DistCP
>>>>>>
>>>>>> I have a few questions regarding the usage of DistCP for copying
>>>>>> files within the same cluster.
>>>>>>
>>>>>>
>>>>>> 1) Which one is better within the same cluster, and what factors
>>>>>> (like file size, etc.) would influence the use of one over the other?
>>>>>>
>>>>>> 2) When we run a cp command like the one below from a client node of
>>>>>> the cluster (not a data node), how does the cp command work?
>>>>>>    i) like an MR job, or
>>>>>>    ii) does it copy the files locally and then copy them back to the
>>>>>>        new location?
>>>>>>
>>>>>> Example of the copy command:
>>>>>>
>>>>>> hdfs dfs -cp /<some_location>/file /<new_location>/
>>>>>>
>>>>>> Thanks, your responses are appreciated.
>>>>>>
>>>>>> -- Kay
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jay Vyas
>>>>> http://jayunit100.blogspot.com
>>>>>
>>>>
>>>>
>>>
>>
>
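Jay's description of DistCP (a map-only job whose mappers copy in parallel) can be pictured with a local analogy. The list of source files is split into chunks, and each chunk is copied by its own worker. This is a hypothetical sketch, not DistCP's code. The class `ParallelCopySketch`, the round-robin split, and the in-memory maps standing in for HDFS paths are all assumptions, and threads stand in for map tasks running on different nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical analogy for a map-only parallel copy: each "mapper" (a thread
// here) copies its own subset of files independently of the others.
class ParallelCopySketch {

    // Round-robin split: file i goes to chunk i % numMappers.
    static List<List<String>> split(List<String> files, int numMappers) {
        List<List<String>> chunks = new ArrayList<>();
        for (int m = 0; m < numMappers; m++) chunks.add(new ArrayList<>());
        for (int i = 0; i < files.size(); i++) chunks.get(i % numMappers).add(files.get(i));
        return chunks;
    }

    // Copies every file in src into dst using one task per chunk.
    // Each file is copied whole by exactly one worker. Returns the file count.
    static int copyAll(Map<String, byte[]> src, Map<String, byte[]> dst, int numMappers)
            throws Exception {
        List<List<String>> chunks = split(new ArrayList<>(src.keySet()), numMappers);
        ExecutorService pool = Executors.newFixedThreadPool(numMappers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> chunk : chunks) {
                results.add(pool.submit(() -> {
                    int copied = 0;
                    for (String name : chunk) {
                        dst.put(name, src.get(name).clone()); // dst must be thread-safe
                        copied++;
                    }
                    return copied;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) total += f.get(); // wait for every "mapper"
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, byte[]> src = new ConcurrentHashMap<>();
        for (int i = 0; i < 10; i++) src.put("/src/part-" + i, new byte[]{(byte) i});
        Map<String, byte[]> dst = new ConcurrentHashMap<>();
        System.out.println("copied " + copyAll(src, dst, 3) + " files"); // prints "copied 10 files"
    }
}
```

The design point is that workers stream their files independently, so throughput can grow with the number of mappers. By contrast, cp pushes every file through one client. On a real cluster, the DistCP counterpart of Kay's example would be started with `hadoop distcp /<some_location>/file /<new_location>/`.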
