Just as Harsh pointed out, as long as the underlying DFS provides all the FileSystem APIs that Hadoop requires, DistCp should work. One thing to note is that all the required libraries (including any conf files) need to be on the classpath if they are not already available on the runtime cluster. Just as the S3 file system works fine with DistCp, our project copied TBs of data between CFS (Cassandra DFS) and HDFS using DistCp when we migrated the data from CFS to HDFS.

Yong
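For illustration only, a rough sketch of that kind of setup; the connector jar path, the cfs:// host, and the namenode address below are placeholders, not the values actually used:

    # Hypothetical jar path/hosts: put the DFS implementation's jar (and any conf files) on the classpath
    export HADOOP_CLASSPATH=/path/to/cfs-connector.jar:$HADOOP_CLASSPATH
    # DistCp then addresses each filesystem by its URI scheme
    hadoop distcp cfs://cassandra-host/source/dir hdfs://namenode:8020/target/dir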
> From: [email protected]
> Date: Tue, 4 Feb 2014 19:03:15 +0530
> Subject: Re: DistCP : Is it guaranteed to work for any two URI schemes?
> To: [email protected]
>
> Overall the whole DistCp utility is devoid of any HDFS-specific items,
> but it does have some (mostly skippable) checks pertaining to FS-level
> features such as permissions, checksums, etc. It should and does work
> with any valid URI scheme that the libraries understand to be valid
> FSes today.
>
> On Tue, Feb 4, 2014 at 8:03 AM, Jay Vyas <[email protected]> wrote:
> > Hi folks:
> >
> > I've been thinking about the AWS S3DistCp class and am wondering: is DistCp
> > built to work between any two Hadoop file system classes?
> >
> > Or is it implicitly built mainly to copy between two HDFS file systems?
> >
> > I haven't found many examples online with different URI schemes.
> >
> > With emerging HDFS alternatives, I'd be interested in ways to optimize IO
> > between different filesystems using DistCp.
> >
> > --
> > Jay Vyas
> > http://jayunit100.blogspot.com
>
> --
> Harsh J
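To make the point about the skippable FS-level checks concrete, a hedged sketch (the bucket name and namenode address are placeholders): pass -update together with -skipcrccheck so DistCp does not compare checksums across filesystems whose checksum algorithms differ, and leave out -p so it does not try to preserve permissions the target FS may not support.

    # Placeholder URIs; -update with -skipcrccheck avoids cross-FS checksum comparison
    hadoop distcp -update -skipcrccheck hdfs://namenode:8020/data s3n://example-bucket/data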
