Thanks a lot, it works!
Do you have any idea what the problem was? Should this work in fully
distributed mode, or do I need to make some modifications? When should I
use DistributedCache.addCacheFile and when
DistributedCache.addLocalCacheFile?
Thanks again!
On Mon 22 Dec 2014 11:03:36 AM CET, unmesha sreeveni wrote:
Driver:
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path cachefile = new Path("path/to/file");
    FileStatus[] list = fs.globStatus(cachefile);
    for (FileStatus status : list) {
        DistributedCache.addCacheFile(status.getPath().toUri(), conf);
    }
In setup:
    public void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        URI[] cacheFiles = DistributedCache.getCacheFiles(conf);
        // Only the path component of the first cached URI is used here.
        Path getPath = new Path(cacheFiles[0].getPath());
        BufferedReader bf = new BufferedReader(new InputStreamReader(fs.open(getPath)));
        String setupData = null;
        while ((setupData = bf.readLine()) != null) {
            System.out.println("Setup Line in reducer " + setupData);
        }
        bf.close();
    }
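One detail in the setup code that may matter here: cacheFiles[0].getPath() is java.net.URI.getPath(), which returns only the path component. The scheme and authority of the cached URI are dropped, so fs.open() resolves the path against whatever file system FileSystem.get(conf) returns. A minimal sketch of that behaviour with plain java.net.URI and no Hadoop dependency follows; the hdfs://namenode:8020 address and the file name are made up for illustration.

```java
import java.net.URI;

public class CacheUriDemo {
    // Returns only the path component, as cacheFiles[0].getPath() does in setup().
    static String pathOnly(URI cacheUri) {
        return cacheUri.getPath();
    }

    public static void main(String[] args) {
        // Hypothetical cache URI; host, port and file name are illustrative only.
        URI cacheUri = URI.create("hdfs://namenode:8020/work/output/part-r-00000");
        System.out.println("scheme: " + cacheUri.getScheme());
        System.out.println("authority: " + cacheUri.getAuthority());
        System.out.println("path only: " + pathOnly(cacheUri));
    }
}
```

If the file has to be read from a specific file system, the full URI would need to be kept rather than just its path.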
Hope this link helps:
http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html
On Mon, Dec 22, 2014 at 2:58 PM, Marko Dinic wrote:
Hello Hadoopers,
I'm getting this exception in Hadoop while trying to read a file
that was added to the distributed cache, and the strange thing is that
the file exists at the given location:
java.io.FileNotFoundException: File does not exist:
/tmp/hadoop-pera/mapred/local/taskTracker/distcache/-1517670662102870873_-1918892372_1898431787/localhost/work/output/temporalcentroids/centroids-iteration0-noOfClusters2/part-r-00000
I'm adding the file before starting my job using
DistributedCache.addCacheFile(URI.create(args[2]),
job.getConfiguration());
And I'm trying to read the file from the setup method in my mapper
using
DistributedCache.getLocalCacheFiles(conf);
As I said, I can confirm that the file is on the local system, but
the exception is thrown.
I'm running the job in pseudo-distributed mode, on one computer.
Any ideas?
Thanks
--
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Centre for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/