Thanks a lot, it works!

Do you have any idea what the problem was? Should this work in fully distributed mode, or do I need to make some modifications? When should I use DistributedCache.addCacheFile, and when DistributedCache.addLocalCacheFile?

Thanks again!

On Mon 22 Dec 2014 11:03:36 AM CET, unmesha sreeveni wrote:
Driver

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path cachefile = new Path("path/to/file");
FileStatus[] list = fs.globStatus(cachefile);
for (FileStatus status : list) {
  DistributedCache.addCacheFile(status.getPath().toUri(), conf);
}

In setup

public void setup(Context context) throws IOException {
  Configuration conf = context.getConfiguration();
  FileSystem fs = FileSystem.get(conf);
  // URIs registered in the driver with DistributedCache.addCacheFile
  URI[] cacheFiles = DistributedCache.getCacheFiles(conf);
  Path getPath = new Path(cacheFiles[0].getPath());
  // try-with-resources closes the reader even if readLine throws
  try (BufferedReader bf = new BufferedReader(new InputStreamReader(fs.open(getPath)))) {
    String setupData;
    while ((setupData = bf.readLine()) != null) {
      System.out.println("Setup Line in reducer " + setupData);
    }
  }
}
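
A note on the original FileNotFoundException, offered as a sketch under assumptions rather than a confirmed diagnosis: getLocalCacheFiles returns paths on the task node's local disk, so opening one of them through FileSystem.get(conf) would look it up in the default file system (HDFS in pseudo-distributed mode) and could fail even though the file exists locally. If that is what happened, the localized copy can be read with plain Java I/O. LocalCacheReader and readLines are hypothetical names, and Path here is java.nio.file.Path, not Hadoop's Path.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LocalCacheReader {
    // Reads every line of a file that lives on the local disk.
    // localPath is an assumption: in a task it would be built from the string
    // form of a path returned by DistributedCache.getLocalCacheFiles(conf).
    public static List<String> readLines(Path localPath) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader when the block exits
        try (BufferedReader reader =
                 Files.newBufferedReader(localPath, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

In a task, the argument could be built with Paths.get(localFiles[0].toString()), where localFiles is the array returned by getLocalCacheFiles. The Hadoop-side alternative is to open the path through FileSystem.getLocal(conf) instead of FileSystem.get(conf).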

Hope this link helps:
http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html

On Mon, Dec 22, 2014 at 2:58 PM, Marko Dinic wrote:

    Hello Hadoopers,

    I'm getting this exception in Hadoop while trying to read a file
    that was added to the distributed cache, and the strange thing is
    that the file exists at the given location:

        java.io.FileNotFoundException: File does not exist:
        /tmp/hadoop-pera/mapred/local/taskTracker/distcache/-1517670662102870873_-1918892372_1898431787/localhost/work/output/temporalcentroids/centroids-iteration0-noOfClusters2/part-r-00000

    I'm adding the file before starting my job using

        DistributedCache.addCacheFile(URI.create(args[2]),
            job.getConfiguration());

    And I'm trying to read the file in the setup method of my mapper
    using

        DistributedCache.getLocalCacheFiles(conf);

    As I said, I can confirm that the file is on the local system, but
    the exception is thrown.

    I'm running the job in pseudo-distributed mode, on one computer.

    Any ideas?

    Thanks




--
Thanks & Regards

Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Centre for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
