Ok, so JobContext.getCacheFiles() returns URI[]. Let's say I only stored one folder in the cache, and that folder has several .txt files within it. How do I use that returned URI to read each line of those .txt files?
Basically, how do I read my cached file(s) after I call JobContext.getCacheFiles()?

Thanks,
Andrew

From: Omkar Joshi [mailto:[email protected]]
Sent: Wednesday, July 10, 2013 5:15 PM
To: [email protected]
Subject: Re: Distributed Cache

try JobContext.getCacheFiles()

Thanks,
Omkar Joshi
Hortonworks Inc. <http://www.hortonworks.com>

On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew <[email protected]> wrote:

Ok, using job.addCacheFile() seems to compile correctly. However, how do I then access the cached file in my Mapper code? Is there a method that will look for any files in the cache?

Thanks,
Andrew

From: Ted Yu [mailto:[email protected]]
Sent: Tuesday, July 09, 2013 6:08 PM
To: [email protected]
Subject: Re: Distributed Cache

You should use Job#addCacheFile()

Cheers

On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <[email protected]> wrote:

Hi,

I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (version 2.0.5). In my driver class, I use this code to try to add a file to the distributed cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
Job job = Job.getInstance();
...

However, I keep getting warnings that the method addCacheFile() is deprecated. Is there a more current way to add files to the distributed cache?

Thanks in advance,
Andrew
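For what it's worth, one way to do this is a sketch along the following lines: in the Mapper's setup(), take the URI[] from context.getCacheFiles(), open each path through the job's FileSystem, and if the cached entry is a directory, walk its children and read every .txt file line by line. The class name CacheReaderMapper and the "do something with each line" body are illustrative only, not from the thread; this reads the files back through the FileSystem API rather than relying on the task-local symlinked copies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheReaderMapper extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // URIs previously registered on the driver side via job.addCacheFile(...)
    URI[] cacheFiles = context.getCacheFiles();
    if (cacheFiles == null) {
      return;
    }

    FileSystem fs = FileSystem.get(context.getConfiguration());
    for (URI uri : cacheFiles) {
      Path cached = new Path(uri.getPath());
      // If a directory was cached, list its contents; otherwise treat the
      // URI as a single file.
      FileStatus[] entries = fs.isDirectory(cached)
          ? fs.listStatus(cached)
          : new FileStatus[] { fs.getFileStatus(cached) };
      for (FileStatus status : entries) {
        if (!status.getPath().getName().endsWith(".txt")) {
          continue;
        }
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(status.getPath())))) {
          String line;
          while ((line = reader.readLine()) != null) {
            // do something with each line, e.g. load it into an
            // in-memory lookup structure used later in map()
          }
        }
      }
    }
  }
}
```

On the driver side, the non-deprecated counterpart of the DistributedCache call shown earlier in the thread would be job.addCacheFile(new URI("file path in HDFS")) on the Job instance, before job submission.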
