Hey Miguel,

You need to call:

ToolRunner.run(new Configuration(), new MaxmindCrunchJob(), args);

in main() to pick up the args from the command line.

J

On Thu, Dec 19, 2013 at 8:42 AM, Miguel Paraz <[email protected]> wrote:

> Hi,
> I'm studying Crunch with code that relies on the DistributedCache to copy
> files to the local filesystem. (My code is at
> https://bitbucket.org/mparaz/maxmind-crunch)
>
> I'm using 0.9.0-mapreduce2 on a 2.2.0 setup (Hortonworks Sandbox 2.0).
>
> I see that Crunch programs use the same pattern as low-level MapReduce,
> with ToolRunner.run() and implementing Tool.run().
>
> Unfortunately, the file I specify with the "-files" parameter is not
> copied.
> I logged getConf().get("tmpfiles") and that configuration entry is there.
>
> At which point should the file be copied? I looked through the Hadoop source
> code and found that tmpfiles is processed in
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
> - copyAndConfigureFiles()
>
> Is this code not invoked when Crunch is used?
> This works with the equivalent MapReduce 2.2.0 API code.
>
> Is there a working example with distributed files that I could try?
>
> Thanks!
> Miguel
>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
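
For anyone reading this thread later, here is a rough sketch of why routing main() through ToolRunner matters. It does not use Hadoop at all: GenericOptionsSketch, SimpleTool and runWithGenericOptions are hypothetical stand-ins, and a plain Map plays the role of Configuration. The only thing it tries to show is the general shape of the behaviour, assuming generic options such as "-files" are stripped from the command line and recorded in the configuration (under "tmpfiles", the key mentioned in the quoted message) before the tool's run() sees the remaining arguments. The real parser does more than this, for example validating the paths.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenericOptionsSketch {

    /** Hypothetical stand-in for a Hadoop Tool: gets the config plus leftover args. */
    public interface SimpleTool {
        int run(Map<String, String> conf, String[] remainingArgs);
    }

    /**
     * Simplified imitation of generic-option handling (an assumption, not Hadoop code):
     * "-files <list>" is removed from the argument list and stored under "tmpfiles";
     * everything else is passed through to the tool unchanged.
     */
    public static int runWithGenericOptions(Map<String, String> conf,
                                            SimpleTool tool,
                                            String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-files".equals(args[i]) && i + 1 < args.length) {
                conf.put("tmpfiles", args[++i]); // consume the option's value too
            } else {
                remaining.add(args[i]);
            }
        }
        return tool.run(conf, remaining.toArray(new String[0]));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Made-up command line: one generic option followed by two job arguments.
        String[] cmdLine = {"-files", "GeoIP.dat", "input", "output"};

        int exitCode = runWithGenericOptions(conf, (c, rest) -> {
            System.out.println("tmpfiles = " + c.get("tmpfiles"));
            System.out.println("job args = " + String.join(" ", rest));
            return 0;
        }, cmdLine);

        System.out.println("exit code = " + exitCode);
    }
}
```

If main() hands the raw args straight to the job instead, nothing performs this split, so the job would see "-files" as an ordinary argument and the configuration it runs with would not carry the file list.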
