Re: Copying to DistributedCache using -files

Josh Wills Thu, 19 Dec 2013 11:49:26 -0800

Hey Miguel,

You need to call:


ToolRunner.run(new Configuration(), new MaxmindCrunchJob(), args);

in main() to pick up the args from the command line.
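
To illustrate why that matters, here is a toy, stdlib-only model of the
split that happens before run() is called. It is a sketch and not Hadoop
code: the real parsing is done by Hadoop's GenericOptionsParser, which also
resolves each file to a full URI. The class name GenericOptionsSketch, the
file name GeoIP.dat, and the "input"/"output" arguments are made up for the
example; only the "tmpfiles" key comes from your mail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (NOT Hadoop code) of how a generic option such as -files is
 * separated from the job's own arguments before run() sees them.
 */
public class GenericOptionsSketch {

    /** Result of the split: configuration entries plus leftover arguments. */
    static final class Parsed {
        final Map<String, String> conf = new HashMap<>();
        final List<String> remainingArgs = new ArrayList<>();
    }

    /**
     * Consumes "-files <list>" and records the value under "tmpfiles".
     * Every other argument is passed through untouched, in order.
     */
    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i < args.length; i++) {
            if ("-files".equals(args[i]) && i + 1 < args.length) {
                parsed.conf.put("tmpfiles", args[++i]);
            } else {
                parsed.remainingArgs.add(args[i]);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Hypothetical command line: one cached file, then input and output paths.
        String[] commandLine = {"-files", "GeoIP.dat", "input", "output"};
        Parsed parsed = parse(commandLine);
        System.out.println("tmpfiles = " + parsed.conf.get("tmpfiles"));
        System.out.println("job args = " + parsed.remainingArgs);
    }
}
```

The point of the sketch is that the -files value ends up in the
configuration rather than in the argument list, so the job only sees the
file if it is built from that same configuration.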

J


On Thu, Dec 19, 2013 at 8:42 AM, Miguel Paraz <[email protected]> wrote:

> Hi,
> I'm studying Crunch with code that relies on the DistributedCache to copy
> files to the local filesystem. (My code is at
> https://bitbucket.org/mparaz/maxmind-crunch)
>
> I'm using Crunch 0.9.0-mapreduce2 on a Hadoop 2.2.0 setup (Hortonworks Sandbox 2.0).
>
> I see that Crunch programs use the same pattern as low-level MapReduce,
> with ToolRunner.run() and implementing Tool.run().
>
> Unfortunately, the file I specify with the "-files" parameter is not
> copied.
> I logged getConf().get("tmpfiles") and that configuration entry is there.
>
> At which point should the file be copied? I looked through the Hadoop source
> code and found that tmpfiles is processed in
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java
> - copyAndConfigureFiles()
>
> Is this code not invoked when Crunch is used?
> This works with the equivalent MapReduce 2.2.0 API code.
>
> Is there a working example with distributed files that I could try?
>
> Thanks!
> Miguel
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
