So, mapreduce.application.classpath was the winner. It's possible that yarn.application.classpath would have worked as well. My main issue was that I had neglected to include a copy of the XML files on the classpath, so my settings weren't being picked up (late-night epiphany). Passing the value as -Dmapreduce.application.classpath=... on the command line allowed it to take effect, and I was fine.
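For anyone who hits the same wall, this is roughly the shape of my driver (MyJobDriver and the job name are just placeholders); the only interesting part is that ToolRunner/GenericOptionsParser is what folds the -D properties, including mapreduce.application.classpath, into the job Configuration before run() is called:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJobDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already contains anything passed as -Dname=value, because
    // ToolRunner parses those generic options into the Configuration
    // before run() is invoked, including mapreduce.application.classpath.
    Job job = Job.getInstance(getConf(), "my-accumulo-job");
    job.setJarByClass(MyJobDriver.class);
    // AccumuloInputFormat/OutputFormat, mapper, reducer, etc. set up here.
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
  }
}

So launching looks like: java -cp <local lib dir> MyJobDriver -Dmapreduce.application.classpath='<whatever accumulo classpath prints, plus the Hadoop dirs>' <job args>, with the actual classpath value being whatever your cluster needs.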
For remote clients, I have copied into a local classpath lib everything I need to launch: the jar list output by accumulo classpath, plus a set of the XML files needed to set the appropriate client-side mapreduce options, including the classpath mentioned above as well as the various memory-related settings in YARN/MR2 (a rough sketch of that client-side setup is at the bottom of this mail). Thanks for the help Billie!

On Sat, Jan 24, 2015 at 7:51 AM, Billie Rinaldi <[email protected]> wrote:

> You might have to set yarn.application.classpath in both the client and
> the server conf. At least that's what Slider does.
> On Jan 23, 2015 10:00 PM, "Marc Reichman" <[email protected]> wrote:
>
>> That's correct, I don't really want to have the client have to package up
>> every accumulo and zookeeper jar I need in dcache or a fat jar or whatever
>> just to run stuff from a remote client when the jars are all there.
>>
>> I did try yarn.application.classpath, but I didn't spell out the whole
>> thing. Next try I will take all those jars and put them in explicitly
>> instead of the dir wildcards. I will update how it goes.
>>
>> On Fri, Jan 23, 2015 at 5:19 PM, Billie Rinaldi <[email protected]> wrote:
>>
>>> You have all the jars your app needs on both the servers and the client
>>> (as opposed to wanting Yarn to distribute them)? Then
>>> yarn.application.classpath should be what you need. It looks like
>>> /etc/hadoop/conf,/some/lib/dir/*,/some/other/lib/dir/* etc. Is that what
>>> you're trying?
>>>
>>> On Fri, Jan 23, 2015 at 1:56 PM, Marc Reichman <[email protected]> wrote:
>>>
>>>> My apologies if this is covered somewhere, I've done a lot of searching
>>>> and come up dry.
>>>>
>>>> I am migrating a set of applications from Hadoop 1.0.3/Accumulo 1.4.1
>>>> to Hadoop 2.6.0/Accumulo 1.6.1. The applications are launched by my
>>>> custom java apps, using the Hadoop Tool/Configured interface setup,
>>>> not a big deal.
>>>>
>>>> To run MR jobs with AccumuloInputFormat/OutputFormat, in 1.0 I could
>>>> use tool.sh to launch the programs, which worked great for local
>>>> on-cluster launching. However, I needed to launch from remote hosts
>>>> (maybe even Windows ones), so I would bundle a large lib dir with
>>>> everything I needed on the client side, and fill out HADOOP_CLASSPATH
>>>> in hadoop-env.sh with everything I needed (basically copied from the
>>>> output of accumulo classpath). This would work for remote submissions,
>>>> or even local ones, specifically using my java mains to launch them
>>>> without any accumulo or hadoop wrapper scripts.
>>>>
>>>> In YARN MR 2.6 this doesn't seem to work. No matter what I do, I can't
>>>> seem to get a normal java app to have the 2.x MR Application Master
>>>> pick up the accumulo items in the classpath, and my jobs fail with
>>>> ClassNotFound exceptions. tool.sh works just fine, but again, I need
>>>> to be able to submit without that environment.
>>>>
>>>> I have tried (on the cluster):
>>>> HADOOP_CLASSPATH in hadoop-env.sh
>>>> HADOOP_CLASSPATH from .bashrc
>>>> yarn.application.classpath in yarn-site.xml
>>>>
>>>> I don't mind using tool.sh locally, it's quite nice, but I need a
>>>> strategy to have the cluster "set up" so I can just launch java, set my
>>>> appropriate hadoop configs for remote fs and yarn hosts, get my accumulo
>>>> connections and in/out setup for mapreduce, and launch jobs which have
>>>> accumulo awareness.
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks,
>>>> Marc
>>>>
>>>
>>
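P.S. For the archives, the client-side piece described above boils down to something like the following. RemoteClientConf and the conf directory layout are just my own naming, nothing standard; the point is only that the copied XML files have to be added to the Configuration the job is built from:

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class RemoteClientConf {

  // Loads the cluster XML files that were copied next to the client libs.
  public static Configuration load(File confDir) {
    Configuration conf = new Configuration();
    String[] files = {"core-site.xml", "hdfs-site.xml", "yarn-site.xml", "mapred-site.xml"};
    for (String name : files) {
      File f = new File(confDir, name);
      if (f.exists()) {
        // This is what carries the remote fs/RM addresses, the YARN/MR2
        // memory settings, and mapreduce.application.classpath to the client.
        conf.addResource(new Path(f.toURI()));
      }
    }
    return conf;
  }
}

The Configuration returned there is what gets handed to ToolRunner.run in the driver sketched earlier in this thread.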
