Spot on - thanks Billie, that did the trick!
On Thu, Jan 17, 2013 at 1:57 PM, Billie Rinaldi <[email protected]> wrote:

> On Thu, Jan 17, 2013 at 11:16 AM, Mike Hugo <[email protected]> wrote:
>
>> Thanks Billie!
>>
>> Setting "mapred.job.tracker" and "fs.default.name" in the conf has
>> gotten me further.
>>
>> job.getConfiguration().set("mapred.job.tracker", "server_name_here:8021");
>> job.getConfiguration().set("fs.default.name", "hdfs://server_name_here:8020");
>>
>> What's interesting now is that the job can't find Accumulo classes - when
>> I run the job now, I get
>>
>> 2013-01-17 12:59:25,278 [main] INFO mapred.JobClient - Task Id :
>> attempt_201301171102_0012_m_000000_1, Status : FAILED
>> java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat
>>
>> Is there a way to inform the job (via the Job API, on a separate machine
>> not running hadoop) about extra libs to include on the classpath of the job?
>
> You normally inform a job about jars it needs by specifying "-libjars
> comma,separated,jar,list" on the command line. In this case, you need to
> put those two strings "-libjars" and "jar,list" in the String[] args passed
> to ToolRunner.run:
>
> ToolRunner.run(CachedConfiguration.getInstance(), new ...(), args)
>
> The accumulo-core jar probably isn't the only one you'll need.
>
> Billie
>
>> Thanks
>>
>> Mike
>>
>> On Wed, Jan 16, 2013 at 3:11 PM, Billie Rinaldi <[email protected]> wrote:
>>
>>> Your job is running in "local" mode (Running job: job_local_0001). This
>>> basically means that the hadoop configuration is not present on the
>>> classpath of the java client kicking off the job. If you weren't planning
>>> to have the hadoop config on that machine, you might be able to get away
>>> with setting "mapred.job.tracker" and probably also "fs.default.name"
>>> on the Configuration object.
>>>
>>> Billie
>>>
>>> On Wed, Jan 16, 2013 at 12:07 PM, Mike Hugo <[email protected]> wrote:
>>>
>>>> Cool, thanks for the feedback John, the examples have been helpful in
>>>> getting up and running!
>>>>
>>>> Perhaps I'm not doing something quite right. When I jar up my jobs and
>>>> deploy the jar to the server and run it via the tool.sh command on the
>>>> cluster, I see the job running in the jobtracker (servername:50030) and it
>>>> runs as I would expect.
>>>>
>>>> 13/01/16 14:39:53 INFO mapred.JobClient: Running job: job_201301161326_0006
>>>> 13/01/16 14:39:54 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 13/01/16 14:41:29 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 13/01/16 14:41:35 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 13/01/16 14:41:40 INFO mapred.JobClient: Job complete: job_201301161326_0006
>>>> 13/01/16 14:41:40 INFO mapred.JobClient: Counters: 18
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:   Job Counters
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=180309
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Rack-local map tasks=2
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Launched map tasks=2
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:   File Output Format Counters
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Written=0
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:   FileSystemCounters
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     HDFS_BYTES_READ=248
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=60214
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:   File Input Format Counters
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Bytes Read=0
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Map input records=1036434
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Physical memory (bytes) snapshot=373760000
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Spilled Records=0
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     CPU time spent (ms)=24410
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Total committed heap usage (bytes)=168394752
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2124627968
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     Map output records=2462684
>>>> 13/01/16 14:41:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=248
>>>>
>>>> When I kick off a job via a java client running on a different host,
>>>> the job seems to run (I can see things being scanned and ingested) but I
>>>> don't see anything via the jobtracker UI on the server. Is that normal?
>>>> Or do I have something mis-configured?
>>>>
>>>> Here's how I'm starting things from the client:
>>>>
>>>> @Override
>>>> public int run(String[] strings) throws Exception {
>>>>     Job job = new Job(getConf(), getClass().getSimpleName());
>>>>     job.setJarByClass(getClass());
>>>>     job.setMapperClass(MyMapper.class);
>>>>
>>>>     job.setInputFormatClass(AccumuloRowInputFormat.class);
>>>>     AccumuloRowInputFormat.setZooKeeperInstance(job.getConfiguration(),
>>>>         instanceName, zookeepers);
>>>>     AccumuloRowInputFormat.setInputInfo(job.getConfiguration(),
>>>>         username,
>>>>         password.getBytes(),
>>>>         "...",
>>>>         new Authorizations());
>>>>
>>>>     job.setNumReduceTasks(0);
>>>>
>>>>     job.setOutputFormatClass(AccumuloOutputFormat.class);
>>>>     job.setOutputKeyClass(Key.class);
>>>>     job.setOutputValueClass(Mutation.class);
>>>>
>>>>     boolean createTables = true;
>>>>     String defaultTable = "...";
>>>>     AccumuloOutputFormat.setOutputInfo(job.getConfiguration(),
>>>>         username,
>>>>         password.getBytes(),
>>>>         createTables,
>>>>         defaultTable);
>>>>
>>>>     AccumuloOutputFormat.setZooKeeperInstance(job.getConfiguration(),
>>>>         instanceName, zookeepers);
>>>>
>>>>     job.waitForCompletion(true);
>>>>
>>>>     return job.isSuccessful() ? 0 : 1;
>>>> }
>>>>
>>>> public static void main(String args[]) throws Exception {
>>>>     int res = ToolRunner.run(CachedConfiguration.getInstance(), new ...(), args);
>>>>     System.exit(res);
>>>> }
>>>>
>>>> Here's the output when I run it via the client application:
>>>>
>>>> 2013-01-16 13:55:57,645 [main-SendThread()] INFO zookeeper.ClientCnxn - Opening socket connection to server accumulo/10.1.10.160:2181
>>>> 2013-01-16 13:55:57,660 [main-SendThread(accumulo:2181)] INFO zookeeper.ClientCnxn - Socket connection established to accumulo/10.1.10.160:2181, initiating session
>>>> 2013-01-16 13:55:57,671 [main-SendThread(accumulo:2181)] INFO zookeeper.ClientCnxn - Session establishment complete on server accumulo/10.1.10.160:2181, sessionid = 0x13c449cfe010434, negotiated timeout = 30000
>>>> 2013-01-16 13:55:58,379 [main] INFO mapred.JobClient - Running job: job_local_0001
>>>> 2013-01-16 13:55:58,447 [Thread-16] INFO mapred.Task - Using ResourceCalculatorPlugin : null
>>>> 2013-01-16 13:55:59,383 [main] INFO mapred.JobClient -  map 0% reduce 0%
>>>> 2013-01-16 13:56:04,458 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:07,459 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:10,461 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:13,462 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:16,463 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:19,465 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:21,783 [Thread-16] INFO mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
>>>> 2013-01-16 13:56:21,783 [Thread-16] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:21,784 [Thread-16] INFO mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
>>>> 2013-01-16 13:56:21,786 [Thread-16] INFO mapred.Task - Using ResourceCalculatorPlugin : null
>>>> 2013-01-16 13:56:22,423 [main] INFO mapred.JobClient -  map 100% reduce 0%
>>>> 2013-01-16 13:56:27,788 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:28,440 [main] INFO mapred.JobClient -  map 50% reduce 0%
>>>> 2013-01-16 13:56:30,790 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:33,791 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:36,792 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:39,793 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:42,794 [communication thread] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:45,779 [Thread-16] INFO mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
>>>> 2013-01-16 13:56:45,780 [Thread-16] INFO mapred.LocalJobRunner -
>>>> 2013-01-16 13:56:45,781 [Thread-16] INFO mapred.Task - Task 'attempt_local_0001_m_000001_0' done.
>>>> 2013-01-16 13:56:45,782 [Thread-16] WARN mapred.FileOutputCommitter - Output path is null in cleanup
>>>> 2013-01-16 13:56:46,462 [main] INFO mapred.JobClient -  map 100% reduce 0%
>>>> 2013-01-16 13:56:46,462 [main] INFO mapred.JobClient - Job complete: job_local_0001
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - Counters: 7
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient -   FileSystemCounters
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient -     FILE_BYTES_READ=1257
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient -     FILE_BYTES_WRITTEN=106136
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient -   Map-Reduce Framework
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient -     Map input records=1036434
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient -     Spilled Records=0
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient -     Total committed heap usage (bytes)=259915776
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient -     Map output records=2462684
>>>> 2013-01-16 13:56:46,463 [main] INFO mapred.JobClient -     SPLIT_RAW_BYTES=240
>>>>
>>>> On Wed, Jan 16, 2013 at 11:20 AM, John Vines <[email protected]> wrote:
>>>>
>>>>> The code examples we have scripted simply do the necessary setup for
>>>>> creating a mapreduce job and kicking it off. If you check out the code for
>>>>> them in
>>>>> src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/
>>>>> you can see what we're doing in Java to kick off jobs.
>>>>>
>>>>> The short explanation is, just like any other MapReduce job, we're
>>>>> setting up a Job, configuring the AccumuloInput and/or OutputFormats, and
>>>>> sending them off like any other MapReduce job.
>>>>>
>>>>> John
>>>>>
>>>>> On Wed, Jan 16, 2013 at 12:11 PM, Mike Hugo <[email protected]> wrote:
>>>>>
>>>>>> I'm writing a client program that uses the BatchWriter and
>>>>>> BatchScanner for inserting and querying data, but occasionally it also
>>>>>> needs to be able to kick off a Map/Reduce job on a remote accumulo cluster.
>>>>>> The Map/Reduce examples that ship with Accumulo look like they are meant
>>>>>> to be invoked via the command line. Does anyone have an example of how to
>>>>>> kick something off via a java client running on a separate server? Any
>>>>>> best practices to share?
>>>>>>
>>>>>> Thanks,
>>>>>> Mike
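[Editor's note] Billie's "-libjars" fix from this thread can be sketched in plain Java: the two extra strings are simply prepended to the args array before it reaches ToolRunner.run. The jar paths below are placeholders (the thread doesn't name the exact jars beyond accumulo-core), and the ToolRunner/CachedConfiguration call is shown only as a comment, since it needs Hadoop on the classpath and a cluster to submit to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LibJarsArgs {

    // Prepend "-libjars" and a comma-separated jar list to the args that
    // will be handed to ToolRunner.run; Hadoop's GenericOptionsParser
    // consumes these options and ships the jars with the job.
    static String[] withLibJars(String[] args, String... jars) {
        List<String> out = new ArrayList<String>();
        out.add("-libjars");
        out.add(String.join(",", jars));
        out.addAll(Arrays.asList(args));
        return out.toArray(new String[0]);
    }

    public static void main(String[] argv) throws Exception {
        // Placeholder jar paths -- point these at the real client-side jars.
        String[] args = withLibJars(argv,
                "/opt/accumulo/lib/accumulo-core.jar",
                "/opt/accumulo/lib/libthrift.jar");
        System.out.println(Arrays.toString(args));
        // Then, exactly as in the thread:
        // int res = ToolRunner.run(CachedConfiguration.getInstance(), new ...(), args);
        // System.exit(res);
    }
}
```

As Billie notes, accumulo-core is probably not the only jar the tasks will need; anything the mappers reference (thrift, zookeeper, etc.) has to be in the list too.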
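[Editor's note] The other fix in this thread, setting "mapred.job.tracker" and "fs.default.name", can also be done by putting Hadoop config files on the client's classpath instead of calling Configuration.set in code. A sketch of the equivalent entries, reusing the placeholder host and ports quoted above:

```xml
<!-- core-site.xml on the client (host is a placeholder) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://server_name_here:8020</value>
</property>

<!-- mapred-site.xml on the client -->
<property>
  <name>mapred.job.tracker</name>
  <value>server_name_here:8021</value>
</property>
```

With these visible to the client JVM, the job submits to the remote jobtracker rather than falling back to local mode (job_local_0001), which is the symptom Billie diagnosed.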
