Cool, thanks for the feedback, John. The examples have been helpful in
getting up and running!
Perhaps I'm not doing something quite right. When I jar up my jobs and
deploy the jar to the server and run it via the tool.sh command on the
cluster, I see the job running in the jobtracker (servername:50030) and it
runs as I would expect.
13/01/16 14:39:53 INFO mapred.JobClient: Running job: job_201301161326_0006
13/01/16 14:39:54 INFO mapred.JobClient: map 0% reduce 0%
13/01/16 14:41:29 INFO mapred.JobClient: map 50% reduce 0%
13/01/16 14:41:35 INFO mapred.JobClient: map 100% reduce 0%
13/01/16 14:41:40 INFO mapred.JobClient: Job complete: job_201301161326_0006
13/01/16 14:41:40 INFO mapred.JobClient: Counters: 18
13/01/16 14:41:40 INFO mapred.JobClient: Job Counters
13/01/16 14:41:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=180309
13/01/16 14:41:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/16 14:41:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/01/16 14:41:40 INFO mapred.JobClient: Rack-local map tasks=2
13/01/16 14:41:40 INFO mapred.JobClient: Launched map tasks=2
13/01/16 14:41:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/01/16 14:41:40 INFO mapred.JobClient: File Output Format Counters
13/01/16 14:41:40 INFO mapred.JobClient: Bytes Written=0
13/01/16 14:41:40 INFO mapred.JobClient: FileSystemCounters
13/01/16 14:41:40 INFO mapred.JobClient: HDFS_BYTES_READ=248
13/01/16 14:41:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=60214
13/01/16 14:41:40 INFO mapred.JobClient: File Input Format Counters
13/01/16 14:41:40 INFO mapred.JobClient: Bytes Read=0
13/01/16 14:41:40 INFO mapred.JobClient: Map-Reduce Framework
13/01/16 14:41:40 INFO mapred.JobClient: Map input records=1036434
13/01/16 14:41:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=373760000
13/01/16 14:41:40 INFO mapred.JobClient: Spilled Records=0
13/01/16 14:41:40 INFO mapred.JobClient: CPU time spent (ms)=24410
13/01/16 14:41:40 INFO mapred.JobClient: Total committed heap usage (bytes)=168394752
13/01/16 14:41:40 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2124627968
13/01/16 14:41:40 INFO mapred.JobClient: Map output records=2462684
13/01/16 14:41:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=248
When I kick off a job via a Java client running on a different host, the
job seems to run (I can see things being scanned and ingested), but I don't
see anything in the jobtracker UI on the server. Is that normal, or do I
have something misconfigured?
Here's how I'm starting things from the client:
    @Override
    public int run(String[] strings) throws Exception {
        Job job = new Job(getConf(), getClass().getSimpleName());
        job.setJarByClass(getClass());
        job.setMapperClass(MyMapper.class);

        job.setInputFormatClass(AccumuloRowInputFormat.class);
        AccumuloRowInputFormat.setZooKeeperInstance(job.getConfiguration(),
                instanceName, zookeepers);
        AccumuloRowInputFormat.setInputInfo(job.getConfiguration(),
                username,
                password.getBytes(),
                "...",
                new Authorizations());

        job.setNumReduceTasks(0);

        job.setOutputFormatClass(AccumuloOutputFormat.class);
        job.setOutputKeyClass(Key.class);
        job.setOutputValueClass(Mutation.class);

        boolean createTables = true;
        String defaultTable = "...";
        AccumuloOutputFormat.setOutputInfo(job.getConfiguration(),
                username,
                password.getBytes(),
                createTables,
                defaultTable);
        AccumuloOutputFormat.setZooKeeperInstance(job.getConfiguration(),
                instanceName, zookeepers);

        job.waitForCompletion(true);
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(CachedConfiguration.getInstance(),
                new ...(), args);
        System.exit(res);
    }
Here's the output when I run it via the client application:
2013-01-16 13:55:57,645 [main-SendThread()] INFO zookeeper.ClientCnxn - Opening socket connection to server accumulo/10.1.10.160:2181
2013-01-16 13:55:57,660 [main-SendThread(accumulo:2181)] INFO zookeeper.ClientCnxn - Socket connection established to accumulo/10.1.10.160:2181, initiating session
2013-01-16 13:55:57,671 [main-SendThread(accumulo:2181)] INFO zookeeper.ClientCnxn - Session establishment complete on server accumulo/10.1.10.160:2181, sessionid = 0x13c449cfe010434, negotiated timeout = 30000
2013-01-16 13:55:58,379 [main] INFO mapred.JobClient - Running job: job_local_0001
2013-01-16 13:55:58,447 [Thread-16] INFO mapred.Task - Using ResourceCalculatorPlugin : null
2013-01-16 13:55:59,383 [main] INFO mapred.JobClient - map 0% reduce 0%
2013-01-16 13:56:04,458 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:07,459 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:10,461 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:13,462 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:16,463 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:19,465 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:21,783 [Thread-16] INFO mapred.Task - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-01-16 13:56:21,783 [Thread-16] INFO mapred.LocalJobRunner -
2013-01-16 13:56:21,784 [Thread-16] INFO mapred.Task - Task 'attempt_local_0001_m_000000_0' done.
2013-01-16 13:56:21,786 [Thread-16] INFO mapred.Task - Using ResourceCalculatorPlugin : null
2013-01-16 13:56:22,423 [main] INFO mapred.JobClient - map 100% reduce 0%
2013-01-16 13:56:27,788 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:28,440 [main] INFO mapred.JobClient - map 50% reduce 0%
2013-01-16 13:56:30,790 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:33,791 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:36,792 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:39,793 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:42,794 [communication thread] INFO mapred.LocalJobRunner -
2013-01-16 13:56:45,779 [Thread-16] INFO mapred.Task - Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
2013-01-16 13:56:45,780 [Thread-16] INFO mapred.LocalJobRunner -
2013-01-16 13:56:45,781 [Thread-16] INFO mapred.Task - Task 'attempt_local_0001_m_000001_0' done.
2013-01-16 13:56:45,782 [Thread-16] WARN mapred.FileOutputCommitter - Output path is null in cleanup
2013-01-16 13:56:46,462 [main] INFO mapred.JobClient - map 100% reduce 0%
2013-01-16 13:56:46,462 [main] INFO mapred.JobClient - Job complete: job_local_0001
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - Counters: 7
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - FileSystemCounters
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - FILE_BYTES_READ=1257
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - FILE_BYTES_WRITTEN=106136
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - Map-Reduce Framework
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - Map input records=1036434
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - Spilled Records=0
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - Total committed heap usage (bytes)=259915776
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - Map output records=2462684
2013-01-16 13:56:46,463 [main] INFO mapred.JobClient - SPLIT_RAW_BYTES=240
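
One difference I can see between the two runs is the job ID: the tool.sh run
reports job_201301161326_0006, while the client run reports job_local_0001
and logs from mapred.LocalJobRunner. As far as I understand Hadoop 1.x, that
pattern suggests the job ran in-process on the client instead of being
submitted to the jobtracker, which would explain the empty UI. Below is a
minimal sketch of that check. The class and method names (LocalJobCheck,
isLocalJobId) are made up for illustration and are not part of Hadoop or
Accumulo, and the "job_local" prefix rule is an assumption based on the logs
above.

```java
// Hypothetical helper (not part of Hadoop or Accumulo): classifies a job ID
// copied from the JobClient log output.
final class LocalJobCheck {

    // Assumption: IDs handed out by Hadoop 1.x's LocalJobRunner start with
    // "job_local", while jobtracker-assigned IDs look like job_<timestamp>_<seq>.
    static boolean isLocalJobId(String jobId) {
        return jobId != null && jobId.startsWith("job_local");
    }

    public static void main(String[] args) {
        // ID from the client run above
        System.out.println(isLocalJobId("job_local_0001"));        // prints true
        // ID from the tool.sh run on the cluster
        System.out.println(isLocalJobId("job_201301161326_0006")); // prints false
    }
}
```

If that is what is happening, a likely cause is that the client's
Configuration does not carry the cluster settings. In Hadoop 1.x the
mapred.job.tracker property defaults to "local", so the client would
probably need the cluster's core-site.xml and mapred-site.xml on its
classpath, or the equivalent properties set on the Configuration, before the
job is submitted remotely.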
On Wed, Jan 16, 2013 at 11:20 AM, John Vines <[email protected]> wrote:
> The code examples we have scripted simply do the necessary setup for
> creating a mapreduce job and kicking it off. If you check out the code for
> them in
> src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/
> you can see what we're doing in Java to kick off jobs.
>
> The short explanation is, just like any other MapReduce job, we're setting
> up a Job, configuring the AccumuloInput and/or OutputFormats, and sending
> them off like any other MapReduce job.
>
> John
>
>
> On Wed, Jan 16, 2013 at 12:11 PM, Mike Hugo <[email protected]> wrote:
>
>> I'm writing a client program that uses the BatchWriter and BatchScanner
>> for inserting and querying data, but occasionally it also needs to be able
>> to kick off a Map/Reduce job on a remote Accumulo cluster. The Map/Reduce
>> examples that ship with Accumulo look like they are meant to be invoked via
>> the command line. Does anyone have an example of how to kick something off
>> via a Java client running on a separate server? Any best practices to
>> share?
>> Thanks,
>> Mike
>>
>
>