Hi,

The generic way of getting additional jars on the job's classpath that I typically use is to make your job jar contain a /lib folder in which you place all dependencies (without unpacking, just the .jar files). You can include the HBase jars there as well.
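For illustration, the contents of such a job jar would look roughly like the listing below; the lib/ entries are only placeholders, use whatever HBase and helper jar versions your cluster actually runs:

===============
$ jar tf catalogloader-1.0-SNAPSHOT.jar
META-INF/MANIFEST.MF
nl/basjes/catalogloader/Loader.class
nl/basjes/catalogloader/Loader$ImportMapper.class
lib/hbase-<version>.jar
lib/zookeeper-<version>.jar
...
===============

Both the 'hadoop jar' command on the client and the task trackers unpack the job jar and add every jar under lib/ to the classpath, so both sides can find the HBase classes.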
If you use Maven to build your jar, the assembly plugin can do this for you.

Cheers,

Friso

On 3 May 2011, at 16:29, Niels Basjes wrote:

> Hi Harsh,
>
> 2011/5/3 Harsh J <[email protected]>:
>> Am moving this to hbase-user, since it's more relevant to HBase here
>> than MR's typical job submissions.
>
> I figured this is a generic problem in getting additional libraries
> pushed along towards the task trackers. That is why I posted it to the
> mr-user list.
>
>> My reply below:
>
>> On Tue, May 3, 2011 at 7:12 PM, Niels Basjes <[email protected]> wrote:
>>> I've written my first very simple job that does something with HBase.
>>>
>>> Now when I try to submit my jar in my cluster I get this:
>>>
>>> [nbasjes@master ~/src/catalogloader/run]$ hadoop jar
>>> catalogloader-1.0-SNAPSHOT.jar nl.basjes.catalogloader.Loader
>>> /user/nbasjes/Minicatalog.xml
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/hadoop/hbase/HBaseConfiguration
> ...
>
>> The best way to write a Job Driver for HBase would be to use its
>> TableMapReduceUtil class to make it add dependent jars, prepare jobs
>> with a Scan, etc. [1].
>>
>> Once your driver reflects the use of TableMapReduceUtil, simply do
>> (assuming HBase's bin/ is on PATH as well):
>> $ HADOOP_CLASSPATH=`hbase classpath` hadoop jar
>> nl.basjes.catalogloader.Loader /user/nbasjes/Minicatalog.xml
>
> Sounds good, but it also sounds like HBase has a utility to work
> around an omission in the base Hadoop MR platform.
> I'll give it a try.
>
>> If you would still like to use -libjars to add in aux jars, make your
>> Driver use the GenericOptionsParser class [2]. Something like:
>>
>> main(args) {
>>   parser = new GenericOptionsParser(args);
>>   conf = parser.getConfiguration();
>>   rem_args = parser.getRemainingArgs();
>>   // Do extra args processing if any..
>>   // use 'conf' for your Job, not a new instance.
>> }
>
> As far as I understood, implementing "Tool" is the way to go with
> Hadoop 0.20 and newer.
> So my current boilerplate looks like this (snipped useless parts):
>
> ===============
> public class Loader extends Configured implements Tool {
>   ... SNIP: my ImportMapper class ...
>
>   @Override
>   public int run(String[] args) throws Exception {
>     Configuration config = getConf();
>     config.set(TableOutputFormat.OUTPUT_TABLE, "products");
>     Job job = new Job(config, "Import product catalog");
>     job.setJarByClass(this.getClass());
>
>     String input = args[0];
>
>     TextInputFormat.setInputPaths(job, new Path(input));
>     job.setInputFormatClass(TextInputFormat.class);
>     job.setMapperClass(ImportMapper.class);
>     job.setNumReduceTasks(0);
>
>     job.setOutputFormatClass(TableOutputFormat.class);
>
>     job.waitForCompletion(true);
>
>     return 0;
>   }
>
>   public static void main(String[] args) throws Exception {
>     Configuration config = HBaseConfiguration.create();
>     int result = ToolRunner.run(config, new Loader(), args);
>     System.exit(result);
>   }
> }
> ===============
>
> Where did I go wrong?
>
> --
> Kind regards,
>
> Niels Basjes
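As for the "Where did I go wrong?" question: the quoted driver never tells the framework to ship the HBase jars with the job, and the stack trace shows the submitting JVM itself failing to load HBaseConfiguration, which is what the HADOOP_CLASSPATH=`hbase classpath` trick (or the /lib folder described above) addresses. Below is an untested sketch of how the same Loader could let TableMapReduceUtil do the HBase wiring, as Harsh suggests; it assumes the org.apache.hadoop.hbase.mapreduce API of recent HBase releases and keeps your ImportMapper unchanged:

===============
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Loader extends Configured implements Tool {

  // ... your ImportMapper class, unchanged ...

  @Override
  public int run(String[] args) throws Exception {
    Job job = new Job(getConf(), "Import product catalog");
    job.setJarByClass(Loader.class);

    TextInputFormat.setInputPaths(job, new Path(args[0]));
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(ImportMapper.class);
    job.setNumReduceTasks(0);

    // Replaces the manual OUTPUT_TABLE / setOutputFormatClass() calls:
    // configures TableOutputFormat for the "products" table and, by default,
    // ships the HBase/ZooKeeper jars to the tasks via the distributed cache.
    TableMapReduceUtil.initTableReducerJob("products", null, job);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(HBaseConfiguration.create(), new Loader(), args));
  }
}
===============

You would still submit it with HADOOP_CLASSPATH=`hbase classpath` (or with the HBase jars inside the job jar's /lib folder) so that the driver itself can load the HBase classes at submission time.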
