Confirmed that ToolRunner is NOT thread-safe:

*Original code (which runs into problems):*

  public static int run(Configuration conf, Tool tool, String[] args)
    throws Exception{
    if(conf == null) {
      conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    //set the configuration back, so that Tool can configure itself
    tool.setConf(conf);

    //get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }

*New code (which works):*

    public static int run(Configuration conf, Tool tool, String[] args)
            throws Exception{
        if(conf == null) {
            conf = new Configuration();
        }
        GenericOptionsParser parser = getParser(conf, args);

        tool.setConf(conf);

        //get the args w/o generic hadoop args
        String[] toolArgs = parser.getRemainingArgs();
        return tool.run(toolArgs);
    }

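    // 'synchronized' is the only functional change: it serializes
    // construction of GenericOptionsParser across threads.
    private static synchronized GenericOptionsParser getParser(
            Configuration conf, String[] args) throws Exception {
        return new GenericOptionsParser(conf, args);
    }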
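
To show why serializing the parser construction helps, here is a minimal, self-contained sketch with no Hadoop or commons-cli dependency. Everything in it is hypothetical: `StaticBuilder` is only a stand-in for a builder that keeps its pending state in a static field (which, as far as I can tell, is what OptionBuilder does), and `buildUnsafe`, `buildSafe` and `countMismatches` are names I made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedBuilderSketch {

    // Hypothetical stand-in for a builder whose pending state lives in a
    // static field shared by every caller (assumption: this mirrors the
    // OptionBuilder behaviour discussed in this thread).
    static final class StaticBuilder {
        private static String argName;

        static void withArgName(String name) {
            argName = name;
        }

        static String create() {
            String built = argName;
            argName = null; // reset the shared state for the next caller
            return built;
        }
    }

    // Two threads can interleave between withArgName() and create().
    static String buildUnsafe(String name) {
        StaticBuilder.withArgName(name);
        Thread.yield(); // widen the race window a little
        return StaticBuilder.create();
    }

    // Same steps, but only one thread at a time may run them.
    static synchronized String buildSafe(String name) {
        StaticBuilder.withArgName(name);
        Thread.yield();
        return StaticBuilder.create();
    }

    // Runs 'tasks' builds on a pool of 2 threads and counts how many
    // callers got back something other than the name they passed in.
    static int countMismatches(final boolean safe, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final String name = "arg-" + i;
                Callable<Boolean> task =
                        () -> name.equals(safe ? buildSafe(name) : buildUnsafe(name));
                results.add(pool.submit(task));
            }
            int mismatches = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    mismatches++;
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("unsynchronized mismatches: " + countMismatches(false, 10000));
        System.out.println("synchronized mismatches:   " + countMismatches(true, 10000));
    }
}
```

The unsynchronized count depends on timing and can be anything, including zero on a lucky run; the synchronized count should always be zero. Note that a lock like this only protects callers that go through the same synchronized method, so any other code touching the shared builder directly could still race with it.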

On Wed, Mar 19, 2014 at 10:15 AM, Something Something <
mailinglist...@gmail.com> wrote:

> I would like to trigger a few Hadoop jobs simultaneously.  I've created a
> pool of threads using Executors.newFixedThreadPool.  Idea is that if the
> pool size is 2, my code will trigger 2 Hadoop jobs at the same exact time
> using 'ToolRunner.run'.  In my testing, I noticed that these 2 threads
> keep stepping on each other.
>
> When I looked under the hood, I noticed that ToolRunner creates a
> GenericOptionsParser, which in turn calls a static method,
> 'buildGeneralOptions'.  This method uses 'OptionBuilder.withArgName',
> which stores its value in a static field called 'argName'.  This doesn't
> look thread safe to me, and I believe it is the root cause of the issues
> I am running into.
>
> Any thoughts?
>
