Yes, this is the best way to go.
> On Mar 22, 2014, at 3:03, Something Something <mailinglist...@gmail.com> wrote:
>
> I will be happy to follow all these steps if someone confirms that this is
> the best way to handle it. Seems harmless to me, but just wondering. Thanks.
>
>
>> On Fri, Mar 21, 2014 at 1:26 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
>> JIRA, test, patch and review? I am sure the community would welcome it. And
>> if you don't, well, it is unlikely to appear in Hadoop trunk soon.
>>
>> Bertrand
>>
>>
>>> On Fri, Mar 21, 2014 at 12:49 AM, Something Something
>>> <mailinglist...@gmail.com> wrote:
>>> Confirmed that ToolRunner is NOT thread-safe:
>>>
>>> Original code (which runs into problems):
>>>
>>>   public static int run(Configuration conf, Tool tool, String[] args)
>>>     throws Exception{
>>>     if(conf == null) {
>>>       conf = new Configuration();
>>>     }
>>>     GenericOptionsParser parser = new GenericOptionsParser(conf, args);
>>>     //set the configuration back, so that Tool can configure itself
>>>     tool.setConf(conf);
>>>
>>>     //get the args w/o generic hadoop args
>>>     String[] toolArgs = parser.getRemainingArgs();
>>>     return tool.run(toolArgs);
>>>   }
>>>
>>>
>>> New code (which works):
>>>
>>>   public static int run(Configuration conf, Tool tool, String[] args)
>>>     throws Exception{
>>>     if(conf == null) {
>>>       conf = new Configuration();
>>>     }
>>>     GenericOptionsParser parser = getParser(conf, args);
>>>
>>>     tool.setConf(conf);
>>>
>>>     //get the args w/o generic hadoop args
>>>     String[] toolArgs = parser.getRemainingArgs();
>>>     return tool.run(toolArgs);
>>>   }
>>>
>>>   private static synchronized GenericOptionsParser
>>>     getParser(Configuration conf, String[] args) throws Exception {
>>>     return new GenericOptionsParser(conf, args);
>>>   }
>>>
>>>
>>>> On Wed, Mar 19, 2014 at 10:15 AM, Something Something
>>>> <mailinglist...@gmail.com> wrote:
>>>> I would like to trigger a few Hadoop jobs simultaneously. I’ve created a
>>>> pool of threads using Executors.newFixedThreadPool. The idea is that if the
>>>> pool size is 2, my code will trigger 2 Hadoop jobs at exactly the same time
>>>> using ‘ToolRunner.run’. In my testing, I noticed that these 2 threads
>>>> keep stepping on each other.
>>>>
>>>> When I looked under the hood, I noticed that ToolRunner creates a
>>>> GenericOptionsParser, which in turn calls a static method,
>>>> ‘buildGeneralOptions’. This method uses ‘OptionBuilder.withArgName’, which
>>>> uses an instance variable called ‘argName’. This doesn’t look thread-safe
>>>> to me, and I believe it is the root cause of the issues I am running into.
>>>>
>>>> Any thoughts?
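
For anyone who wants to see the locking idea from the thread in isolation, here is a minimal sketch that runs without Hadoop on the classpath. Everything in it is made up for illustration: StaticBuilder is a hypothetical stand-in for a builder that keeps shared mutable state between calls (it is not the real OptionBuilder), and PARSER_LOCK, buildGuarded and runConcurrently are invented names. The assumption is that only the construction step needs to be serialized, which mirrors the synchronized getParser in the patch above; the jobs themselves would still run in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class GuardedParserDemo {

    /** Hypothetical stand-in for a builder that holds shared mutable state. */
    static final class StaticBuilder {
        private static String argName; // shared by every caller

        static void withArgName(String name) {
            argName = name;
        }

        static String create() {
            String built = argName;
            argName = null; // reset the shared state after building
            return built;
        }
    }

    // One lock for the whole construction step (assumed name).
    private static final Object PARSER_LOCK = new Object();

    /** Sets and reads the shared state while holding the lock. */
    static String buildGuarded(String name) {
        synchronized (PARSER_LOCK) {
            StaticBuilder.withArgName(name);
            Thread.yield(); // invite interleaving; the lock keeps others out
            return StaticBuilder.create();
        }
    }

    /** Runs the guarded build on a fixed pool and counts wrong results. */
    static int runConcurrently(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final String expected = "arg-" + i;
                Callable<Boolean> task = () -> expected.equals(buildGuarded(expected));
                results.add(pool.submit(task));
            }
            int mismatches = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    mismatches++;
                }
            }
            return mismatches;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + runConcurrently(2, 10_000));
    }
}
```

With the synchronized block in place, the mismatch count is 0 because no thread can overwrite the shared field between withArgName and create. Taking the block out is a quick way to check whether the race shows up on a given machine, though a race like this is not guaranteed to trigger on every run.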