On 6/8/07, Enzo Michelangeli <[EMAIL PROTECTED]> wrote:
----- Original Message -----
From: "Doğacan Güney" <[EMAIL PROTECTED]>
Sent: Friday, June 08, 2007 3:49 PM

[...]
>> Any idea?
>
> This will certainly help a lot. If it is not too much trouble, can you
> add debug outputs for hashCodes of conf objects (both for the one in
> the cache and for the parameter, because it seems the Configuration object
> is created more than once, so their hashCodes may differ, which in
> turn causes the change in CACHE's hashCode(*)) and a stack trace?
> A stack trace of depth 2-3 will probably suffice, I am just wondering
> what is calling PluginRepository.get(conf).

OK, I changed my debug code as follows:

  public static synchronized PluginRepository get(Configuration conf) {
    PluginRepository result = CACHE.get(conf);
    /* --- start debug code */
    // On typical JVMs frame 0 is Thread.getStackTrace() and frame 1 is this
    // method, so start at 2 to list only the callers.
    StringBuilder tr = new StringBuilder();
    StackTraceElement[] tes = Thread.currentThread().getStackTrace();
    for (int j = 2; j < tes.length; j++) {
      tr.append("\n    ").append(tes[j]);
    }
    LOG.info("In thread " + Thread.currentThread() +
             " a static method of the class " +
             (new CurrentClassGetter()).getCurrentClass() +
             " called CACHE.get(" + conf +
             "), where CACHE is " + CACHE +
             " and CACHE.hashCode() = " + CACHE.hashCode() +
             " - got result = " + result +
             " conf.hashCode() was: " + conf.hashCode() +
             " Stack Trace:" + tr);
    /* end debug code --- */
    if (result == null) {
      result = new PluginRepository(conf);
      CACHE.put(conf, result);
    }
    return result;
  }

  /* --- start debug code */
  public static class CurrentClassGetter extends SecurityManager {
    public String getCurrentClass() {
      // Index 0 is CurrentClassGetter itself; index 1 is the calling class.
      Class<?> cl = super.getClassContext()[1];
      return cl.toString() + "@" + cl.hashCode();
    }
  }
  /* end debug code --- */

(With full stack trace: bytes are cheap ;-) )

I did not bother to print the hashCode of the keys in CACHE because it's
become evident why CACHE.get(conf) returns null: the hashCode of conf
changes!

That's true. Take a look at this code from LocalJobRunner.Job.run:
  for (int i = 0; i < splits.length; i++) {
    String mapId = "map_" + newId();
    mapIds.add(mapId);
    MapTask map = new MapTask(jobId, file, "tip_m_" + mapId,
                              mapId, i,
                              splits[i]);
    // A new JobConf object per map task, hence a new identity hashCode.
    JobConf localConf = new JobConf(job);
    map.localizeConfiguration(localConf);
    map.setConf(localConf);
    map_tasks += 1;
    myMetrics.launchMap();
    map.run(localConf, this);
    myMetrics.completeMap();
    map_tasks -= 1;
  }

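For each new map task, Hadoop creates a new configuration object
(which of course makes sense), so its hashCode changes and all hell
breaks loose: CACHE.get(conf) misses every time, and a new
PluginRepository is created and put into CACHE for every task.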
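A minimal, self-contained sketch of this failure mode, assuming (as the debug output suggests) that Configuration inherits Object's identity-based equals()/hashCode(). FakeConf, FakeRepository and get() are hypothetical stand-ins for Configuration, PluginRepository and PluginRepository.get(conf); the loop mimics the per-task `new JobConf(job)` copy from LocalJobRunner.Job.run.

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityKeyLeakDemo {

    // Hypothetical stand-in for Configuration: it does NOT override
    // equals()/hashCode(), so HashMap falls back to object identity.
    static final class FakeConf {
        final Map<String, String> props = new HashMap<>();

        FakeConf() { }

        // Copy constructor, analogous to `new JobConf(job)`:
        // same contents, but a brand-new object.
        FakeConf(FakeConf other) {
            props.putAll(other.props);
        }
    }

    // Hypothetical stand-in for PluginRepository.
    static final class FakeRepository {
        FakeRepository(FakeConf conf) { }
    }

    private static final Map<FakeConf, FakeRepository> CACHE = new HashMap<>();

    // Same shape as PluginRepository.get(conf): look up, create on a miss.
    static synchronized FakeRepository get(FakeConf conf) {
        FakeRepository result = CACHE.get(conf);
        if (result == null) {
            result = new FakeRepository(conf);
            CACHE.put(conf, result);
        }
        return result;
    }

    // Simulates the LocalJobRunner loop: one fresh copy of the job conf per task.
    static int cacheSizeAfterTasks(int tasks) {
        CACHE.clear();
        FakeConf job = new FakeConf();
        job.props.put("plugin.folders", "plugins");
        for (int i = 0; i < tasks; i++) {
            FakeConf localConf = new FakeConf(job);
            get(localConf);
        }
        return CACHE.size();
    }

    public static void main(String[] args) {
        System.out.println("cache entries after 5 tasks: " + cacheSizeAfterTasks(5));
    }
}
```

Every copy has identical contents, yet each one becomes a separate cache entry, so the cache grows by one repository per map task.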

If I understood the code correctly, this will not be a problem in a
distributed environment. Each map task gets its own process there
anyway, so there should be no leak.

This is strange because, as you can see below, the strings that make up
the keys and values of conf appear unchanged. Perhaps we should override
the equals() method in org.apache.hadoop.conf.Configuration (which
CACHE.get() invokes, according to the specs of the java.util.Map interface)
so that the hashCode()s of the keys get ignored, and conf1.equals(conf2)
returns true if and only if:

 1. conf1.size() == conf2.size(),

 2. for each key k1 of conf1 there is a key k2 in conf2 such that:
  2.1 k1.equals(k2)
  2.2 conf1.get(k1).equals(conf2.get(k2))
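The two criteria above can be sketched as follows. ContentEqualConf is a hypothetical class backed by a HashMap<String, String>, not Hadoop's actual Configuration. One point worth stating explicitly: java.util.HashMap locates a bucket by hashCode() before it calls equals(), so a content-based equals() only makes CACHE.get(conf) hit if hashCode() is also computed from the contents; the sketch therefore overrides both.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ContentEqualConf {

    // Hypothetical Configuration-like class whose identity is its key/value pairs.
    private final Map<String, String> props = new HashMap<>();

    public ContentEqualConf set(String key, String value) {
        props.put(key, value);
        return this;
    }

    public String get(String key) {
        return props.get(key);
    }

    public int size() {
        return props.size();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ContentEqualConf)) return false;
        ContentEqualConf other = (ContentEqualConf) o;
        // Criterion 1: same number of entries.
        if (size() != other.size()) return false;
        // Criterion 2: every key of this conf exists in the other one
        // (2.1) and maps to an equal value (2.2).
        for (Map.Entry<String, String> e : props.entrySet()) {
            String k = e.getKey();
            if (!other.props.containsKey(k)) return false;
            if (!Objects.equals(e.getValue(), other.props.get(k))) return false;
        }
        return true;
    }

    // Must agree with equals(): Map.hashCode() is the sum of the entry hashes,
    // so two confs with equal contents get equal hash codes.
    @Override
    public int hashCode() {
        return props.hashCode();
    }

    public static void main(String[] args) {
        ContentEqualConf a = new ContentEqualConf().set("plugin.folders", "plugins");
        ContentEqualConf b = new ContentEqualConf().set("plugin.folders", "plugins");
        Map<ContentEqualConf, String> cache = new HashMap<>();
        cache.put(a, "repository");
        // b is a different object with the same contents, so the lookup hits.
        System.out.println(a.equals(b) + " " + cache.get(b));
    }
}
```

Both methods walk every entry, so each lookup costs time proportional to the number of properties, and a key that is mutated after being put into a HashMap may no longer be found.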

This has been suggested before, and I have to say I don't like it,
because it means that each call to PluginRepository.get(conf) will end
up comparing all key/value pairs, which, IMO, is excessive (if I am not
mistaken, we don't need this when running Nutch in a distributed
environment). Unfortunately, this may be the only way to fix this leak.

This is probably not a good idea, but here it goes: perhaps we can
change the LocalJobRunner.Job.run method. First, it clones the JobConf
object (clonedConf). It then runs a MapTask with the original JobConf.
Upon completion, it copies everything from clonedConf back into the
original JobConf. This way, the original JobConf object (and thus its
hashCode) stays the same across tasks, so there should be no leak.
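The clone/run/restore idea can be sketched like this. Everything here is hypothetical: FakeConf stands in for JobConf (identity-based equals()/hashCode() inherited from Object), runTask() stands in for a MapTask that localizes its conf and calls PluginRepository.get(conf), and "example.task.id" is an illustrative property name. Whether the real JobConf allows clearing and refilling its properties this way is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class RestoreAfterTaskDemo {

    // Hypothetical JobConf stand-in; a cache keyed on it is keyed on identity.
    static final class FakeConf {
        final Map<String, String> props = new HashMap<>();

        FakeConf() { }

        FakeConf(FakeConf other) {
            props.putAll(other.props);
        }
    }

    // Hypothetical stand-in for the CACHE inside PluginRepository.get(conf).
    static final Map<FakeConf, String> CACHE = new HashMap<>();

    // Hypothetical map task: localizes (mutates) its conf, then hits the cache.
    static void runTask(FakeConf conf, int taskIndex) {
        conf.props.put("example.task.id", "task_" + taskIndex);
        CACHE.putIfAbsent(conf, "repository");
    }

    // Sketch of the proposal: clone, run with the original object, copy back.
    static void runAllTasks(FakeConf job, int tasks) {
        for (int i = 0; i < tasks; i++) {
            FakeConf clonedConf = new FakeConf(job);   // 1. clone the settings
            runTask(job, i);                           // 2. run with the original
            job.props.clear();                         // 3. restore from the clone
            job.props.putAll(clonedConf.props);
        }
    }

    public static void main(String[] args) {
        FakeConf job = new FakeConf();
        job.props.put("plugin.folders", "plugins");
        runAllTasks(job, 5);
        System.out.println("cache entries: " + CACHE.size()
                + ", task id left behind: " + job.props.containsKey("example.task.id"));
    }
}
```

Reusing one object only works because the tasks run one after another, as in the loop quoted above; concurrent tasks sharing the same conf would overwrite each other's localized settings.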



Anyway, I'm attaching the log below.

> Thanks for the detailed analysis!

Glad to be of help!

Enzo


--
Doğacan Güney
