Doğacan Güney wrote:

> My patch is just a draft to see if we can create a better caching
> mechanism. There are definitely some rough edges there:)

One important piece of information: in future versions of Hadoop the method
Configuration.setObject() is deprecated and will then be removed, so we
have to grow our own caching mechanism anyway - either use a singleton
cache, or change nearly all APIs to pass around a user/job/task context.

So, we will face this problem pretty soon, with the next upgrade of Hadoop.
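
A minimal sketch of what the singleton-cache option could look like, assuming Java 8 or later. The class name ObjectCache and its methods are hypothetical and only illustrate the idea; the context key is typed as Object so the example runs without Hadoop on the classpath, whereas real code would presumably key on the Configuration instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical per-JVM cache, shown for illustration only.
final class ObjectCache {
    // One cache per context object. Weak keys let a cache be collected
    // once its context is no longer referenced anywhere else.
    private static final Map<Object, ObjectCache> CACHES = new WeakHashMap<>();

    private final Map<String, Object> objects = new HashMap<>();

    private ObjectCache() {}

    // Returns the cache bound to the given context, creating it on first use.
    static synchronized ObjectCache get(Object context) {
        return CACHES.computeIfAbsent(context, k -> new ObjectCache());
    }

    synchronized Object getObject(String key) {
        return objects.get(key);
    }

    synchronized void setObject(String key, Object value) {
        objects.put(key, value);
    }
}
```

Callers that today store an object on the configuration would instead call ObjectCache.get(conf).setObject(key, value). The alternative mentioned above, passing an explicit context through the APIs, avoids the static map at the cost of changing many method signatures.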



> You are right about per-plugin parameters but I think it will be very
> difficult to keep PluginProperty class in sync with plugin parameters.
> I mean, if a plugin defines a new parameter, we have to remember to
> update PluginProperty. Perhaps, we can force plugins to define
> configuration options it will use in, say, its plugin.xml file, but
> that will be very error-prone too. I don't want to compare entire
> configuration objects, because changing irrelevant options, like
> fetcher.store.content shouldn't force loading plugins again, though it
> seems it may be inevitable....

Let me see if I understand this ... In my opinion this is a non-issue.

Child tasks are started in separate JVMs, so the only "context" 
information that they have is what they can read from job.xml (which is 
a superset of all properties from config files + job-specific data + 
task-specific data). This context is currently instantiated as a 
Configuration object, and we (ab)use it also as a local per-JVM cache 
for plugin instances and other objects.

Once we instantiate the plugins, they exist unchanged throughout the
lifecycle of the JVM (== lifecycle of a single task), so we don't have to
worry about having different sets of plugins with different parameters
for different jobs (or even tasks).

In other words, it seems to me that there is no such situation in which 
we have to reload plugins within the same JVM, but with different 
parameters.
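
The load-once behaviour described above can be sketched as follows. TaskPlugins is a hypothetical holder, not a Nutch class, and a plain map of properties stands in for both job.xml and the loaded plugin instances.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder; the names are illustrative, not Nutch APIs.
final class TaskPlugins {
    // Stands in for the plugin instances created from the job properties.
    private static Map<String, String> loaded;

    private TaskPlugins() {}

    // Loads once per JVM. Later calls return the same object and
    // ignore their argument, because the task's properties do not change.
    static synchronized Map<String, String> get(Map<String, String> jobProperties) {
        if (loaded == null) {
            loaded = Collections.unmodifiableMap(new HashMap<>(jobProperties));
        }
        return loaded;
    }
}
```

Under the assumption of one task per JVM, the first call fixes the plugin set for the rest of the process, so no comparison of configuration objects is needed.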

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
