Doğacan Güney wrote:
> My patch is just a draft to see if we can create a better caching
> mechanism. There are definitely some rough edges there:)

One important piece of information: in future versions of Hadoop the method Configuration.setObject() is deprecated and will then be removed, so we have to grow our own caching mechanism anyway: either use a singleton cache, or change nearly all APIs to pass around a user/job/task context. So we will face this problem pretty soon, with the next upgrade of Hadoop.

> You are right about per-plugin parameters but I think it will be very
> difficult to keep PluginProperty class in sync with plugin parameters.
> I mean, if a plugin defines a new parameter, we have to remember to
> update PluginProperty. Perhaps, we can force plugins to define
> configuration options it will use in, say, its plugin.xml file, but
> that will be very error-prone too. I don't want to compare entire
> configuration objects, because changing irrelevant options, like
> fetcher.store.content shouldn't force loading plugins again, though it
> seems it may be inevitable....

Let me see if I understand this ... In my opinion this is a non-issue. Child tasks are started in separate JVMs, so the only "context" information that they have is what they can read from job.xml (which is a superset of all properties from config files + job-specific data + task-specific data). This context is currently instantiated as a Configuration object, and we (ab)use it also as a local per-JVM cache for plugin instances and other objects.

Once we instantiate the plugins, they exist unchanged throughout the lifecycle of the JVM (== the lifecycle of a single task), so we don't have to worry about having different sets of plugins with different parameters for different jobs (or even tasks).

In other words, it seems to me that there is no situation in which we have to reload plugins within the same JVM, but with different parameters.

-- 
Best regards,
Andrzej Bialecki     <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
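
To illustrate the singleton-cache option discussed in the message above, here is a minimal sketch in Java. It is only one possible shape under stated assumptions: the class name JvmObjectCache and its methods are hypothetical and are not an existing Nutch or Hadoop API, and the "context" key stands in for whatever per-job object the caller already holds (for example a Configuration instance). Because a task runs in its own JVM, a static map is enough to give every caller in that task the same cached plugin instances.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/**
 * Hypothetical per-JVM object cache (an illustration, not Nutch or Hadoop code).
 * It replaces storing objects inside the configuration object itself.
 */
final class JvmObjectCache {
    // One cache per context object. Weak keys allow an entry to be garbage
    // collected once nothing else references its context. Lookup uses the
    // key's equals/hashCode, which is identity for a plain Object.
    private static final Map<Object, JvmObjectCache> CACHES = new WeakHashMap<>();

    // The cached objects themselves, e.g. instantiated plugins, keyed by name.
    private final Map<String, Object> objects = new HashMap<>();

    private JvmObjectCache() {
    }

    /** Returns the cache bound to the given context, creating it on first use. */
    static synchronized JvmObjectCache get(Object context) {
        return CACHES.computeIfAbsent(context, k -> new JvmObjectCache());
    }

    /** Returns the cached object for the key, or null if nothing is cached. */
    synchronized Object getObject(String key) {
        return objects.get(key);
    }

    /** Stores an object under the key, replacing any previous value. */
    synchronized void setObject(String key, Object value) {
        objects.put(key, value);
    }
}
```

A caller would use JvmObjectCache.get(conf).getObject("some.key") where it previously read the object from the configuration, and setObject(...) after instantiating it. The alternative mentioned in the message, passing an explicit user/job/task context through the APIs, avoids the static state at the cost of changing many method signatures.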