On 5/30/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Doğacan Güney wrote:
>
> > My patch is just a draft to see if we can create a better caching
> > mechanism. There are definitely some rough edges there :)
>
> One important piece of information: in future versions of Hadoop the
> method Configuration.setObject() is deprecated and will then be removed,
> so we have to grow our own caching mechanism anyway - either use a
> singleton cache, or change nearly all APIs to pass around a
> user/job/task context.
>
> So, we will face this problem pretty soon, with the next upgrade of Hadoop.
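To make the "singleton cache" alternative quoted above concrete, here is a rough sketch of what it could look like. Everything below, including the ObjectCache class name and the identity-based keying, is invented for the sake of discussion and is not existing Nutch or Hadoop code; it simply keeps a per-Configuration map of objects instead of stuffing them into the Configuration itself:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;

// Sketch only: a process-wide cache keyed on the Configuration instance,
// intended to replace the current (ab)use of Configuration.setObject().
public final class ObjectCache {

  private static final Map<Configuration, ObjectCache> CACHES =
      new ConcurrentHashMap<Configuration, ObjectCache>();

  private final Map<String, Object> objects =
      new ConcurrentHashMap<String, Object>();

  private ObjectCache() {
  }

  // One cache per Configuration instance. Keying is identity-based,
  // since Configuration does not define equals()/hashCode(); comparing
  // only "plugin-relevant" properties is the harder problem discussed
  // further down in this thread.
  public static ObjectCache get(Configuration conf) {
    ObjectCache cache = CACHES.get(conf);
    if (cache == null) {
      CACHES.putIfAbsent(conf, new ObjectCache());
      cache = CACHES.get(conf);
    }
    return cache;
  }

  public Object getObject(String key) {
    return objects.get(key);
  }

  public void setObject(String key, Object value) {
    objects.put(key, value);
  }
}

Callers such as PluginRepository could then store their instances in ObjectCache.get(conf) rather than in the Configuration object, which would preserve the per-JVM caching behaviour described below once setObject() is gone.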
Hmm, well, that sucks, but this is not really a problem for PluginRepository: PluginRepository already has its own cache mechanism.

> > You are right about per-plugin parameters, but I think it will be very
> > difficult to keep the PluginProperty class in sync with plugin
> > parameters. I mean, if a plugin defines a new parameter, we have to
> > remember to update PluginProperty. Perhaps we can force plugins to
> > define the configuration options they will use in, say, their
> > plugin.xml file, but that will be very error-prone too. I don't want
> > to compare entire configuration objects, because changing irrelevant
> > options, like fetcher.store.content, shouldn't force loading plugins
> > again, though it seems it may be inevitable....
>
> Let me see if I understand this ... In my opinion this is a non-issue.
>
> Child tasks are started in separate JVMs, so the only "context"
> information that they have is what they can read from job.xml (which is
> a superset of all properties from config files + job-specific data +
> task-specific data). This context is currently instantiated as a
> Configuration object, and we (ab)use it also as a local per-JVM cache
> for plugin instances and other objects.
>
> Once we instantiate the plugins, they exist unchanged throughout the
> lifecycle of the JVM (== lifecycle of a single task), so we don't have
> to worry about having different sets of plugins with different
> parameters for different jobs (or even tasks).
>
> In other words, it seems to me that there is no such situation in which
> we have to reload plugins within the same JVM, but with different
> parameters.

The problem is that someone might get a little too smart. For example, one may write a new job that creates two IndexingFilters instances from completely different configuration objects, then filters some documents with the first and others with the second (see the sketch at the bottom of this mail). I agree that this is a bit of a reach, but it is possible.

> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

--
Doğacan Güney
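To make that scenario concrete, it would look roughly like the following. IndexingFilters, NutchConfiguration and Configuration are the existing classes; the option key and the class wrapping the example are made up for illustration only:

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.indexer.IndexingFilters;
import org.apache.nutch.util.NutchConfiguration;

// Hypothetical job code, only to illustrate the scenario above.
public class TwoFilterSetsExample {

  public static void main(String[] args) {
    // Two configurations that differ only in one (made-up) plugin option.
    Configuration confA = NutchConfiguration.create();
    Configuration confB = NutchConfiguration.create();
    confB.set("my.indexing.plugin.option", "something-else");

    // Each set of filters is built from its own configuration. If the
    // plugin cache compares only a fixed list of "plugin-relevant"
    // properties, both instances may silently end up sharing the same
    // cached, identically-configured plugins.
    IndexingFilters filtersA = new IndexingFilters(confA);
    IndexingFilters filtersB = new IndexingFilters(confB);

    // ... filter some documents with filtersA and others with filtersB ...
  }
}

Whether a fixed PluginProperty list would even notice the difference between confA and confB is exactly the synchronization problem mentioned earlier in this mail.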