On 5/30/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Doğacan Güney wrote:
>
> > My patch is just a draft to see if we can create a better caching
> > mechanism. There are definitely some rough edges there :)
>
> One important piece of information: in future versions of Hadoop the
> method Configuration.setObject() is deprecated and will then be removed,
> so we have to grow our own caching mechanism anyway - either use a
> singleton cache, or change nearly all APIs to pass around a
> user/job/task context.
>
> So, we will face this problem pretty soon, with the next upgrade of Hadoop.
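To make the "singleton cache" alternative quoted above concrete, here is a rough sketch of what it could look like. Everything below, including the ObjectCache class name and the identity-based keying, is invented for the sake of discussion and is not existing Nutch or Hadoop code; it simply keeps a per-Configuration map of objects instead of stuffing them into the Configuration itself:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;

// Sketch only: a process-wide cache keyed on the Configuration instance,
// intended to replace the current (ab)use of Configuration.setObject().
public final class ObjectCache {

  private static final Map<Configuration, ObjectCache> CACHES =
      new ConcurrentHashMap<Configuration, ObjectCache>();

  private final Map<String, Object> objects =
      new ConcurrentHashMap<String, Object>();

  private ObjectCache() {
  }

  // One cache per Configuration instance. Keying is identity-based,
  // since Configuration does not define equals()/hashCode(); comparing
  // only "plugin-relevant" properties is the harder problem discussed
  // further down in this thread.
  public static ObjectCache get(Configuration conf) {
    ObjectCache cache = CACHES.get(conf);
    if (cache == null) {
      CACHES.putIfAbsent(conf, new ObjectCache());
      cache = CACHES.get(conf);
    }
    return cache;
  }

  public Object getObject(String key) {
    return objects.get(key);
  }

  public void setObject(String key, Object value) {
    objects.put(key, value);
  }
}

Callers such as PluginRepository could then store their instances in ObjectCache.get(conf) rather than in the Configuration object, which would preserve the per-JVM caching behaviour described below once setObject() is gone.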
Hmm, well, that sucks, but this is not really a problem for PluginRepository: PluginRepository already has its own cache mechanism.

> > You are right about per-plugin parameters, but I think it will be very
> > difficult to keep the PluginProperty class in sync with plugin
> > parameters. I mean, if a plugin defines a new parameter, we have to
> > remember to update PluginProperty. Perhaps we can force plugins to
> > define the configuration options they will use in, say, their
> > plugin.xml file, but that will be very error-prone too. I don't want
> > to compare entire configuration objects, because changing irrelevant
> > options, like fetcher.store.content, shouldn't force loading plugins
> > again, though it seems it may be inevitable....
>
> Let me see if I understand this ... In my opinion this is a non-issue.
>
> Child tasks are started in separate JVMs, so the only "context"
> information that they have is what they can read from job.xml (which is
> a superset of all properties from config files + job-specific data +
> task-specific data). This context is currently instantiated as a
> Configuration object, and we (ab)use it also as a local per-JVM cache
> for plugin instances and other objects.
>
> Once we instantiate the plugins, they exist unchanged throughout the
> lifecycle of the JVM (== lifecycle of a single task), so we don't have
> to worry about having different sets of plugins with different
> parameters for different jobs (or even tasks).
>
> In other words, it seems to me that there is no such situation in which
> we have to reload plugins within the same JVM, but with different
> parameters.

The problem is that someone might get a little too smart. For example, one may write a new job that creates two IndexingFilters instances from completely different configuration objects, then filters some documents with the first and others with the second (see the sketch at the bottom of this mail). I agree that this is a bit of a reach, but it is possible.

> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

--
Doğacan Güney
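To make that scenario concrete, it would look roughly like the following. IndexingFilters, NutchConfiguration and Configuration are the existing classes; the option key and the class wrapping the example are made up for illustration only:

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.indexer.IndexingFilters;
import org.apache.nutch.util.NutchConfiguration;

// Hypothetical job code, only to illustrate the scenario above.
public class TwoFilterSetsExample {

  public static void main(String[] args) {
    // Two configurations that differ only in one (made-up) plugin option.
    Configuration confA = NutchConfiguration.create();
    Configuration confB = NutchConfiguration.create();
    confB.set("my.indexing.plugin.option", "something-else");

    // Each set of filters is built from its own configuration. If the
    // plugin cache compares only a fixed list of "plugin-relevant"
    // properties, both instances may silently end up sharing the same
    // cached, identically-configured plugins.
    IndexingFilters filtersA = new IndexingFilters(confA);
    IndexingFilters filtersB = new IndexingFilters(confB);

    // ... filter some documents with filtersA and others with filtersB ...
  }
}

Whether a fixed PluginProperty list would even notice the difference between confA and confB is exactly the synchronization problem mentioned earlier in this mail.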