Ah, that makes a lot of sense! I will go ahead and open a Jira issue. Thanks
for the reply!

Alex


On Wed, Jun 12, 2013 at 3:50 PM, Sebastian Nagel <[email protected]> wrote:

> Hi,
>
> > I'm writing a custom IndexWriter and I had some questions on the
> > execution workflow.
> Have a look at NUTCH-1527 and NUTCH-1541.
>
> >
> > I notice that when I run my index writer plugin the following happens:
> >
> > - the describe String is printed
> > - the .open method is called once
> > - the .write method is called for every NutchDocument
> > - the .close method is called
> > - the .open method is called
> with argument "name" = "commit"
> > - the .commit method is called
> > - the .close method is called again
> >
> > In most cases this seems fine; however, I'm not totally clear on what the
> > .update and .delete methods would be used for. What is the "expected" use
> > for these?
> Intuitively, to update or delete documents that are already in the index.
> Delete is used, e.g., to make sure that documents which returned 404 are
> definitely removed from a Solr index.
> Update is currently not used. It may be useful for index back-ends which
> support field-level updates, to update only some fields (e.g., score/boost
> and anchor texts, which depend on many documents and change constantly).
>
> But you are definitely right. The interface o.a.n.indexer.IndexWriter
> should provide good documentation for all required methods. Feel free
> to open a Jira issue.
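To make the call order Alex observed and the intended meaning of update/delete a bit more concrete, here is a minimal, self-contained Java sketch. It does not use Nutch at all: SimpleIndexWriter is a hypothetical, simplified stand-in for o.a.n.indexer.IndexWriter (the real interface works with Nutch/Hadoop types such as NutchDocument), the output name "part-00000" is made up, and the in-memory map only imitates an index back-end that supports field-level updates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LifecycleSketch {

  // Hypothetical, simplified stand-in for o.a.n.indexer.IndexWriter.
  // Documents are plain maps here; "id" plays the role of the document key.
  interface SimpleIndexWriter {
    String describe();
    void open(String name);
    void write(Map<String, String> doc);
    void update(Map<String, String> doc);
    void delete(String key);
    void commit();
    void close();
  }

  // Toy in-memory "index" that also records every call made on it.
  static class RecordingWriter implements SimpleIndexWriter {
    final List<String> calls = new ArrayList<>();
    final Map<String, Map<String, String>> index = new HashMap<>();

    public String describe() { calls.add("describe"); return "RecordingWriter"; }
    public void open(String name) { calls.add("open:" + name); }

    // Add a document, or replace an existing one with the same id entirely.
    public void write(Map<String, String> doc) {
      calls.add("write");
      index.put(doc.get("id"), new HashMap<>(doc));
    }

    // Field-level update: change only the fields present in doc, keep the rest.
    public void update(Map<String, String> doc) {
      calls.add("update");
      Map<String, String> existing = index.get(doc.get("id"));
      if (existing != null) existing.putAll(doc);
    }

    // Remove a document that is already in the index (e.g. a page that now 404s).
    public void delete(String key) { calls.add("delete"); index.remove(key); }

    public void commit() { calls.add("commit"); }
    public void close() { calls.add("close"); }
  }

  // Drives a writer in the order described in the thread:
  // open, write per document, close, then a second open("commit"), commit, close.
  static void runJob(SimpleIndexWriter w, List<Map<String, String>> docs, boolean commit) {
    w.describe();
    w.open("part-00000"); // hypothetical output name
    for (Map<String, String> d : docs) w.write(d);
    w.close();
    if (commit) {
      w.open("commit");
      w.commit();
      w.close();
    }
  }

  public static void main(String[] args) {
    RecordingWriter w = new RecordingWriter();
    Map<String, String> doc = new HashMap<>();
    doc.put("id", "http://example.com/");
    doc.put("title", "Example");
    runJob(w, Arrays.asList(doc), true);
    // prints [describe, open:part-00000, write, close, open:commit, commit, close]
    System.out.println(w.calls);
  }
}
```

The distinction the sketch tries to show is that write replaces a whole document, while update only touches the fields it is given, which is why it would mainly pay off for fields like score or anchor texts that change independently of the page content.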
>
> > As a possibly related question, is it possible to change the workflow of
> > the plugin (without editing Nutch source beyond the plugin)?
>
> Hardly. You have some control over what is done via the command-line
> options -noCommit and -deleteGone. See o.a.n.indexer.IndexingJob.run();
> the options are also listed in the usage message printed by
>  % bin/nutch index
>
> Bye,
> Sebastian
>
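As a rough illustration of how those two options shape the workflow, the following Java sketch models the sequence of writer calls an indexing run might make. It is a hypothetical, simplified model and not the code of IndexingJob: the Page class, the plan method, and the rule that a gone page is simply skipped when -deleteGone is not set are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexingFlagsSketch {

  // Hypothetical record of what the indexer knows about a URL after fetching.
  static final class Page {
    final String url;
    final boolean gone; // e.g. the fetch returned 404

    Page(String url, boolean gone) {
      this.url = url;
      this.gone = gone;
    }
  }

  // Returns the writer calls this simplified model would make for the given flags.
  static List<String> plan(List<Page> pages, boolean noCommit, boolean deleteGone) {
    List<String> calls = new ArrayList<>();
    calls.add("open");
    for (Page p : pages) {
      if (!p.gone) {
        calls.add("write " + p.url);
      } else if (deleteGone) {
        // with -deleteGone, make sure the gone page is removed from the index
        calls.add("delete " + p.url);
      }
      // gone page without -deleteGone: skipped here, so a stale copy may remain
    }
    calls.add("close");
    if (!noCommit) {
      // without -noCommit, a second open/commit/close round follows
      calls.addAll(Arrays.asList("open commit", "commit", "close"));
    }
    return calls;
  }

  public static void main(String[] args) {
    List<Page> pages = Arrays.asList(
        new Page("http://example.com/a", false),
        new Page("http://example.com/b", true));
    // prints [open, write http://example.com/a, delete http://example.com/b, close, open commit, commit, close]
    System.out.println(plan(pages, false, true));
    // prints [open, write http://example.com/a, close]
    System.out.println(plan(pages, true, false));
  }
}
```

In this model the plugin itself never decides the order of calls; it only reacts to them, which matches the point that the workflow is controlled from the job side rather than from inside the plugin.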
