Ah, that makes a lot of sense! I'll go ahead and open a Jira issue. Thanks for the reply!
Alex

On Wed, Jun 12, 2013 at 3:50 PM, Sebastian Nagel <[email protected]> wrote:
> Hi,
>
> > I'm writing a custom IndexWriter and I had some questions on the execution
> > workflow.
>
> Have a look at NUTCH-1527 and NUTCH-1541.
>
> > I notice that when I run my index writer plugin the following happens:
> >
> > - the describe String is printed
> > - the .open method is called once
> > - the .write method is called for every NutchDocument
> > - the .close method is called
> > - the .open method is called with argument "name" = "commit"
> > - the .commit method is called
> > - the .close method is called again
> >
> > In most cases this seems fine; however, I'm not totally clear on what the
> > .update or the .delete methods would be used for. What is the "expected"
> > use for these?
>
> Intuitively, they update or delete documents which are already in the index.
> Delete is used, e.g., to be sure that 404 documents are definitely removed
> from a Solr index.
> Update is actually not used. It may be useful for index end-points which
> support field-level updates, to update only some fields (e.g. score/boost
> and anchor texts, which depend on many documents and are constantly
> changing).
>
> But you are definitely right. The interface o.a.n.indexer.IndexWriter
> should provide good documentation for all required methods. Feel free
> to open a Jira issue.
>
> > As a possibly related question, is it possible to change the workflow of
> > the plugin (without editing Nutch source beyond the plugin)?
>
> Hardly. You have some control over what is done through the command-line
> options -noCommit and -deleteGone. See o.a.n.indexer.IndexingJob.run(); the
> options are also listed by running
>
>   % bin/nutch index
>
> Bye,
> Sebastian
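The call sequence discussed above can be replayed with a small, self-contained Java sketch. It is not Nutch code: `SimpleIndexWriter`, `RecordingWriter`, `IndexingLifecycleSketch`, the first phase name "index", and the `deleteGone`/`noCommit` booleans are hypothetical stand-ins for o.a.n.indexer.IndexWriter and the -deleteGone/-noCommit options. It also assumes, as a simplification, that deletes for gone pages are issued in the same pass as the writes, and that `update` is never called (matching the reply above).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for o.a.n.indexer.IndexWriter: open() takes only
// a phase name, and documents are plain maps instead of NutchDocument.
interface SimpleIndexWriter {
    String describe();                                       // printed before indexing starts
    void open(String name) throws IOException;               // called once per phase
    void write(Map<String, String> doc) throws IOException;  // add one document
    void update(Map<String, String> doc) throws IOException; // field-level update (never called here)
    void delete(String key) throws IOException;              // remove a document, e.g. a 404 page
    void commit() throws IOException;                        // make pending changes visible
    void close() throws IOException;
}

// Records every call so the resulting lifecycle can be printed or checked.
class RecordingWriter implements SimpleIndexWriter {
    final List<String> calls = new ArrayList<>();

    public String describe() { calls.add("describe"); return "RecordingWriter"; }
    public void open(String name) { calls.add("open:" + name); }
    public void write(Map<String, String> doc) { calls.add("write:" + doc.get("id")); }
    public void update(Map<String, String> doc) { calls.add("update:" + doc.get("id")); }
    public void delete(String key) { calls.add("delete:" + key); }
    public void commit() { calls.add("commit"); }
    public void close() { calls.add("close"); }
}

public class IndexingLifecycleSketch {
    // Replays the observed call order. pages maps URL -> "page is gone".
    // deleteGone and noCommit loosely mirror the -deleteGone and -noCommit options.
    static void runIndexing(SimpleIndexWriter writer, Map<String, Boolean> pages,
                            boolean deleteGone, boolean noCommit) throws IOException {
        writer.describe();
        writer.open("index"); // hypothetical name for the first phase
        for (Map.Entry<String, Boolean> page : pages.entrySet()) {
            if (page.getValue()) {
                // Gone pages are deleted only when deletion is enabled; otherwise skipped.
                if (deleteGone) writer.delete(page.getKey());
            } else {
                Map<String, String> doc = new LinkedHashMap<>();
                doc.put("id", page.getKey());
                writer.write(doc);
            }
        }
        writer.close();
        if (!noCommit) {
            // Second phase: reopen with name "commit", commit, then close again.
            writer.open("commit");
            writer.commit();
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Boolean> pages = new LinkedHashMap<>();
        pages.put("http://example.com/a", false);
        pages.put("http://example.com/b", true); // e.g. now returns 404
        RecordingWriter writer = new RecordingWriter();
        runIndexing(writer, pages, true, false);
        System.out.println(writer.calls);
    }
}
```

Running `main` prints the recorded sequence: describe, open, write, delete, close, then the separate open("commit")/commit/close phase. Passing `noCommit = true` drops that second phase entirely, which illustrates why the command-line options are the main lever on the workflow short of patching the indexing job itself.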

