Hi,

On Tue, Mar 3, 2009 at 15:27, Andrzej Bialecki wrote:
> Andrzej Bialecki wrote:
>>
>> Sami Siren wrote:
>>>
>>> Sami Siren wrote:
>>>>
>>>> Sami Siren wrote:
>>>>>
>>>>> I can see this error also. not sure yet what's going wrong...
>>>>
>>>> it's NUTCH-703 (hadoop upgrade) that broke the indexing. any ideas what
>>>> changed in hadoop that might have caused this?
>>>
>>> found the hostile hadoop commit:
>>> http://svn.apache.org/viewvc?view=rev&revision=736239
>>>
>>> any ideas how to proceed? Naturally i won't be starting the release
>>> process before this is resolved.
>>
>> I'll work on this now, we'll see if there's a solution ... In the worst
>> case we could downgrade to 0.19.0, but there were some unpleasant bugs there
>> - so I'll try to find a solution so that we can keep 0.19.1 .
>
> For now I tracked it down to the missing field options in LuceneWriter -
> basically, the only field options it has are the following:
>
> fieldIndex: {segment=NO, digest=NO, boost=NO}
> fieldStore: {segment=YES, boost=YES, digest=YES}
> fieldVector: {segment=NO, digest=NO, boost=NO}
>
> So if the LuceneWriter processes e.g. "site", it comes back with Store.NO
> and Index.NO, which indeed doesn't make sense.
>

This is added in the index-basic plugin, in the method addIndexBackendOptions.

> Please note that LuceneWriter is properly initialized with the current
> JobConf - so I'm not sure where these defaults should come from ...? The
> code in LuceneWriter can take this information from
> NutchDocument.metadata[lucene.*] properties, but they are not populated by
> any indexing plugin ...
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>

--
Doğacan Güney
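
To make the failure mode discussed above concrete, here is a minimal sketch of the lookup as I understand it from the thread. It is not the actual LuceneWriter code: the class FieldOptionsSketch, its methods, and the fallback to "NO" for unregistered fields are all assumptions for illustration. Only the three map entries come from the output quoted above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the option lookup; NOT the real LuceneWriter.
public class FieldOptionsSketch {
    // Option maps with the contents reported in the thread.
    static final Map<String, String> fieldStore = new HashMap<>();
    static final Map<String, String> fieldIndex = new HashMap<>();

    static {
        for (String f : new String[] {"segment", "digest", "boost"}) {
            fieldStore.put(f, "YES");
            fieldIndex.put(f, "NO");
        }
    }

    // Assumption: a field with no registered option falls back to "NO".
    static String storeFor(String field) {
        return fieldStore.getOrDefault(field, "NO");
    }

    static String indexFor(String field) {
        return fieldIndex.getOrDefault(field, "NO");
    }

    // A field that is neither stored nor indexed carries nothing into the index.
    static boolean isUseless(String field) {
        return storeFor(field).equals("NO") && indexFor(field).equals("NO");
    }

    public static void main(String[] args) {
        System.out.println("site useless: " + isUseless("site"));
        System.out.println("segment useless: " + isUseless("segment"));
    }
}
```

With only segment, digest and boost registered, "site" ends up with neither store nor index enabled, while "segment" is at least stored. If the options from index-basic were reaching the writer, "site" would presumably have an entry in these maps as well.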
