Doğacan Güney wrote:
Hi,
On Tue, Mar 3, 2009 at 15:27, Andrzej Bialecki <[email protected]> wrote:
Andrzej Bialecki wrote:
Sami Siren wrote:
Sami Siren wrote:
Sami Siren wrote:
I can see this error also. Not sure yet what's going wrong...
It's NUTCH-703 (the Hadoop upgrade) that broke the indexing. Any ideas what
changed in Hadoop that might have caused this?
Found the hostile Hadoop commit:
http://svn.apache.org/viewvc?view=rev&revision=736239
Any ideas how to proceed? Naturally I won't be starting the release
process before this is resolved.
I'll work on this now, we'll see if there's a solution... In the worst
case we could downgrade to 0.19.0, but there were some unpleasant bugs there
- so I'll try to find a solution so that we can keep 0.19.1.
For now I've tracked it down to the missing field options in LuceneWriter -
basically, the only field options it has are the following:
fieldIndex: {segment=NO, digest=NO, boost=NO}
fieldStore: {segment=YES, boost=YES, digest=YES}
fieldVector: {segment=NO, digest=NO, boost=NO}
So if the LuceneWriter processes e.g. "site", it comes back with Store.NO
and Index.NO, which indeed doesn't make sense.
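To illustrate the failure mode: a toy sketch of a per-field option lookup that falls back to NO for any unregistered field. This is a simplified model for the discussion, not the actual Nutch LuceneWriter API; all names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of per-field index options with NO as the fallback for
// fields that were never registered (e.g. because the indexing filters that
// would have registered them had not run yet).
public class FieldOptions {
    enum Index { NO, TOKENIZED, UNTOKENIZED }
    enum Store { NO, YES }

    private final Map<String, Index> fieldIndex = new HashMap<>();
    private final Map<String, Store> fieldStore = new HashMap<>();

    void register(String field, Index idx, Store store) {
        fieldIndex.put(field, idx);
        fieldStore.put(field, store);
    }

    Index indexOf(String field) { return fieldIndex.getOrDefault(field, Index.NO); }
    Store storeOf(String field) { return fieldStore.getOrDefault(field, Store.NO); }

    public static void main(String[] args) {
        FieldOptions opts = new FieldOptions();
        // Only the metadata fields are present (as in the debug dump above);
        // "site" never got registered, so it comes back NO/NO.
        opts.register("segment", Index.NO, Store.YES);
        opts.register("digest", Index.NO, Store.YES);
        opts.register("boost", Index.NO, Store.YES);
        System.out.println("site -> " + opts.indexOf("site") + "/" + opts.storeOf("site"));
    }
}
```

With only the metadata fields registered, any content field like "site" resolves to Store.NO and Index.NO, which matches the broken behavior described above.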
This is added in index-basic plugin in method addIndexBackendOptions.
Hm, you are right - this is happening properly in configure(), and I can
see in the debugger that IndexingFilters are initialized and they
populate the field options... However, apparently the
IndexerOutputFormat initialization takes place in ReduceTask _before_
the Reduce.configure() is called, so that the current JobConf that we
use to initialize IndexerOutputFormat is not yet populated ...
A crude hack would be to run IndexingFilters in
IndexerOutputFormat.getRecordWriter(..)
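The ordering problem and the proposed hack can be sketched like this - a self-contained toy model with hypothetical names, not Hadoop's or Nutch's actual API: the record writer is created from the JobConf before configure() would have populated it, so the workaround is to run the filters inside getRecordWriter() itself.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the initialization-order problem (hypothetical names).
public class InitOrder {
    static class JobConf {
        final Map<String, String> opts = new HashMap<>();
    }

    // Stand-in for running the IndexingFilters, which populate the conf
    // with per-field options as a side effect of their initialization.
    static void runIndexingFilters(JobConf conf) {
        conf.opts.put("field.site.index", "UNTOKENIZED");
    }

    // The crude hack from above: initialize the filters inside
    // getRecordWriter() itself, so the conf is populated before it is read,
    // regardless of whether Reduce.configure() has run yet.
    static String getRecordWriter(JobConf conf) {
        runIndexingFilters(conf);
        return conf.opts.getOrDefault("field.site.index", "NO");
    }

    public static void main(String[] args) {
        // ReduceTask creates the writer before configure() would have run;
        // without the hack, the lookup would see an empty conf and fall
        // back to "NO".
        JobConf conf = new JobConf();
        System.out.println("site index option: " + getRecordWriter(conf));
    }
}
```

The point is only the ordering: whatever populates the field options has to run before the record writer reads them, and getRecordWriter() is the earliest hook that is guaranteed to run first.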
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com