Hi,

I don't know the howto you're referring to, but I think it was written for an older version of Nutch.
Let me try to explain...

doc.add("key", "value") adds a new field named "key" with the value "value" to the document "doc". From that call alone, the indexer only knows that there is another field to add. It does not know whether the field should be stored, tokenized, given term vectors, and so on. To tell the indexer how to index the field, you have to add a line to the addIndexBackendOptions(Configuration conf) method, which is declared in every indexing filter. Example:

    public void addIndexBackendOptions(Configuration conf) {
      // "key": stored, but not indexed
      LuceneWriter.addFieldOptions("key", LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf);
      // "key2": not stored, tokenized for search, with positional term vectors
      LuceneWriter.addFieldOptions("key2", LuceneWriter.STORE.NO, LuceneWriter.INDEX.TOKENIZED, LuceneWriter.VECTOR.POS, conf);
    }

You also need a parsing filter to extract the data from the URLs you're crawling. I'm not aware of a DC metadata parser, so you'll need to write a parsing filter first to extract the relevant data for you. Then you can index that data with the indexing filter you wrote.

Hope this helps.

Kind regards,
Martina

-----Original Message-----
From: BELLINI ADAM [mailto:mbel...@msn.com]
Sent: Tuesday, 22 September 2009 23:08
To: nutch-user@lucene.apache.org
Subject: RE: DC metadata

Any idea, guys?
I'm just stuck here :(

mbel...@msn.com

From: mbel...@msn.com
To: nutch-user@lucene.apache.org
Subject: RE: DC metadata
Date: Fri, 18 Sep 2009 14:12:35 +0000

Hi again,

I just copied the directory of my new plugin (which contains the jar file and the plugin.xml) to the nutch/plugins directory, and now when I index it gives me this error:

    2009-09-18 10:03:44,754 WARN mapred.LocalJobRunner - job_local_0024
    java.lang.IllegalArgumentException: it doesn't make sense to have a field that is neither indexed nor stored
        at org.apache.lucene.document.Field.<init>(Field.java:279)
        at org.apache.nutch.indexer.lucene.LuceneWriter.createLuceneDoc(LuceneWriter.java:133)
        at org.apache.nutch.indexer.lucene.LuceneWriter.write(LuceneWriter.java:239)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:158)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)

Should I write a parser plugin too?

thx

From: mbel...@msn.com
To: nutch-user@lucene.apache.org
Subject: DC metadata
Date: Thu, 17 Sep 2009 18:30:23 +0000

Hi,

I'm trying to add Dublin Core metadata to my index. I wrote the plugin as described at http://wiki.apache.org/nutch/CreateNewFilter and built the project using ant... but when I crawled my intranet, I couldn't find the Dublin Core metadata in my index. Did I misunderstand something?

thx
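As a rough illustration of the point in the reply above (doc.add only supplies the value, while the store/index options have to be registered separately), here is a self-contained Java model. It deliberately does not use the Nutch or Lucene classes: FieldOptionsDemo, its Store/Index enums and registerFieldOptions are hypothetical stand-ins, term vectors are left out, and the fallback to "not stored, not indexed" for unregistered fields is an assumption chosen to reproduce the IllegalArgumentException from the stack trace.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, self-contained model of "field options are registered
// separately from doc.add(...)". It does NOT use the real Nutch/Lucene classes.
public class FieldOptionsDemo {

    enum Store { YES, NO }
    enum Index { NO, TOKENIZED, UNTOKENIZED }

    // Options registered per field name (stand-in for addIndexBackendOptions).
    private final Map<String, Store> storeOpts = new HashMap<>();
    private final Map<String, Index> indexOpts = new HashMap<>();

    void registerFieldOptions(String name, Store store, Index index) {
        storeOpts.put(name, store);
        indexOpts.put(name, index);
    }

    // Describes each field as "STORE/INDEX". In this model, unregistered fields
    // fall back to Store.NO / Index.NO and are rejected with the same message
    // that appears in the stack trace.
    Map<String, String> createDoc(Map<String, String> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            Store store = storeOpts.getOrDefault(e.getKey(), Store.NO);
            Index index = indexOpts.getOrDefault(e.getKey(), Index.NO);
            if (store == Store.NO && index == Index.NO) {
                throw new IllegalArgumentException(
                    "it doesn't make sense to have a field that is neither indexed nor stored");
            }
            out.put(e.getKey(), store + "/" + index);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("key", "value"); // analogous to doc.add("key", "value")

        FieldOptionsDemo writer = new FieldOptionsDemo();
        try {
            writer.createDoc(doc);
        } catch (IllegalArgumentException ex) {
            System.out.println("without options: " + ex.getMessage());
        }

        // Analogous to LuceneWriter.addFieldOptions("key", STORE.YES, INDEX.NO, conf)
        writer.registerFieldOptions("key", Store.YES, Index.NO);
        System.out.println("with options: " + writer.createDoc(doc));
    }
}
```

Running main prints the rejection message first and then "with options: {key=YES/NO}". In the actual plugin, the counterpart of registerFieldOptions is the LuceneWriter.addFieldOptions(...) call inside addIndexBackendOptions.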
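Since no DC metadata parser exists, the parsing filter mentioned in the reply has to pull the Dublin Core values out of each page itself. The Java sketch below shows only that extraction step in plain Java, not the Nutch parse-filter API. DcMetaExtractor and extract are hypothetical names, and the regular expression assumes the DC metadata is written as <meta name="DC.xxx" content="..."> tags with double-quoted attributes and name before content. Real pages will not always follow that, so a proper HTML parser would be the more robust choice in an actual plugin.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls Dublin Core <meta name="DC.xxx" content="..."> pairs
// out of raw HTML. Assumes double-quoted attributes with name before content.
public class DcMetaExtractor {

    private static final Pattern DC_META = Pattern.compile(
        "<meta\\s+[^>]*?name\\s*=\\s*\"(DC\\.[^\"]+)\"[^>]*?content\\s*=\\s*\"([^\"]*)\"",
        Pattern.CASE_INSENSITIVE);

    // Returns lower-cased field name -> trimmed value, in document order.
    static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = DC_META.matcher(html);
        while (m.find()) {
            // Lower-case the element name so "DC.Title" and "DC.title" map to one field.
            String field = m.group(1).toLowerCase(Locale.ROOT);
            result.put(field, m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"DC.title\" content=\"Annual report\">"
            + "<meta name=\"DC.creator\" content=\"Jane Doe\">"
            + "<meta name=\"keywords\" content=\"ignored\">"
            + "</head><body></body></html>";
        // Only the two DC.* tags are returned; the keywords tag is skipped.
        System.out.println(extract(html));
    }
}
```

The extracted pairs would then be passed from the parsing filter to the indexing filter, which calls doc.add(...) for each one and registers matching options in addIndexBackendOptions.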