Hi, thank you for your answer. I was talking about this howto:

CreateNewFilter

Howto add a category metadata to your index and be able to search for it. For this, you need to write an indexing filter and a query filter.

Indexing your custom metadata

For the indexing filter, copy the index-more plugin, and change names, dirs, and build files appropriately. The main thing to change is the filter method:

    public Document filter(Document doc, Parse parse, FetcherOutput fo)

In it, you can add your own fields. To add a new category with value "puppies", it will look something like this:

    doc.add(new Field("category", "puppies", false, true, false));

See the Document.add API for more info on the booleans. That's pretty much it for indexing.

Searching your metadata

To search for this, you need to create a query filter. Copy the query-site plugin. Again change file names, directories, and build files as needed. The main java file is very simple, just change the string in the line with "super". Instead of:

    super("site");

You would have

    super("category");

Make sure that you put your new index-category and query-category plugins in your nutch-default.xml file. Don't forget to check that it's in your WEB-INF/classes directory too.

So as you said, I have to write a parser too, but some people had trouble with this howto:
http://wiki.apache.org/nutch/WritingPluginExample-0.9
It seems it doesn't work for Nutch 1.0. Do you have any idea what I have to change in this example to make it work for Nutch 1.0?

thx a lot

> From: k...@huberverlag.de
> To: nutch-user@lucene.apache.org
> Date: Wed, 23 Sep 2009 08:41:55 +0200
> Subject: AW: DC metadata
>
> Hi,
>
> I don't know the howto you're referring to but I think it belongs to an older
> version of Nutch.
>
> Let me try to explain...
>
> doc.add("key","value") - adds a new field to the document "doc" with the
> name "key" and the value "value".
> With that knowledge the indexer just knows
> there is another field to be added, but it doesn't know if it should be
> stored, tokenized, termvectored and so on.
> In order to tell the indexer how to index this field, you have to add a new
> line to the addIndexBackendOptions(Configuration conf) method. This method
> is specified in every indexing filter.
>
> Example:
>   public void addIndexBackendOptions(Configuration conf) {
>     LuceneWriter.addFieldOptions("key",
>         LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf);
>     LuceneWriter.addFieldOptions("key2",
>         LuceneWriter.STORE.NO, LuceneWriter.INDEX.TOKENIZED, LuceneWriter.VECTOR.POS,
>         conf);
>   }
>
> You need a parsing filter to extract data from the URLs you're crawling. I'm
> not aware of a DC metadata parser, so you need to write a parsing filter
> first, to extract the relevant data for you. Then you can index this data
> with the indexing filter you wrote.
>
> Hope this helps.
> Kind regards,
> Martina
>
>
> -----Original Message-----
> From: BELLINI ADAM [mailto:mbel...@msn.com]
> Sent: Tuesday, 22 September 2009 23:08
> To: nutch-user@lucene.apache.org
> Subject: RE: DC metadata
>
>
> Any idea, guys?
> I'm just stuck here :(
>
> mbel...@msn.com
>
>
> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org
> Subject: RE: DC metadata
> Date: Fri, 18 Sep 2009 14:12:35 +0000
>
> Hi again,
>
> I just copied the directory of my new plugin (which contains the jar file and
> the plugin.xml) to the nutch/plugins directory, and when I index now it
> gives me this error:
>
> 2009-09-18 10:03:44,754 WARN mapred.LocalJobRunner - job_local_0024
> java.lang.IllegalArgumentException: it doesn't make sense to have a field that is neither indexed nor stored
>     at org.apache.lucene.document.Field.<init>(Field.java:279)
>     at org.apache.nutch.indexer.lucene.LuceneWriter.createLuceneDoc(LuceneWriter.java:133)
>     at org.apache.nutch.indexer.lucene.LuceneWriter.write(LuceneWriter.java:239)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
>     at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
>     at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:158)
>     at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
>
> Should I write a parser plugin too?
>
> thx
>
>
> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org
> Subject: DC metadata
> Date: Thu, 17 Sep 2009 18:30:23 +0000
>
> Hi,
> I'm trying to add Dublin Core metadata to my index. I wrote the plugin as
> described at http://wiki.apache.org/nutch/CreateNewFilter
> and I built the project using ant...
> But when I crawled my intranet I couldn't find the Dublin Core metadata in my
> index. Did I misunderstand something?
>
> thx
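
To illustrate why the indexer throws "neither indexed nor stored" until field options are registered, here is a minimal stand-alone Java sketch. It is not Nutch or Lucene code: the class FieldOptionsSketch, its enums and its write method are hypothetical stand-ins. The assumption that an unregistered field falls back to "not stored, not indexed" is inferred from the stack trace and from Martina's explanation, and has not been checked against the Nutch 1.0 source.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-alone model (NOT Nutch or Lucene code) of per-field index options. */
class FieldOptionsSketch {
    enum Store { YES, NO }
    enum Index { NO, TOKENIZED, UNTOKENIZED }

    // Options registered per field name; plays the role of LuceneWriter.addFieldOptions(...).
    private final Map<String, Store> storeOptions = new HashMap<>();
    private final Map<String, Index> indexOptions = new HashMap<>();

    void addFieldOptions(String name, Store store, Index index) {
        storeOptions.put(name, store);
        indexOptions.put(name, index);
    }

    /**
     * Mimics writing one field to the index.
     * Assumption: a field without registered options defaults to Store.NO / Index.NO.
     */
    String write(String name, String value) {
        Store store = storeOptions.getOrDefault(name, Store.NO);
        Index index = indexOptions.getOrDefault(name, Index.NO);
        if (store == Store.NO && index == Index.NO) {
            // Same message as in the quoted stack trace.
            throw new IllegalArgumentException(
                "it doesn't make sense to have a field that is neither indexed nor stored");
        }
        return name + "=" + value + " [store=" + store + ", index=" + index + "]";
    }

    public static void main(String[] args) {
        FieldOptionsSketch writer = new FieldOptionsSketch();
        try {
            // No options registered yet, so the field is rejected.
            writer.write("category", "puppies");
        } catch (IllegalArgumentException e) {
            System.out.println("unregistered field rejected: " + e.getMessage());
        }
        // After registering options, the same field is accepted.
        writer.addFieldOptions("category", Store.YES, Index.UNTOKENIZED);
        System.out.println(writer.write("category", "puppies"));
    }
}
```

In a real Nutch 1.0 indexing filter, the registration step would presumably go in addIndexBackendOptions(Configuration conf), as in Martina's example, with one addFieldOptions call per custom field.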