RE: AW: DC metadata

BELLINI ADAM Wed, 23 Sep 2009 06:46:00 -0700

hi, thank you for your answer...

i was talking about this howto :


CreateNewFilter

Howto add a category metadata to your index and be able to search for it. For
this, you need to write an indexing filter and a query filter.

Indexing your custom metadata

For the indexing filter, copy the index-more plugin, and change names, dirs,
and build files appropriately. The main thing to change is the filter
method:

    public Document filter(Document doc, Parse parse, FetcherOutput fo)

In it, you can add your own fields. To add a new category with value
"puppies", it will look something like this:

    doc.add(new Field("category", "puppies", false, true, false));

See the Document.add API for more info on the booleans. That's pretty much it
for indexing.

Searching your metadata

To search for this, you need to create a query filter. Copy the query-site
plugin. Again change file names, directories, and build files as
needed. The main java file is very simple, just change the string in
the line with "super". Instead of:

    super("site");

You would have

    super("category");

Make sure that you put your new index-category and query-category plugins in
your nutch-default.xml file. Don't forget to check that it's in your
WEB-INF/classes directory too.



so as you said i have to write a parser too, but some people had trouble with
this howto: http://wiki.apache.org/nutch/WritingPluginExample-0.9
it seems it doesn't work for nutch 1.0. do you have any idea what i have to
change in this example to make it work for nutch 1.0 ??
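
for the parser part, the extraction step itself could be sketched roughly like this in plain Java. this is only a sketch under assumptions: it is not the Nutch parsing-filter API, the class name DcMetaExtractor and the method extract are made up for illustration, and it assumes the Dublin Core fields appear in the pages as <meta name="DC.xxx" content="..."> tags. in a real plugin this logic would live inside a parsing filter, and the values would then be handed to the indexing filter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collects DC.* meta tags from raw HTML.
class DcMetaExtractor {
    // Matches one complete <meta ...> tag.
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches attr="value" or attr='value'; group 3 or 4 holds the value.
    private static final Pattern ATTR =
        Pattern.compile("([a-zA-Z][\\w.:-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    // Returns lower-cased DC field names mapped to all values found, in page order.
    static Map<String, List<String>> extract(String html) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            // Read attributes one by one so name/content may come in either order.
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (key.equals("name")) {
                    name = value;
                } else if (key.equals("content")) {
                    content = value;
                }
            }
            if (name == null || content == null) {
                continue;
            }
            String field = name.toLowerCase(Locale.ROOT);
            // Keep only Dublin Core fields; ignore keywords, description, etc.
            if (field.startsWith("dc.")) {
                result.computeIfAbsent(field, k -> new ArrayList<>()).add(content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
            + "<meta name=\"DC.title\" content=\"Intranet handbook\">"
            + "<meta content='Example Author' name='DC.creator'>"
            + "<meta name=\"keywords\" content=\"ignored\">"
            + "</head><body></body></html>";
        System.out.println(extract(html));
    }
}
```

field names are lower-cased so that DC.Title and dc.title end up under one key, and values are kept as a list because a Dublin Core element may appear more than once on a page.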

thx a lot



> From: k...@huberverlag.de
> To: nutch-user@lucene.apache.org
> Date: Wed, 23 Sep 2009 08:41:55 +0200
> Subject: AW: DC metadata
> 
> Hi,
> 
> I don't know the howto you're referring to but I think it belongs to an older 
> version of Nutch.
> 
> Let me try to explain...
> 
> doc.add("key","value")  -  adds a new field to the document "doc" with the 
> name "key" and the value "value". With that knowledge the indexer just knows 
> there is another field to be added, but it doesn't know if it should be 
> stored, tokenized, termvectored and so on.
> In order to tell the indexer how to index this field, you have to add a new 
> line to the addIndexBackendOptions(Configuration conf) method. This method 
> is specified in every indexing filter.
> 
> Example:
> public void addIndexBackendOptions(Configuration conf) {
>       // "key": stored but not indexed
>       LuceneWriter.addFieldOptions("key",
>             LuceneWriter.STORE.YES, LuceneWriter.INDEX.NO, conf);
>       // "key2": not stored, tokenized, term vector with positions
>       LuceneWriter.addFieldOptions("key2",
>             LuceneWriter.STORE.NO, LuceneWriter.INDEX.TOKENIZED,
>             LuceneWriter.VECTOR.POS, conf);
> }
> 
> You need a parsing filter to extract data from the URLs you're crawling. I'm 
> not aware of a DC metadata parser, so you need to write a parsing filter 
> first, to extract the relevant data for you. Then you can index this data 
> with the indexing filter you wrote.
> 
> Hope this helps.

> Kind regards,
> Martina
> 
> 
> 
> 
> -----Original Message-----
> From: BELLINI ADAM [mailto:mbel...@msn.com] 
> Sent: Tuesday, 22 September 2009 23:08
> To: nutch-user@lucene.apache.org
> Subject: RE: DC metadata
> 
> 
> any idea guys ! i'm just stuck here :(
> 
> mbel...@msn.com
> 
> 
> 
> 
> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org
> Subject: RE: DC metadata
> Date: Fri, 18 Sep 2009 14:12:35 +0000
> 
> hi again 
> 
> i just copied the directory of my new plugin 'which contains the jar file and 
> the plugin.xml' to the nutch/plugins directory , and when i index now it 
> gives me this error :
> 
> 2009-09-18 10:03:44,754 WARN  mapred.LocalJobRunner - job_local_0024
> java.lang.IllegalArgumentException: it doesn't make sense to have a field that is neither indexed nor stored
>         at org.apache.lucene.document.Field.<init>(Field.java:279)
>         at org.apache.nutch.indexer.lucene.LuceneWriter.createLuceneDoc(LuceneWriter.java:133)
>         at org.apache.nutch.indexer.lucene.LuceneWriter.write(LuceneWriter.java:239)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
>         at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
>         at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:158)
>         at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
> 
> 
> should i write a parser plugin too ??
> 
> thx
> 
> 
> 
> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org
> Subject: DC metadata
> Date: Thu, 17 Sep 2009 18:30:23 +0000
> 
> hi,
> i'm trying to add Dublin Core metadata to my index, i wrote the plugin as 
> described at http://wiki.apache.org/nutch/CreateNewFilter
> 
> and i built the project using ant...
> but when i crawled my intranet i can't find the Dublin Core metadata in my 
> index ??
> did i misunderstand something ?
> 
> thx
                                          