I wrote a URL-category indexer.

It uses a .properties file that maps URLs (written as regular expressions) to categories.
For example:

http://www.misite.com/videos/.*=videos
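Ernesto's code isn't in the thread, so the following is only a minimal Java sketch of the idea. It assumes one `regex=category` rule per line, that the regex must match the whole URL, and that the first matching rule wins; the class `UrlCategorizer` and its methods are hypothetical names. One practical detail: a line like the one above does not load as intended with `java.util.Properties`, because `Properties` also treats an unescaped `:` as a key/value separator, so the key would become just `http`. The sketch therefore parses the lines itself (escaping the colon as `http\://...` would be the other option).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a regex-based URL categorizer (not Ernesto's code).
 * Assumed config format: one "regex=category" rule per line, first match wins.
 */
class UrlCategorizer {
    // LinkedHashMap keeps the rules in file order, so the first matching rule wins.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    UrlCategorizer(String config) {
        for (String raw : config.split("\n")) {
            String line = raw.trim(); // also strips a trailing '\r'
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Split at the LAST '=' so a regex may itself contain '=' (e.g. "?id=.*").
            // java.util.Properties is not used: it would split "http://..." at the ':'.
            int sep = line.lastIndexOf('=');
            if (sep <= 0 || sep == line.length() - 1) {
                continue; // malformed rule: no regex or no category
            }
            String regex = line.substring(0, sep).trim();
            String category = line.substring(sep + 1).trim();
            rules.put(Pattern.compile(regex), category);
        }
    }

    /** Returns the category of the first rule matching the whole URL, or null if none does. */
    String categorize(String url) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(url).matches()) {
                return rule.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UrlCategorizer c = new UrlCategorizer("http://www.misite.com/videos/.*=videos\n");
        System.out.println(c.categorize("http://www.misite.com/videos/clip1.html")); // videos
        System.out.println(c.categorize("http://www.misite.com/news/today.html"));   // null
    }
}
```

Because rules are tried in file order, more specific patterns should be listed before general ones.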

If it seems useful, I can share it.

It might be better to configure it in an .xml file instead.

Regards,
Ernesto.

Stefan Neufeind wrote:
Alvaro Cabrerizo wrote:
Have you included a node describing your new query filter in
plugin.xml?

2006/10/11, xu nutch <[EMAIL PROTECTED]>:
I have a question about my plugin's indexing filter and query filter.
Can you help me?
-------------------------------------
In MoreIndexingFilter.java I added:
doc.add(new Field("category", "test", false, true, false));
-------------------------------------

--------------------------------------


package org.apache.nutch.searcher.more;

import org.apache.nutch.searcher.RawFieldQueryFilter;

/**
 * Handles "category:" query clauses, causing them to search the
 * "category" field added by MoreIndexingFilter.
 */
public class CategoryQueryFilter extends RawFieldQueryFilter {
  public CategoryQueryFilter() {
    super("category");
  }
}
-----------------------------------------------
-----------------------------------------------
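For reference, a raw-field query filter does not analyze the clause value. As far as I understand the single-argument `RawFieldQueryFilter` constructor, it turns `category:test` into an exact term match on the `category` field and lowercases the value first. The JDK-only sketch below imitates just that step; `RawFieldSketch` and `termsFor` are hypothetical names, and this is not Nutch's code. The practical consequence is that the category values written at indexing time should be lowercase and untokenized, as the `Field` call above appears to do with `"test"` (indexed, not tokenized).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, JDK-only illustration of a raw-field clause: collect the values of
 * "field:value" clauses for one field, optionally lowercased, with no further analysis.
 */
class RawFieldSketch {
    private final String field;
    private final boolean lowerCase;

    RawFieldSketch(String field, boolean lowerCase) {
        this.field = field;
        this.lowerCase = lowerCase;
    }

    /** Returns the exact terms that this field's clauses would be matched against. */
    List<String> termsFor(String query) {
        List<String> terms = new ArrayList<>();
        String prefix = field + ":";
        for (String clause : query.trim().split("\\s+")) {
            if (clause.startsWith(prefix) && clause.length() > prefix.length()) {
                String value = clause.substring(prefix.length());
                terms.add(lowerCase ? value.toLowerCase(Locale.ROOT) : value);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        RawFieldSketch category = new RawFieldSketch("category", true);
        System.out.println(category.termsFor("nutch category:Test")); // [test]
    }
}
```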

<property>
 <name>plugin.includes</name>
<value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|more)|query-(basic|site|url|more)</value>

 <description>Regular expression naming plugin directory names to
 include.  Any plugin not matching this expression is excluded.
 In any case you need to include at least the nutch-extensionpoints
 plugin. By default Nutch includes crawling just HTML and plain text
 via HTTP, and basic indexing and search plugins.
 </description>
</property>

-----------------------------------------------
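`plugin.includes` is a regular expression over plugin names. A quick way to check which plugins a given value selects is to test the names against it in plain Java, assuming (as I understand Nutch does) that the whole name has to match. `PluginIncludesCheck` is a hypothetical helper, and `query-category` is only an example of a name that would be excluded: if the new query filter lived in its own plugin directory instead of `query-more`, that directory would also need to be listed here.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: which plugin names does a plugin.includes value select? */
class PluginIncludesCheck {
    // The plugin.includes value from the configuration above.
    static final Pattern INCLUDES = Pattern.compile(
        "nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)"
        + "|index-(basic|more)|query-(basic|site|url|more)");

    /** Assumes the whole plugin name must match (matches(), not find()). */
    static boolean isIncluded(String pluginName) {
        return INCLUDES.matcher(pluginName).matches();
    }

    public static void main(String[] args) {
        String[] names = {"index-more", "query-more", "query-category", "parse-pdf"};
        for (String name : names) {
            System.out.println(name + " -> " + isIncluded(name));
        }
    }
}
```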

Querying "category:test" in Luke works, but the same query through
the Tomcat web search returns no results.

In case you get the search working: how do you plan to categorize
URLs/sites? I'm looking for a solution myself, since I haven't yet
managed to implement something based on URL prefixes that maps
URLs to categories.
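On the URL-prefix idea: a minimal sketch, assuming a hand-built table of prefixes and a longest-prefix-wins rule, so that a more specific prefix overrides a general one for the same site. `PrefixCategorizer` and its methods are hypothetical names. Unlike regex rules, prefixes need no escaping, and the order in which they are added does not matter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: maps URL prefixes to categories; the longest matching prefix wins. */
class PrefixCategorizer {
    private final Map<String, String> prefixToCategory = new HashMap<>();

    void add(String prefix, String category) {
        prefixToCategory.put(prefix, category);
    }

    /** Returns the category of the longest prefix of url, or null if no prefix matches. */
    String categorize(String url) {
        String best = null;
        for (String prefix : prefixToCategory.keySet()) {
            if (url.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? null : prefixToCategory.get(best);
    }

    public static void main(String[] args) {
        PrefixCategorizer c = new PrefixCategorizer();
        c.add("http://www.misite.com/", "general");
        c.add("http://www.misite.com/videos/", "videos");
        System.out.println(c.categorize("http://www.misite.com/videos/clip1.html")); // videos
        System.out.println(c.categorize("http://www.misite.com/about.html"));       // general
        System.out.println(c.categorize("http://example.org/"));                    // null
    }
}
```

The linear scan is fine for a small table; a large one would call for a trie or a sorted structure.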


Regards,
 Stefan
