I wrote a url-category-indexer.
It works with a .properties file that maps URLs, written as regular
expressions, to categories.
Example:
http://www.misite.com/videos/.*=videos
If it seems useful, I can share it.
It might be better to configure it in an .xml file instead.
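
For illustration, here is a minimal sketch of that kind of regexp-to-category mapping. It is not the actual indexer: the class name UrlCategoryMapper and its methods are hypothetical, and it assumes one "regexp=category" rule per line, split at the last '=', with the first matching rule winning. The lines are parsed by hand because java.util.Properties also treats ':' as a key/value separator, so the "http:" in the example would have to be escaped if the file were loaded that way.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: maps URLs to categories using "regexp=category" rules. */
final class UrlCategoryMapper {
    // Rules are kept in file order so that the first matching rule wins (assumption).
    private final List<Map.Entry<Pattern, String>> rules = new ArrayList<>();

    /** Parses rule lines of the form "<url regexp>=<category>". */
    static UrlCategoryMapper parse(String config) {
        UrlCategoryMapper mapper = new UrlCategoryMapper();
        for (String rawLine : config.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Split at the LAST '=' so that URL patterns may contain '=' themselves.
            int eq = line.lastIndexOf('=');
            if (eq <= 0) {
                continue; // skip malformed lines without a pattern
            }
            Pattern pattern = Pattern.compile(line.substring(0, eq).trim());
            String category = line.substring(eq + 1).trim();
            mapper.rules.add(new SimpleImmutableEntry<>(pattern, category));
        }
        return mapper;
    }

    /** Returns the category of the first rule matching the whole URL, or null. */
    String categoryFor(String url) {
        for (Map.Entry<Pattern, String> rule : rules) {
            if (rule.getKey().matcher(url).matches()) {
                return rule.getValue();
            }
        }
        return null;
    }
}
```

An indexing filter could then call categoryFor(url) for each page and, when the result is not null, store it in the "category" field in place of a fixed value.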
Regards,
Ernesto.
Stefan Neufeind wrote:
Alvaro Cabrerizo wrote:
Have you added a node describing your new searcher filter to
plugin.xml?
2006/10/11, xu nutch <[EMAIL PROTECTED]>:
I have a question about my plugin for the index filter and query filter.
Can you help me?
-------------------------------------
In MoreIndexingFilter.java I added:
doc.add(new Field("category", "test", false, true, false));
-------------------------------------
--------------------------------------
package org.apache.nutch.searcher.more;

import org.apache.nutch.searcher.RawFieldQueryFilter;

/**
 * Handles "category:" query clauses, causing them to search the
 * "category" field added in MoreIndexingFilter.
 */
public class CategoryQueryFilter extends RawFieldQueryFilter {
  public CategoryQueryFilter() {
    super("category");
  }
}
-----------------------------------------------
-----------------------------------------------
<property>
<name>plugin.includes</name>
<value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|more)|query-(basic|site|url|more)</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
In any case you need to at least include the nutch-extensionpoints
plugin. By default Nutch includes crawling just HTML and plain text
via HTTP, and basic indexing and search plugins.
</description>
</property>
-----------------------------------------------
When I query "category:test" with Luke, it works.
But when I run the same query "category:test" through the Tomcat web
application, no results are returned.
In case you get the search working:
How do you plan to categorize URLs/sites? I'm looking for a solution
to that myself, since I haven't yet managed to implement anything
based on a URL prefix filter that maps URLs to categories.
Regards,
Stefan