I wrote a url-category-indexer. It works with a .properties file that maps URLs, written as regular expressions, to categories. Example:
http://www.misite.com/videos/.*=videos

If it seems useful, I can share it. It might be better to configure it in an .xml file instead.

Regards,
Ernesto.

Stefan Neufeind wrote:
> Alvaro Cabrerizo wrote:
>
>> Have you included a node to describe your new searcher filter into
>> plugin.xml?
>>
>> 2006/10/11, xu nutch <[EMAIL PROTECTED]>:
>>
>>> I have a question about myplugin for indexfilter and queryfilter.
>>> Can u Help me !
>>> -------------------------------------
>>> MoreIndexingFilter.java in add
>>> doc.add(new Field("category", "test", false, true, false));
>>> -------------------------------------
>>>
>>> --------------------------------------
>>>
>>> package org.apache.nutch.searcher.more;
>>>
>>> import org.apache.nutch.searcher.RawFieldQueryFilter;
>>>
>>> /** Handles "category:" query clauses, causing them to search the
>>>  * field indexed by BasicIndexingFilter. */
>>> public class CategoryQueryFilter extends RawFieldQueryFilter {
>>>   public CategoryQueryFilter() {
>>>     super("category");
>>>   }
>>> }
>>> -----------------------------------------------
>>>
>>> <property>
>>>   <name>plugin.includes</name>
>>>   <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|more)|query-(basic|site|url|more)</value>
>>>   <description>Regular expression naming plugin directory names to
>>>   include. Any plugin not matching this expression is excluded.
>>>   In any case you need at least include the nutch-extensionpoints
>>>   plugin. By default Nutch includes crawling just HTML and plain
>>>   text via HTTP, and basic indexing and search plugins.
>>>   </description>
>>> </property>
>>> -----------------------------------------------
>>>
>>> I use luke to query "category:test" is ok!
>>> but I use tomcat website to query "category:test" ,
>>> no return result.
>>>
>
> In case you get the search working:
> How do you plan to categorize URLs/sites? I'm looking for a solution
> there, since I didn't yet manage to implement something
> URL-prefix-filter based to map categories to URLs or so.
>
> Regards,
> Stefan

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
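
For anyone who wants to try the regex-to-category mapping described at the top of this message, here is a minimal sketch of the lookup. It is not the indexer offered above, only an illustration: the class name UrlCategoryMapper and its methods are hypothetical, and wiring the result into a Nutch indexing filter (for example as the "category" field) is not shown. One caveat: java.util.Properties ends a key at the first unescaped '=', ':' or whitespace, so a line starting with "http://" would be split at the colon unless it is escaped. The sketch avoids this by parsing the lines itself and splitting on the last '='.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper; the name and API are assumptions, not part of Nutch.
final class UrlCategoryMapper {
    // Rules are kept in file order so that the first matching rule wins.
    private final List<Map.Entry<Pattern, String>> rules = new ArrayList<>();

    // Parses lines of the form "<url-regex>=<category>".
    static UrlCategoryMapper parse(String text) {
        UrlCategoryMapper mapper = new UrlCategoryMapper();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Split on the last '=' because the regex side may itself contain '='.
            int eq = line.lastIndexOf('=');
            if (eq <= 0 || eq == line.length() - 1) {
                continue; // no separator, empty regex, or empty category
            }
            Pattern pattern = Pattern.compile(line.substring(0, eq).trim());
            String category = line.substring(eq + 1).trim();
            mapper.rules.add(new SimpleImmutableEntry<>(pattern, category));
        }
        return mapper;
    }

    // Returns the category of the first rule whose regex matches the
    // whole URL, or null when no rule matches.
    String categorize(String url) {
        for (Map.Entry<Pattern, String> rule : rules) {
            if (rule.getKey().matcher(url).matches()) {
                return rule.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UrlCategoryMapper mapper = parse("http://www.misite.com/videos/.*=videos\n");
        System.out.println(mapper.categorize("http://www.misite.com/videos/clip1.html"));
    }
}
```

With the single rule from the example, the main method prints videos for the sample URL, and categorize returns null for any URL that no rule matches.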
