I wrote a url-category-indexer.

It works with a .properties file that maps URLs, written as regexps, to
categories.
Example:

http://www.misite.com/videos/.*=videos

If it seems useful, I can share it.

Maybe it would be better to configure it in an .xml file.
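
To give an idea of the approach, here is a minimal sketch of a regexp-to-category lookup driven by a .properties file. It is not the actual indexer code: the class name UrlCategoryMapper and the method categoryFor are hypothetical, and the Nutch indexing-filter wiring is left out. One thing to watch for with java.util.Properties: an unescaped ':' ends the key, so the colon in "http://" has to be written as "\:" in the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

/** Hypothetical sketch: maps URLs to categories using regexp=category rules. */
public class UrlCategoryMapper {

  /** One rule: a compiled URL regexp and the category it assigns. */
  private static final class Rule {
    final Pattern pattern;
    final String category;

    Rule(Pattern pattern, String category) {
      this.pattern = pattern;
      this.category = category;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  /** Builds the rule list; each property key is a regexp, each value a category. */
  public UrlCategoryMapper(Properties props) {
    // Properties does not keep file order, so rules should not overlap.
    for (String regexp : props.stringPropertyNames()) {
      rules.add(new Rule(Pattern.compile(regexp), props.getProperty(regexp)));
    }
  }

  /** Returns the category of the first rule that matches the whole URL, or null. */
  public String categoryFor(String url) {
    for (Rule rule : rules) {
      if (rule.pattern.matcher(url).matches()) {
        return rule.category;
      }
    }
    return null;
  }

  public static void main(String[] args) throws IOException {
    // In the file itself this line reads: http\://www.misite.com/videos/.*=videos
    String config = "http\\://www.misite.com/videos/.*=videos\n";
    Properties props = new Properties();
    props.load(new StringReader(config));

    UrlCategoryMapper mapper = new UrlCategoryMapper(props);
    // prints "videos"
    System.out.println(mapper.categoryFor("http://www.misite.com/videos/clip1.html"));
  }
}
```

An indexing filter could then call categoryFor with the page URL and add the result as the "category" field. Because the rule order is not preserved, an .xml file (or any ordered format) may indeed be the safer choice if rules can overlap.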

Regards,
Ernesto.

Stefan Neufeind wrote:
> Alvaro Cabrerizo wrote:
>   
>> Have you included a node describing your new searcher filter in
>> plugin.xml?
>>
>> 2006/10/11, xu nutch <[EMAIL PROTECTED]>:
>>     
>>> I have a question about my plugin for the index filter and query filter.
>>> Can you help me?
>>> -------------------------------------
>>> Added in MoreIndexingFilter.java:
>>> doc.add(new Field("category", "test", false, true, false));
>>> -------------------------------------
>>>
>>> --------------------------------------
>>>
>>>
>>> package org.apache.nutch.searcher.more;
>>>
>>> import org.apache.nutch.searcher.RawFieldQueryFilter;
>>>
>>> /** Handles "category:" query clauses, causing them to search the
>>>  * "category" field added by MoreIndexingFilter. */
>>> public class CategoryQueryFilter extends RawFieldQueryFilter {
>>>  public CategoryQueryFilter() {
>>>    super("category");
>>>  }
>>> }
>>> -----------------------------------------------
>>> -----------------------------------------------
>>>
>>> <property>
>>>  <name>plugin.includes</name>
>>> <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|more)|query-(basic|site|url|more)</value>
>>>
>>>  <description>Regular expression naming plugin directory names to
>>>  include.  Any plugin not matching this expression is excluded.
>>>  In any case you need at least include the nutch-extensionpoints
>>> plugin. By
>>>  default Nutch includes crawling just HTML and plain text via HTTP,
>>>  and basic indexing and search plugins.
>>>  </description>
>>> </property>
>>> -----------------------------------------------
>>>
>>> Querying "category:test" with Luke works,
>>> but the same "category:test" query through the Tomcat web site
>>> returns no results.
>>>       
>
> In case you get the search working:
> How do you plan to categorize URLs/sites? I'm looking for a solution
> there, since I haven't yet managed to implement something based on a
> URL prefix filter to map categories to URLs.
>
>
> Regards,
>  Stefan
>
>
>   

        
        
                
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
