You probably still want to write a plugin.  You can use whatever 
algorithms you like to identify a site category, then add that as a 
field in the index.
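
As a rough illustration, here is a minimal sketch of one such algorithm:
categorizing by URL with regular expressions, as Ernesto suggests below.
The class UrlCategorizer, its rules, and the category names are all
hypothetical, not part of Nutch. The sketch only computes the category
string; the comment at the end describes, as an assumption, where it would
plug into the filter(...) method quoted further down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not a Nutch class): maps a URL to a category name
// using an ordered list of regular-expression rules. First match wins.
final class UrlCategorizer {

    // One rule: a compiled pattern and the category it assigns.
    private static final class Rule {
        final Pattern pattern;
        final String category;

        Rule(Pattern pattern, String category) {
            this.pattern = pattern;
            this.category = category;
        }
    }

    private final List<Rule> rules = new ArrayList<Rule>();
    private final String fallback;

    // fallback is the category returned when no rule matches.
    UrlCategorizer(String fallback) {
        this.fallback = fallback;
    }

    // Adds a rule; rules are tried in the order they were added.
    UrlCategorizer addRule(String regex, String category) {
        rules.add(new Rule(Pattern.compile(regex), category));
        return this;
    }

    // Returns the category of the first rule whose pattern occurs
    // anywhere in the URL, or the fallback category.
    String categorize(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.category;
            }
        }
        return fallback;
    }
}

// Assumption: inside your IndexingFilter's filter(...) method you would
// call categorize(url.toString()) and add the result to the Document as
// a stored, untokenized field named "category". Check the Lucene version
// bundled with your Nutch release for the exact Field constructor.
```

Rule order matters here, so put the most specific patterns first.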

Ernesto De Santis wrote:
> Hi Lourival
>
> Thanks, I see, I understand it now. I know about meta tags in HTML, but I
> can't use them, because I want to crawl pages from other sites. I plan to
> categorize the pages by URL, with regular expressions.
>
> Thank you very much! See you later...
> ;)
> Ernesto.
>
> Lourival Júnior <[EMAIL PROTECTED]> wrote:
> Hi Ernesto!
>
> Meta tags are custom tags that you add to your web page, more exactly
> inside the <head> tag, to describe the contents of the page to search
> engine indexes. For example, you can add a meta tag to describe the author
> of the page, keywords, cache, and so on. What you can do for your problem
> is add a meta tag that describes your categories:
>
>
>
> I hope I helped you.
>
> Regards
>
> On 8/22/06, Ernesto De Santis  wrote:
>   
>> Thanks to both of you for responding!
>>
>> What's a meta tag?
>> Is it something specific to Nutch, rather than a Lucene field?
>>
>> I suppose that by implementing IndexingFilter.filter:
>>
>> filter(Document doc, Parse parse, UTF8 url, CrawlDatum datum, Inlinks
>> inlinks)
>>
>> I can add my field to the doc instance.
>>
>> Well, it seems the way forward is to try, crash, and try again... :)
>>
>> Thanks,
>> Ernesto.
>>
>> Chris Stephens wrote:
>>     
>>> You can't do it unless you write a plugin to parse a custom meta tag
>>> called category.
>>>
>>> I'm trying to do something like this now, but the plugin documentation
>>> is horrible.
>>>
>>> Lourival Júnior wrote:
>>>       
>>>> Hi Ernesto!
>>>>
>>>> I know what you mean. Sometimes I get no answers either. Unfortunately,
>>>> I'm new to Nutch and Lucene and I can't help you. Keep trying, the
>>>> community will help you :).
>>>>
>>>> On 8/22/06, Ernesto De Santis  wrote:
>>>>         
>>>>> Hi All
>>>>>
>>>>> Please, can somebody answer my questions?
>>>>> I'm a Nutch beginner, so I hope my questions/doubts are easy... ;)
>>>>>
>>>>> If my email is wrong, tell me. Otherwise, please confirm whether I'm
>>>>> on the right track.
>>>>>
>>>>> Thanks a lot!
>>>>> Ernesto.
>>>>>
>>>>> Ernesto De Santis wrote:
>>>>>           
>>>>>> Hi
>>>>>>
>>>>>> I'm new to Nutch; I started yesterday.
>>>>>> But I have experience with Lucene.
>>>>>>
>>>>>> I have some questions for you Nutch experts... ;)
>>>>>>
>>>>>> I want to split my page results into categories, to filter them or
>>>>>> show them separately.
>>>>>> This is my approach:
>>>>>>
>>>>>> *crawl/index*
>>>>>>
>>>>>> I want to index an extra field.
>>>>>> Then I need to write my own plugin for that, to implement my custom
>>>>>> logic.
>>>>>> Then I configure my plugin in conf/nutch-site.xml.
>>>>>>
>>>>>> To develop my plugin, I see that I need to implement the Configurable
>>>>>> <http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/conf/Configurable.html>,
>>>>>> IndexingFilter
>>>>>> <http://lucene.apache.org/nutch/apidocs-0.8/org/apache/nutch/indexer/IndexingFilter.html>,
>>>>>> and Pluggable
>>>>>> <http://lucene.apache.org/nutch/apidocs-0.8/org/apache/nutch/plugin/Pluggable.html>
>>>>>> interfaces.
>>>>>>
>>>>>> Then I add the category value to the Document instance as a field.
>>>>>>
>>>>>> *search*
>>>>>>
>>>>>> Here I have a question. One way is to add a required term to the
>>>>>> Nutch query:
>>>>>>
>>>>>> query.addRequiredTerm(myCategory, "category");
>>>>>>
>>>>>> I see that Nutch uses QueryFilters too, but I can't see how to hook
>>>>>> one into my query.
>>>>>>
>>>>>> *miscellaneous*
>>>>>>
>>>>>> Lucene has a rich query hierarchy, but I don't see it in Nutch. I
>>>>>> don't see BooleanQuery, TermQuery, etc. Is the Query class the only
>>>>>> place to build a query in Nutch?
>>>>>>
>>>>>> Lucene's searcher has a way to separate the query from the filters.
>>>>>> Query conditions affect the ranking, and filters don't. How does
>>>>>> Nutch separate them?
>>>>>>
>>>>>> *documentation*
>>>>>>
>>>>>> I read the documentation on the Nutch site (tutorial, wiki,
>>>>>> presentations) and the today.java.net article:
>>>>>>
>>>>>> http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
>>>>>>
>>>>>> and part 2 too.
>>>>>>
>>>>>> A lot of details aren't covered there. Does anybody know of more
>>>>>> detailed documentation?
>>>>>>
>>>>>> Thanks a lot.
>>>>>> Ernesto.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
