You probably still want to write a plugin. You can use whatever algorithms you like to identify a site category, then add that as a field in the index.
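Here is a minimal sketch of the URL-based approach Ernesto describes below: assign a category with regular expressions, then add it to the document as an extra field. It is plain Java with no Nutch or Lucene dependency. The `Map` document stands in for the Lucene `Document` that a real `IndexingFilter` receives, and the class name, the rules, and the `category` field name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: assigns a category to a page from its URL using
 * ordered regular-expression rules, then stores it as a "category" field.
 * The Map-based "document" stands in for the Lucene Document a Nutch
 * indexing filter would receive; nothing here is Nutch API.
 */
class UrlCategorizer {
    // Rules are checked in insertion order; the first matching pattern wins.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();
    private final String defaultCategory;

    UrlCategorizer(String defaultCategory) {
        this.defaultCategory = defaultCategory;
    }

    UrlCategorizer addRule(String regex, String category) {
        rules.put(Pattern.compile(regex), category);
        return this;
    }

    /** Returns the category of the first rule whose pattern is found in the URL. */
    String categorize(String url) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(url).find()) {
                return rule.getValue();
            }
        }
        return defaultCategory;
    }

    /** Mimics an indexing filter's job: add one extra field to the document. */
    Map<String, List<String>> filter(Map<String, List<String>> doc, String url) {
        doc.computeIfAbsent("category", k -> new ArrayList<>()).add(categorize(url));
        return doc;
    }

    public static void main(String[] args) {
        UrlCategorizer categorizer = new UrlCategorizer("other")
            .addRule("^https?://[^/]*news\\.", "news")
            .addRule("/sports?/", "sports")
            .addRule("\\.pdf$", "documents");

        Map<String, List<String>> doc = new LinkedHashMap<>();
        categorizer.filter(doc, "http://www.example.com/sport/football.html");
        System.out.println(doc); // {category=[sports]}
    }
}
```

Because the first matching rule wins, more specific patterns should go first. In an actual plugin the value would be added to the Lucene document rather than to a map.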
Ernesto De Santis wrote:
> Hi Lourival,
>
> Thanks, I see, I understand it now. I know about meta tags in HTML, but I
> can't use them, because I want to crawl pages from other sites. I'm thinking
> of categorizing the pages by URL, with regular expressions.
>
> Thank you very much, and see you later...
> ;)
> Ernesto.
>
> Lourival Júnior <[EMAIL PROTECTED]> wrote:
> Hi Ernesto!
>
> Meta tags are custom tags that you add to your web page, more exactly
> inside the <head> tag, to describe the contents of the web page to search
> engine indexes. For example, you can add meta tags to describe the author
> of the page, keywords, caching, and so on. What you can do for your problem
> is add a meta tag to describe your categories:
>
>
> I hope I helped you.
>
> Regards
>
> On 8/22/06, Ernesto De Santis wrote:
>
>> Thanks to both of you for answering!
>>
>> What's a meta tag?
>> Is it something specific to Nutch? Isn't it a Lucene field?
>>
>> I suppose that by implementing IndexingFilter.filter:
>>
>> filter(Document doc, Parse parse, UTF8 url, CrawlDatum datum, Inlinks inlinks)
>>
>> I can add my field to a doc instance.
>>
>> Well, it seems the way to go is to try, crash, and try again... :)
>>
>> Thanks,
>> Ernesto.
>>
>> Chris Stephens wrote:
>>
>>> You can't do it unless you write a plugin to parse a custom meta tag
>>> called category.
>>>
>>> I'm trying to do something like this now, but the plugin documentation
>>> is horrible.
>>>
>>> Lourival Júnior wrote:
>>>
>>>> Hi Ernesto!
>>>>
>>>> I know what you mean. Sometimes I get no answers too. Unfortunately,
>>>> I'm new to Nutch and Lucene and I can't help you. Keep trying; the
>>>> community will help you :).
>>>>
>>>> On 8/22/06, Ernesto De Santis wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Can somebody please answer my questions?
>>>>> I'm a Nutch beginner; I hope my questions/doubts are easy... ;)
>>>>>
>>>>> Or if my email is wrong, tell me. Or confirm whether I'm on the
>>>>> right track.
>>>>>
>>>>> Thanks a lot!
>>>>> Ernesto.
>>>>>
>>>>> Ernesto De Santis wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm new to Nutch; I started yesterday.
>>>>>> But I have experience with Lucene.
>>>>>>
>>>>>> I have some questions for you Nutch experts... ;)
>>>>>>
>>>>>> I want to split my page results into categories, to filter them or
>>>>>> show them separately.
>>>>>> This is my approach:
>>>>>>
>>>>>> *crawl/index*
>>>>>>
>>>>>> I want to index an extra field.
>>>>>> So I need to write my own plugin for that, with my custom logic.
>>>>>> Then I configure my plugin in conf/nutch-site.xml.
>>>>>>
>>>>>> To develop my plugin, I see that I need to implement the Configurable
>>>>>> <http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/conf/Configurable.html>,
>>>>>> IndexingFilter
>>>>>> <http://lucene.apache.org/nutch/apidocs-0.8/org/apache/nutch/indexer/IndexingFilter.html>,
>>>>>> and Pluggable
>>>>>> <http://lucene.apache.org/nutch/apidocs-0.8/org/apache/nutch/plugin/Pluggable.html>
>>>>>> interfaces.
>>>>>>
>>>>>> Then I add the field value (the category) to the Document instance.
>>>>>>
>>>>>> *search*
>>>>>>
>>>>>> Here I have a doubt. One way is to add a required term to the Nutch
>>>>>> query:
>>>>>>
>>>>>> query.addRequiredTerm(myCategory, "category");
>>>>>>
>>>>>> I see that Nutch uses QueryFilters too, but I can't see how to hook
>>>>>> one into my query.
>>>>>>
>>>>>> *miscellaneous*
>>>>>>
>>>>>> Lucene has a rich query hierarchy; I don't see it in Nutch. I don't
>>>>>> see BooleanQuery, TermQuery, etc. Is the Query class the only way to
>>>>>> build a query in Nutch?
>>>>>>
>>>>>> The Lucene searcher has a way to separate the query from the filters:
>>>>>> query conditions affect the ranking, and filters don't. How does Nutch
>>>>>> separate them?
>>>>>>
>>>>>> *documentation*
>>>>>>
>>>>>> I read the documentation on the Nutch site (tutorial, wiki,
>>>>>> presentations) and the today.java.net article:
>>>>>>
>>>>>> http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
>>>>>>
>>>>>> and part 2 too.
>>>>>>
>>>>>> A lot of details aren't covered there. Does anybody know of more
>>>>>> detailed documentation?
>>>>>>
>>>>>> Thanks a lot.
>>>>>> Ernesto.

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
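For the meta-tag alternative suggested in the thread, this sketch shows the idea of reading a custom `<meta name="category" content="...">` tag from a page's HTML. The tag name `category` follows Chris's suggestion; the class and method names are hypothetical, and the regex is only for illustration. A real Nutch plugin would probably hook into Nutch's own HTML parsing rather than scan raw markup, and, as Ernesto points out, this only helps for pages whose authors actually add the tag.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: reads a custom meta tag such as
 * <meta name="category" content="sports"> from raw HTML.
 * Regex-based and deliberately simple (e.g. values containing quotes
 * are not handled); not Nutch API.
 */
class MetaCategoryExtractor {
    // Matches <meta ...> tags; attributes are parsed separately so their order does not matter.
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\s+([^>]*)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ATTRIBUTE =
        Pattern.compile("([a-zA-Z-]+)\\s*=\\s*[\"']([^\"']*)[\"']");

    /** Returns the content of the first meta tag whose name matches metaName (case-insensitive). */
    static Optional<String> extract(String html, String metaName) {
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null, content = null;
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                if (key.equals("name")) {
                    name = attr.group(2);
                } else if (key.equals("content")) {
                    content = attr.group(2);
                }
            }
            if (metaName.equalsIgnoreCase(name) && content != null) {
                return Optional.of(content.trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Match report</title>"
            + "<meta name=\"category\" content=\"sports\">"
            + "</head><body>...</body></html>";
        System.out.println(extract(html, "category").orElse("none")); // sports
    }
}
```

Pages without the tag yield an empty `Optional`, so a plugin could fall back to URL-based rules in that case.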

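On the query-versus-filter question, this toy model (not Lucene or Nutch code) illustrates the distinction Ernesto describes: the query terms decide the score, while a category filter only decides which documents are eligible. The class names and the term-count scoring are assumptions made to keep it self-contained; it does not show how Nutch itself separates the two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/**
 * Toy model of the query/filter split (not Lucene or Nutch code):
 * query terms decide the score, while the category filter only decides
 * which documents are eligible and never changes a score.
 */
class CategoryFilteredSearch {
    static final class Doc {
        final String url;
        final String category;
        final String text;

        Doc(String url, String category, String text) {
            this.url = url;
            this.category = category;
            this.text = text;
        }
    }

    static final class Hit {
        final String url;
        final int score;

        Hit(String url, int score) {
            this.url = url;
            this.score = score;
        }

        @Override
        public String toString() {
            return url + "=" + score;
        }
    }

    /** Score = occurrences of (lower-case) query terms in the text; a stand-in for real relevance. */
    static int score(Doc doc, List<String> terms) {
        int score = 0;
        for (String token : doc.text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (terms.contains(token)) {
                score++;
            }
        }
        return score;
    }

    /** Ranks matching documents by score; a null category means "no filter". */
    static List<Hit> search(List<Doc> index, List<String> terms, String category) {
        List<Hit> hits = new ArrayList<>();
        for (Doc doc : index) {
            // Filter step: excludes documents but contributes nothing to the score.
            if (category != null && !category.equals(doc.category)) {
                continue;
            }
            int s = score(doc, terms);
            if (s > 0) {
                hits.add(new Hit(doc.url, s));
            }
        }
        hits.sort(Comparator.comparingInt((Hit h) -> h.score).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> index = Arrays.asList(
            new Doc("http://example.com/sport/1", "sports", "Late goal wins the final; goal of the season"),
            new Doc("http://example.com/news/2", "news", "Goal set for the budget, goal missed, goal revised"),
            new Doc("http://example.com/sport/3", "sports", "Second goal disallowed"));
        List<String> terms = Arrays.asList("goal");

        System.out.println(search(index, terms, null));     // all categories
        System.out.println(search(index, terms, "sports")); // same scores, fewer hits
    }
}
```

The filtered run returns the same scores for the sports pages as the unfiltered run; only the news page disappears. A required query clause, by contrast, would normally also contribute to each hit's score.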