Hi Lourival
Thanks, I see, I understand it now. I know about meta tags in HTML, but I
can't use them, because I want to crawl pages from other sites. I think I will
categorize the pages by URL, with regular expressions.
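
A minimal sketch of that idea in plain Java, using only java.util.regex. The class name UrlCategorizer, the patterns, and the category names are all hypothetical, made up for illustration; nothing here is a Nutch API. The returned string is the kind of value that could then be stored in the extra "category" field discussed further down the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): maps a URL to a category name.
class UrlCategorizer {
    // category name -> pattern; LinkedHashMap keeps insertion order,
    // so the first rule that matches wins.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        // Example rules only; real patterns depend on the sites being crawled.
        RULES.put("news", Pattern.compile("^https?://[^/]+/news/.*"));
        RULES.put("sports", Pattern.compile("^https?://[^/]+/sports/.*"));
        RULES.put("blog", Pattern.compile("^https?://blog\\.[^/]+/.*"));
    }

    // Returns the first category whose pattern matches the whole URL,
    // or "uncategorized" when no rule applies.
    static String categorize(String url) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(url).matches()) {
                return rule.getKey();
            }
        }
        return "uncategorized";
    }

    public static void main(String[] args) {
        System.out.println(categorize("http://example.com/news/2006/08/22/story.html"));
        System.out.println(categorize("http://blog.example.org/post/1"));
        System.out.println(categorize("http://example.com/about.html"));
    }
}
```

Keeping the rules in an ordered map means a more specific pattern can be listed before a more general one.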
Thank you very much! See you later...
;)
Ernesto.
Lourival Júnior <[EMAIL PROTECTED]> wrote:
Hi Ernesto!
Meta tags are custom tags that you add to your web page, more precisely
inside the <head> tag, to identify the contents of the web page to search
engine indexes. For example, you can add a meta tag to describe the author of
the page, keywords, cache, and so on. What you can do for your problem is add
a meta tag (for example, one named "category") to describe your categories.
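
As a rough illustration of reading such a tag back out of a page, here is a small self-contained Java sketch. It assumes a hypothetical tag written as <meta name="category" content="...">, with the name attribute before content, and uses a plain regular expression instead of a real HTML parser. The class and method names are made up for the example; this is not Nutch's parsing API, where the equivalent logic would live in a parse plugin.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): reads a custom "category" meta tag.
class CategoryMetaTag {
    // Matches <meta name="category" content="..."> with name before content.
    // Assumption: double-quoted attributes in exactly this order.
    private static final Pattern CATEGORY_META = Pattern.compile(
        "<meta\\s+name=\"category\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Returns the content of the first category meta tag, or null if none is found.
    static String extractCategory(String html) {
        Matcher m = CATEGORY_META.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "<html><head><title>t</title>"
            + "<meta name=\"category\" content=\"sports\">"
            + "</head><body></body></html>";
        System.out.println(extractCategory(page));
    }
}
```

A regular expression like this is fragile against other attribute orders or quoting styles, so a real plugin would more likely walk the parsed document tree.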
I hope I helped you.
Regards
On 8/22/06, Ernesto De Santis wrote:
>
> Thanks to both of you for responding!
>
> What's a meta tag?
> Is it something in Nutch, rather than a Lucene field?
>
> I suppose that by implementing IndexingFilter.filter:
>
> filter(Document doc, Parse parse, UTF8 url, CrawlDatum datum, Inlinks inlinks)
>
> I can add my field to the doc instance.
>
> Well, it seems the way is to try, crash, and try again... :)
>
> Thanks,
> Ernesto.
>
> Chris Stephens wrote:
> > You can't do it unless you write a plugin to parse a custom meta tag
> > called category.
> >
> > I'm trying to do something like this now, but the plugin documentation
> > is horrible.
> >
> > Lourival Júnior wrote:
> >> Hi Ernesto!
> >>
> >> I know what you mean. Sometimes I get no answers either. Unfortunately,
> >> I'm new to Nutch and Lucene and I can't help you. Keep trying, the
> >> community will help you :).
> >>
> >> On 8/22/06, Ernesto De Santis wrote:
> >>>
> >>> Hi All
> >>>
> >>> Please, can somebody answer my questions?
> >>> I'm a Nutch beginner, I hope that my questions/doubts are easy... ;)
> >>>
> >>> Or, if my email is wrong, tell me. Or confirm whether I'm on the right
> >>> track.
> >>>
> >>> Thanks a lot!
> >>> Ernesto.
> >>>
> >>> Ernesto De Santis wrote:
> >>> > Hi
> >>> >
> >>> > I'm new to Nutch; I started yesterday.
> >>> > But I have experience with Lucene.
> >>> >
> >>> > I have some questions for you, the Nutch experts... ;)
> >>> >
> >>> > I want to split my page results into categories, to filter them or to
> >>> > show them separately.
> >>> > This is my approach:
> >>> >
> >>> > *crawl/index*
> >>> >
> >>> > I want to index an extra field.
> >>> > For that, I need to write my own plugin containing my custom logic.
> >>> > Then I configure my plugin in conf/nutch-site.xml.
> >>> >
> >>> > To develop my plugin, I see that I need to implement the Configurable
> >>> > <http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/conf/Configurable.html>,
> >>> > IndexingFilter
> >>> > <http://lucene.apache.org/nutch/apidocs-0.8/org/apache/nutch/indexer/IndexingFilter.html>,
> >>> > and Pluggable
> >>> > <http://lucene.apache.org/nutch/apidocs-0.8/org/apache/nutch/plugin/Pluggable.html>
> >>> > interfaces.
> >>> >
> >>> > Then I add the field value, the category value, to the Document instance.
> >>> >
> >>> > *search*
> >>> >
> >>> > Here I have a doubt. One way is to set a required term on the Nutch query:
> >>> >
> >>> > query.addRequiredTerm(myCategory, "category");
> >>> >
> >>> > I see that Nutch uses QueryFilters too, but I can't see how to hook one
> >>> > into my query.
> >>> >
> >>> > *miscellaneous*
> >>> >
> >>> > Lucene has a rich query hierarchy, but I don't see it in Nutch. I don't
> >>> > see BooleanQuery, TermQuery, etc. Is the Query class the only place to
> >>> > build a query in Nutch?
> >>> >
> >>> > The Lucene searcher has a way to separate the query from the filters:
> >>> > query conditions affect the ranking, and filters don't. How does Nutch
> >>> > separate them?
> >>> >
> >>> > *documentation*
> >>> >
> >>> > I read the documentation on the Nutch site, the tutorial, the wiki, the
> >>> > presentations, and the today.java.net article:
> >>> >
> >>> > http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
> >>> >
> >>> > and part 2 too.
> >>> >
> >>> > A lot of details aren't covered there. Does anybody know of more detailed
> >>> > documentation?
> >>> >
> >>> > Thanks a lot.
> >>> > Ernesto.
--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general