I think this page will help you a lot: http://wiki.apache.org/nutch/WritingPluginExample-0.9
Alex

2008/11/12 kevin pang <[EMAIL PROTECTED]>:
> Cool,
>
> Thanks for your reply.
> I want to know which extension point this extension extends.
> Thanks in advance.
>
> 2008/11/10 Cool The Breezer <[EMAIL PROTECTED]>:
> > Create a new Nutch extension that adds a new field to the document,
> > containing the text of all the links found on a page. Take a look at
> > the NekoHTML or HTMLParser documentation: get all the links of a page,
> > extract the text of each link, and then add a new field to the Nutch
> > document.
> >
> > I had a similar requirement to get all image URLs from a page and add
> > them as a new field in the Nutch document. I used HTMLParser to extract
> > all the images, converted the URLs to comma-separated text, and added
> > them as a new field in the index.
> >
> > - RB
> >
> > --- On Sun, 11/9/08, kevin pang <[EMAIL PROTECTED]> wrote:
> >
> > > From: kevin pang <[EMAIL PROTECTED]>
> > > Subject: how to crawl all the urls in the page
> > > To: [email protected]
> > > Date: Sunday, November 9, 2008, 9:28 PM
> > >
> > > I want to crawl all the URLs in a page, including those that appear
> > > only as plain text, not just as hyperlinks. How do I add this rule to
> > > the Nutch fetcher? Can anyone help? Much appreciated.
> > >
> > > Regards,

--
Best Regards
Alexander Aristov
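
For illustration, here is a minimal sketch of the link-text extraction RB describes, using the org.htmlparser (HTMLParser) library he mentions. The class name, the comma-separated packing, and the "linktext" field name below are assumptions made up for the example, not something taken from the thread; the wiki page linked above covers the actual plugin wiring (plugin.xml, build.xml, and the extension point declarations).

import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class LinkTextExtractor {

    // Collect the anchor text of every <a> tag on the page and pack it
    // into one comma-separated string, ready to be stored as an extra
    // field on the Nutch document.
    public static String extractLinkText(String html) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");
        NodeList links =
            parser.extractAllNodesThatMatch(new NodeClassFilter(LinkTag.class));

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < links.size(); i++) {
            LinkTag link = (LinkTag) links.elementAt(i);
            String text = link.getLinkText().trim();
            if (text.length() == 0) {
                continue; // skip links with no visible text
            }
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(text);
            // link.getLink() returns the href itself; filtering on ImageTag
            // instead of LinkTag gives RB's image-URL variant.
        }
        return sb.toString();
    }
}

Inside an indexing filter (one of the extension points the wiki example walks through), that string would then be added to the Lucene Document with something like doc.add(new Field("linktext", value, Field.Store.YES, Field.Index.TOKENIZED)); the field name and storage options here are placeholders, not Nutch conventions.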
