Cool, thanks for your reply. Which extension point should this extension extend? Thanks in advance.
2008/11/10 Cool The Breezer <[EMAIL PROTECTED]>

> Create a new Nutch extension to add a new field to the Document that contains
> the text of all links available in a page. Take a look at the NekoHTML or
> HTMLParser documentation to get all links of a page and extract the text of
> each link. Then add a new field to the Nutch document.
>
> I had the same kind of requirement: to get all image URLs from a page and add
> them as a new field in the Nutch document. I used htmlparser to extract all
> images, converted the URLs to comma-separated text, and added them as a
> new field in the index.
>
> - RB
>
>
> --- On Sun, 11/9/08, kevin pang <[EMAIL PROTECTED]> wrote:
>
> > From: kevin pang <[EMAIL PROTECTED]>
> > Subject: how to crawl all the urls in the page
> > To: [email protected]
> > Date: Sunday, November 9, 2008, 9:28 PM
> > I want to crawl all the URLs in the page, including those
> > displayed as text, not
> > just as hyperlinks. How do I add this rule to the Nutch fetcher?
> > Can anyone help? Much appreciated.
> >
> > Regards,
> >
> >
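
For reference, the extraction step described in the quoted advice (collect the URLs found in a page, including ones that appear only as plain text, and join them into one comma-separated value for a new field) might look roughly like the sketch below. It uses only the JDK and makes no Nutch, NekoHTML, or HTMLParser calls. The class name TextUrlExtractor, its method names, and the regular expression are hypothetical choices made for illustration, and the sketch assumes that only absolute http/https URLs matter. Wiring the result into a Nutch plugin is left out, since that depends on the extension point and the Nutch version.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: finds URLs in raw page content,
// whether they sit inside an href attribute or appear as plain text.
class TextUrlExtractor {

    // Assumption: an absolute http/https URL runs until whitespace,
    // an angle bracket, or a quote character.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https?://[^\\s<>\"']+", Pattern.CASE_INSENSITIVE);

    // Returns the distinct URLs in order of first appearance.
    static List<String> extractUrls(String content) {
        Set<String> urls = new LinkedHashSet<>();
        Matcher matcher = URL_PATTERN.matcher(content);
        while (matcher.find()) {
            // Drop sentence punctuation that trails a URL written in prose.
            urls.add(matcher.group().replaceAll("[.,;:!?)]+$", ""));
        }
        return new ArrayList<>(urls);
    }

    // Joins the URLs into one value suitable for a single index field.
    // Note: a URL that itself contains a comma would make this ambiguous.
    static String toCommaSeparated(List<String> urls) {
        return String.join(",", urls);
    }

    public static void main(String[] args) {
        String page = "See http://example.com/a, or "
                + "<a href=\"http://example.com/b\">b</a>.";
        System.out.println(toCommaSeparated(extractUrls(page)));
    }
}
```

A regular expression is used here instead of an HTML parser because URLs shown as plain text are not link elements, so a parser that only walks anchor tags would not report them. For the anchor text and image URLs mentioned in the quoted message, a real HTML parser is still the more robust choice.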
