I want to customize the crawling process by modifying the way pages are
stored. As far as I know, Nutch stores web pages in binary files, page
by page. After a link analysis step, Nutch crawls to the destination
pages and downloads them. When pages are stored, I want to write only the
links to a separate text or binary file, with the structure shown in the
example below.
For example, assume page A links to pages B and C, and we number A, B and
C as 1, 2 and 3. I want my file to look like this, with one link per line
(source number, then target number):
1 2
1 3
and so on.
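To make the requested format concrete, here is a minimal standalone Java
sketch. It does not use any Nutch API. It assumes you already have, for each
page, its URL and the list of outlink URLs (in Nutch these would presumably
come from the parse output, but how to hook into the crawl is not shown).
The class name LinkEdgeList, the method toEdgeList and the output file
links.txt are hypothetical. Each distinct URL gets a number in order of first
appearance, and each link becomes one "source target" line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: turns page -> outlinks into an edge list.
public class LinkEdgeList {

    // Numbers each distinct URL 1, 2, 3, ... in order of first appearance
    // and emits one "source target" line per link.
    public static String toEdgeList(Map<String, List<String>> outlinks) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> page : outlinks.entrySet()) {
            int src = idFor(ids, page.getKey());
            for (String target : page.getValue()) {
                int dst = idFor(ids, target);
                out.append(src).append(' ').append(dst).append('\n');
            }
        }
        return out.toString();
    }

    // Returns the existing number for a URL, or assigns the next free one.
    private static int idFor(Map<String, Integer> ids, String url) {
        Integer id = ids.get(url);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(url, id);
        }
        return id;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input matching the question: page A links to B and C.
        Map<String, List<String>> outlinks = new LinkedHashMap<>();
        outlinks.put("http://example.com/A",
                Arrays.asList("http://example.com/B", "http://example.com/C"));

        String edges = toEdgeList(outlinks);
        Path file = Paths.get("links.txt");
        Files.write(file, edges.getBytes(StandardCharsets.UTF_8));
        System.out.print(edges); // 1 2, then 1 3, each on its own line
    }
}
```

A LinkedHashMap keeps the numbering deterministic for a given input order.
For a real crawl that runs in several rounds, the URL-to-number mapping would
have to be stored persistently so that the same page keeps the same number.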
How can I do this with Nutch? Please give me some hints. Thank you very
much.

--
NamNH
-------------------------------------------
Contacts
             Cell 0912500501
             Office 8581530
