Hello,

I want to use Nutch for website mirroring, importing pages starting from a remote URL.

I have already written a program that fetches, merges the segments, and reads their content. What I want to do next is:

- Create a local directory structure that mirrors the remote one. Is there an elegant way to do this with the existing Nutch API, or do I need to build the structure manually from the segment content?
- Convert the links inside every page to relative links. For example, if a src points to "http://www.mysite.com/resources/foo.txt", I need to change it to "/resources/foo.txt" so that it points to the local file.

My question is whether I can use crawl_parse or parse_data to get the links. I am not sure how to do this with the Nutch API.

Thank you,
Vlad
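
P.S. To make the two steps concrete, here is a rough sketch of what I have in mind. It deliberately uses no Nutch classes, only java.net.URI and java.nio.file, so it says nothing about how the URLs or links would come out of the segments. The names MirrorPaths, localPathFor and toRootRelative, the "index.html" default for directory URLs, and the "mirror" directory in the example are all my own assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Path;

// Hypothetical helper, not part of Nutch.
final class MirrorPaths {

    // Maps a fetched URL to a file under mirrorRoot, following the remote path.
    // Assumption: URLs with an empty path or a trailing slash are stored as index.html.
    // URI.create throws IllegalArgumentException if the URL is not a valid URI.
    static Path localPathFor(Path mirrorRoot, String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            path = (path == null ? "" : path) + "index.html";
        }
        String relative = path.replaceFirst("^/+", "");
        Path root = mirrorRoot.normalize();
        Path target = root.resolve(relative).normalize();
        // Refuse paths such as /../../etc/passwd that would leave the mirror directory.
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("URL escapes mirror root: " + url);
        }
        return target;
    }

    // Rewrites an absolute link on siteHost to a root-relative one
    // (path plus any query and fragment). Links to other hosts, links that
    // are already relative, and unparsable links are returned unchanged.
    static String toRootRelative(String link, String siteHost) {
        URI uri;
        try {
            uri = new URI(link);
        } catch (URISyntaxException e) {
            return link;
        }
        if (uri.getHost() == null || !uri.getHost().equalsIgnoreCase(siteHost)) {
            return link;
        }
        String rawPath = uri.getRawPath();
        StringBuilder out = new StringBuilder(
                rawPath == null || rawPath.isEmpty() ? "/" : rawPath);
        if (uri.getRawQuery() != null) {
            out.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            out.append('#').append(uri.getRawFragment());
        }
        return out.toString();
    }
}
```

With this, MirrorPaths.toRootRelative("http://www.mysite.com/resources/foo.txt", "www.mysite.com") gives "/resources/foo.txt", and MirrorPaths.localPathFor(Paths.get("mirror"), "http://www.mysite.com/resources/foo.txt") gives the path resources/foo.txt under the mirror directory. What I am still missing is the Nutch side: how to get each page's URL and its links from the segments so that I can feed them into something like this.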

