Hi, I need tons of HTML pages for a research project. I followed the tutorial on the wiki page and set up a nutch-1.4 crawler (without Solr). I can now dump the extracted text from the segments, but unfortunately the HTML tags are stripped. How can I retrieve the original HTML pages from the crawled database? Or are the original HTML pages actually stored by Nutch at all?
Thanks -Yao

