Hi everybody,
I am quite a newbie to Nutch, so I apologize if my questions are "lame" ;-)
What I am trying to accomplish (so far unsuccessfully) is to retrieve information about crawled pages by URL. I would like to be able to get:
1. a list of crawled pages
2. the crawled content for a given URL (provided the page was crawled successfully, of course)
How can we achieve this? I would appreciate it if someone more proficient could point us to where to look.
We are using Nutch 0.8.x with Hadoop DFS on multiple machines.
Thanks in advance,
Wojtek
