Hi everybody,

I am quite new to Nutch, so I apologize if my questions are "lame" ;-)

What I am trying to accomplish (so far unsuccessfully) is to retrieve information about crawled pages by URL. I would like to be able to get:
1. a list of crawled pages
2. the crawled content for a given URL (provided the page was crawled successfully)
How can we achieve this? I would appreciate it if someone more proficient could point us to where to look.

We are using Nutch 0.8x with Hadoop DFS on multiple machines.

Thanks in advance,
Wojtek
