Getting contents of crawled pages by URL

Hi everybody,

I am quite a newbie to Nutch, so I do apologize if my questions are "lame" ;-)

What I am trying to accomplish (so far unsuccessfully) is to acquire info on crawled pages by URL. I would like to be able to get:

1. a list of crawled pages
2. the crawled content for a given URL (provided the page was crawled successfully, of course)

How can we achieve this? I would appreciate it if someone more proficient would point us to where to look.


We are using Nutch 0.8x with Hadoop DFS on multiple machines.

Thanks in advance,
Wojtek
