Hi all,

Currently, there is no API for Google Groups, and I want to get all the
post information from one Google Group that I am a member of. So I tried
to use Nutch to crawl this group, hoping to fetch all of its post pages.

I am not sure whether Nutch can crawl Google products such as Google Groups.

This March, I successfully crawled some public pages using Nutch.

Over the past few days I tried to crawl the Google Group by following the
authentication tutorial: http://wiki.apache.org/nutch/HttpAuthenticationSchemes.

I set conf/httpclient-auth.xml like this:
<credentials username="susam" password="masus">
 <default/>
</credentials>
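
For context, my understanding from that wiki page is that the complete file
wraps the credentials in an auth-configuration root element, roughly as
sketched below. The authscope line and its host, port, realm and scheme
values are placeholders made up for illustration, not settings I have
actually used.

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/httpclient-auth.xml, assuming the layout described
     on the HttpAuthenticationSchemes wiki page. -->
<auth-configuration>
  <credentials username="susam" password="masus">
    <!-- Used for any server that asks for authentication and has no
         more specific authscope entry. -->
    <default/>
    <!-- Hypothetical example of restricting credentials to one scope:
    <authscope host="example.com" port="80" realm="example" scheme="basic"/>
    -->
  </credentials>
</auth-configuration>
```

As far as I understand, this file is only read when the protocol-httpclient
plugin is enabled in plugin.includes in conf/nutch-site.xml, and it covers
HTTP-level schemes such as Basic, Digest and NTLM.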

But I can only fetch the first page, which is the one in the seed URL list.
I cannot fetch the pages that contain the post information.

Am I missing anything here? Can Nutch crawl websites like Google Groups?

Thank you very much!!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/use-nutch-to-crawl-information-in-google-group-tp3985037.html
Sent from the Nutch - User mailing list archive at Nabble.com.
