Dear all,

I am using Nutch to crawl all the user reviews for a title on IMDB. The
URLs look like this:
http://www.imdb.com/title/tt1375666/usercomments
http://www.imdb.com/title/tt1375666/usercomments?start=50
I want to crawl all of these pages and keep only the user review text.

Each of these pages links to the profile of every reviewer. Clicking such a
link leads to a URL like the one below (I want to avoid all other URLs):

http://www.imdb.com/user/ur10583368/comments

This page lists all the movie reviews written by one user, in this case
ur10583368. A user may have written many reviews, spread over several pages,
and the pattern for those URLs is:


http://www.imdb.com/user/ur10583368/comments?order=date&start=10

where the highlighted part changes for each page.

Now I need all of these reviews as well.

Please help. I want to crawl only these URLs and nothing else.
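
To make the URL patterns above concrete, here is a rough sketch of them as Python regular expressions. The names WANTED and should_crawl are only illustrative, and I am assuming that title IDs are always "tt" followed by digits, that user IDs are always "ur" followed by digits, and that the query strings take exactly the forms shown above. Similar expressions could presumably be adapted for a regex-based URL filter in Nutch, but I have not verified that.

```python
import re

# Hypothetical patterns for the kinds of URL described in this message.
# Assumption: title IDs are "tt" + digits, user IDs are "ur" + digits.
WANTED = [
    # Review pages of a title, with an optional ?start=N page offset.
    re.compile(r"^http://www\.imdb\.com/title/tt\d+/usercomments(\?start=\d+)?$"),
    # Review pages of a user, with an optional ?order=date&start=N suffix.
    re.compile(r"^http://www\.imdb\.com/user/ur\d+/comments(\?order=date&start=\d+)?$"),
]


def should_crawl(url: str) -> bool:
    """Return True only if the URL matches one of the wanted patterns."""
    return any(pattern.match(url) for pattern in WANTED)
```

With these patterns, should_crawl accepts the four example URLs in this message and rejects anything else on the site, such as the plain title page.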


-- 
Nitin Kumar Hardeniya

M.Tech Computational Linguistics
IIIT Hyderabad




