Hi all, I am new to Nutch and have not been able to figure out even a simple configuration for my use case. I am also not finding much help on the web.
Here is the simple requirement for which I am thinking of using Nutch.

I have several topics in a forum, like http://www.coderanch.com/forums/f-15/Performance

Inside every topic there are users who participated in the discussion, for example http://www.coderanch.com/t/615478/Performance/java/Code-quality-plugins-eclipse

I want to crawl every user's name and profile page. In the example above, that would be something like:

name: Navneet Sharma
profile: http://www.coderanch.com/forums/user/profile/277769

name: soundar rajan
profile: http://www.coderanch.com/forums/user/profile/283096

From each profile I want to crawl only the ranking, the number of messages and the registration date. So I am only interested in this much data:

name
ranking
number_of_message
registration_date

How can I achieve this in Nutch? Also, is there any way to tell Nutch not to crawl unnecessary links other than these?

Please also mention which version works best on Windows. I had an issue with 1.7 (some file permission problem with Hadoop), but Nutch 1.2 is working well.

Any help would be highly appreciated.

Regards,
Harshvardhan Ojha
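
P.S. In case it helps to pin down what I mean by "unnecessary links", here is a rough Python sketch of the three URL shapes I care about. It is not Nutch configuration. The patterns are my own guesses, inferred only from the example links in this post, and the name classify is made up for illustration. Everything that matches none of the patterns is what I would like the crawler to skip.

```python
import re

# Assumed URL shapes, guessed only from the example links in this post.
FORUM = re.compile(r"^http://www\.coderanch\.com/forums/f-\d+/[^/]+$")
TOPIC = re.compile(r"^http://www\.coderanch\.com/t/\d+/.+$")
PROFILE = re.compile(r"^http://www\.coderanch\.com/forums/user/profile/\d+$")


def classify(url):
    """Return 'forum', 'topic' or 'profile', or None for a link to skip."""
    for label, pattern in (("forum", FORUM), ("topic", TOPIC), ("profile", PROFILE)):
        if pattern.match(url):
            return label
    return None


if __name__ == "__main__":
    for link in (
        "http://www.coderanch.com/forums/f-15/Performance",
        "http://www.coderanch.com/t/615478/Performance/java/Code-quality-plugins-eclipse",
        "http://www.coderanch.com/forums/user/profile/277769",
    ):
        print(classify(link), link)
```

Forum and topic pages would only be followed to discover profile links. The profile pages are the only ones whose content I actually need.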

