Thanks for the reply. I will configure this in the conf/regex-urlfilter.txt file.
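
A minimal sketch of how such rules might behave, assuming the regex URL filter reads one rule per line (a leading + or - followed by a regular expression) and applies the first rule that matches, ignoring URLs that match nothing. The three accept patterns are assumptions based on the URLs quoted below, and the class name UrlFilterSketch is hypothetical. The forum and topic patterns are kept so the crawler can still reach the pages that link to the profiles.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone emulation of first-match-wins +/- regex rules.
// It does not call any Nutch code; it only illustrates the rule logic.
class UrlFilterSketch {

    // Assumed rules, in the order they would appear in the filter file.
    static final String[] RULES = {
        // accept user profile pages
        "+^http://www\\.coderanch\\.com/forums/user/profile/\\d+$",
        // accept the forum listing, so topics can be discovered
        "+^http://www\\.coderanch\\.com/forums/f-15/",
        // accept topic pages, so profile links can be discovered
        "+^http://www\\.coderanch\\.com/t/\\d+/",
        // reject everything else
        "-."
    };

    // Returns true if the first matching rule starts with '+'.
    static boolean accept(String url) {
        for (String rule : RULES) {
            boolean allow = rule.charAt(0) == '+';
            Pattern pattern = Pattern.compile(rule.substring(1));
            if (pattern.matcher(url).find()) {
                return allow;
            }
        }
        return false; // no rule matched: the URL is ignored
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.coderanch.com/forums/user/profile/277769",
            "http://www.coderanch.com/forums/f-15/Performance",
            "http://www.coderanch.com/forums/user/login"
        };
        for (String url : samples) {
            System.out.println((accept(url) ? "accept " : "reject ") + url);
        }
    }
}
```

In the filter file itself, the same rules would be the four strings above, one per line, with single backslashes instead of the doubled Java escapes.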

Regards
Harshvardhan Ojha


On Wed, Oct 23, 2013 at 11:34 PM, A Laxmi <[email protected]> wrote:

> Hi..
>
> You can specify a URL regular expression in conf/regex-urlfilter.txt so
> that only links matching the pattern of the profile URLs are accepted.
>
> I recommend not using Windows for this (unless someone has a different
> opinion). Maybe you can install a virtual machine manager like Hyper-V,
> install Ubuntu on it, and go from there.
>
> On Wed, Oct 23, 2013 at 1:57 PM, Harshvardhan Ojha <
> [email protected]> wrote:
>
> > Hi All,
> >
> > I am new to Nutch and have not been able to work out a simple
> > configuration for my needs. I am also not finding much help on the web.
> >
> > Here is the simple requirement for which I am thinking of using Nutch:
> >
> > I have several topics in a forum like
> > http://www.coderanch.com/forums/f-15/Performance
> >
> > Inside every topic there are users who participated in the discussion:
> >
> > http://www.coderanch.com/t/615478/Performance/java/Code-quality-plugins-eclipse
> >
> > I want to crawl every user's name and profile page. In the above example,
> > that would be something like
> >
> > name : Navneet Sharma
> > profile : http://www.coderanch.com/forums/user/profile/277769
> >
> > name : soundar rajan
> > profile : http://www.coderanch.com/forums/user/profile/283096
> >
> > Say I want to crawl only their ranking, number of messages, and
> > registration date.
> >
> > So I am only interested in these fields:
> >
> > name
> > ranking
> > number_of_message
> > registration_date
> >
> > How can I achieve this in Nutch? Also, is there any way to tell Nutch not
> > to crawl links other than these?
> >
> > Please also mention which version works best on Windows. I had an issue
> > with 1.7 (a file permission problem with Hadoop), but Nutch 1.2 works
> > well.
> >
> > Any help would be highly appreciated.
> >
> > Regards
> > Harshvardhan Ojha
> >
>
