You might be interested in my www.bigdatadrupal.com

On Jun 18, 2014 5:27 AM, "Vishal Tomar" <[email protected]> wrote:
> Hi,
>
> I am new to Apache Nutch and web crawlers in general. I am trying to build
> a vertical search engine for real estate.
>
> Now, how do I implement the crawler? Probably use Nutch for the crawling
> and modify it to only extract links from a page if the page contents are
> relevant to real estate. I'd probably need to write some kind of relevancy
> scoring function which uses a mixture of keywords, ontology and some kind
> of similarity detection based on sites I know to be relevant.
>
> Is there any way to configure Nutch to use my relevancy scoring
> function, or do I need to change the source code? Also, I would
> prefer working in Python over Java as I am much more familiar with it, so
> is there any Python library for Nutch?
>
> Apart from this, I would really appreciate any more pointers regarding
> Nutch in general.
>
> Thanks
> Vishal
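
On the relevancy scoring idea, here is a rough sketch in plain Python of what
a keyword-plus-similarity score could look like. It is independent of Nutch
and shows nothing about how to plug it into the crawler. The keyword set, the
seed text, the function names and the 50/50 weighting are all assumptions
made up for illustration, and the ontology part is left out entirely.

```python
import math
import re
from collections import Counter

# Hypothetical keyword set for the real estate vertical (an assumption).
KEYWORDS = {"apartment", "mortgage", "realtor", "listing", "bedroom", "rent"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(tokens):
    """Fraction of the keyword set that appears at least once in the page."""
    return len(KEYWORDS & set(tokens)) / len(KEYWORDS)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the raw term-count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevancy(page_text, seed_text, keyword_weight=0.5):
    """Blend keyword coverage with similarity to a page known to be relevant.

    keyword_weight is an arbitrary mixing factor between 0 and 1.
    """
    page_tokens = tokenize(page_text)
    kw = keyword_score(page_tokens)
    sim = cosine_similarity(page_tokens, tokenize(seed_text))
    return keyword_weight * kw + (1 - keyword_weight) * sim


if __name__ == "__main__":
    seed = "Two bedroom apartment for rent, contact the realtor for the listing"
    on_topic = "New listing: three bedroom apartment, low mortgage rates"
    off_topic = "The football match ended in a draw after extra time"
    print(relevancy(on_topic, seed))
    print(relevancy(off_topic, seed))
```

A crawler would then only follow links from pages whose score clears some
threshold, which you would have to tune on pages you have labelled by hand.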

