You might be interested in my site, www.bigdatadrupal.com
On Jun 18, 2014 5:27 AM, "Vishal Tomar" <[email protected]> wrote:

> Hi,
>
> I am new to Apache Nutch and web crawlers in general, and I am trying to
> build a vertical search engine for real estate.
>
> How do I implement the crawler? I would probably use Nutch for the
> crawling and modify it to extract links from a page only if the page
> contents are relevant to real estate. I'd probably need to write some kind
> of relevancy scoring function that uses a mixture of keywords, an ontology,
> and some kind of similarity detection based on sites I know to be relevant.
>
> Is there any way to configure Nutch to use my relevancy scoring function,
> or do I need to change the source code? Also, I would prefer working in
> Python over Java, as I am much more familiar with it. Is there a Python
> library for Nutch?
>
> Apart from this, I would really appreciate any more pointers regarding
> Nutch in general.
>
> Thanks
> Vishal
>
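
A relevancy function of the kind described above (keywords plus similarity
to pages already known to be relevant) could be sketched in plain Python
roughly as follows. This is only a minimal illustration under assumptions:
KEYWORDS, relevancy, keyword_weight and the other names are made up for the
example, the ontology part is left out, and nothing here calls Nutch or
shows how the score would be hooked into a crawl.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list; a real one might be derived from an ontology.
KEYWORDS = {"apartment", "mortgage", "rent", "realtor", "property", "listing"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(tokens):
    """Fraction of tokens that are domain keywords (0.0 for empty input)."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in KEYWORDS)
    return hits / len(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevancy(page_text, seed_texts, keyword_weight=0.5):
    """Blend the keyword score with the best similarity to any seed page.

    seed_texts are texts of pages assumed to be relevant. The result is a
    weighted average of two values in [0, 1], so it also lies in [0, 1].
    """
    tokens = tokenize(page_text)
    page_tf = Counter(tokens)
    sims = [cosine_similarity(page_tf, Counter(tokenize(s))) for s in seed_texts]
    best_sim = max(sims, default=0.0)
    return keyword_weight * keyword_score(tokens) + (1 - keyword_weight) * best_sim
```

A crawler could then follow the links on a page only when relevancy(...)
is above some threshold; both the threshold and the 0.5 weight would need
tuning against real pages.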
