Hi,
I'm new to Nutch, and I don't even know whether it serves my purpose. I'm working on a machine learning problem for which I need a corpus that can be obtained by crawling the web (the required dataset is not available). My requirements are as follows:

- The crawler should follow only links of a certain pattern (www.domain.com/id).
- It should fetch only specific data from each crawled page, instead of the entire page content: say, <div id="reqd1"></div> from pages of pattern1 and <div id="reqd2"></div> from pages of pattern2, then merge both. That merged text would be one example of the data I require. Likewise, I need a few hundred or a few thousand such pages (examples).
- Finally, all the fetched text should be stored in some kind of database or in XML files, so that I can use it for training my program.

Can anyone please tell me whether Nutch is the right choice for this? If not, what would be the best way to accomplish my task?

Regards,
KishoreKumar.
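For illustration, the extract-merge-store step described above (pull one div from each of two pages, concatenate, and write the result into an XML corpus) can be sketched in plain Python with only the standard library. This is a minimal sketch, not a Nutch answer: the two `page1`/`page2` strings stand in for fetched pages of the two URL patterns (the actual crawling/fetching is omitted), and the element names `corpus`/`example` are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class DivExtractor(HTMLParser):
    """Collects the text inside the first <div> whose id matches target_id."""
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth while inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1     # nested div inside the target
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1          # entered the target div

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

    def text(self):
        return "".join(self.chunks).strip()

def extract_div(html, div_id):
    parser = DivExtractor(div_id)
    parser.feed(html)
    return parser.text()

# Hypothetical fetched pages standing in for the two URL patterns.
page1 = '<html><body><div id="reqd1">first part</div></body></html>'
page2 = '<html><body><div id="reqd2">second part</div></body></html>'

# Merge the two extracted fragments into one training example.
merged = extract_div(page1, "reqd1") + " " + extract_div(page2, "reqd2")

# Store each merged example as an <example> element in an XML corpus.
root = ET.Element("corpus")
ET.SubElement(root, "example").text = merged
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

In a real pipeline you would loop this over the hundreds or thousands of URL pairs and append one `<example>` element per pair before writing the tree to disk.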
