If you want to develop scrapers, I suggest you take a look at jsoup
(http://jsoup.org/), which lets you parse HTML easily. If you then need to
classify the websites you have scraped, that is where Mahout may be useful.
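
To give an idea of what the jsoup side looks like, here is a minimal sketch
that parses an HTML string and lists its links. The class name LinkExtractor,
the helper extractLinks, and the sample HTML are made up for illustration; it
assumes the jsoup jar is on your classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class, not part of jsoup or Mahout.
public class LinkExtractor {

    // Returns one "link text -> href" entry per <a> element that has an href attribute.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);          // build a DOM from the raw HTML
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {  // CSS selector: anchors with an href
            links.add(a.text() + " -> " + a.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample page, used so the example runs without network access.
        String html = "<html><head><title>Sample page</title></head>"
                + "<body><p>See <a href=\"http://jsoup.org/\">jsoup</a> and "
                + "<a href=\"http://mahout.apache.org/\">Mahout</a>.</p></body></html>";

        System.out.println(Jsoup.parse(html).title());
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

To work on a live page instead of a string, Jsoup.connect(url).get() returns
the same kind of Document. The extracted text is what you would then feed to
Mahout if you need clustering or classification.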

On Mon, Jul 30, 2012 at 2:26 PM, Sean Owen <[email protected]> wrote:

> Extract as in web crawl? No it's nothing to do with that.
> Extract as in entity extraction? I don't think there are relevant
> implementations here either, though that begins to border on machine
> learning.
> This is more about clustering and classification of documents than anything
> else.
>
> On Mon, Jul 30, 2012 at 1:22 PM, David Rose <[email protected]> wrote:
>
> > Hi all,
> >
> > I apologize for how basic my question is, but I am very new to all of
> > this, machine learning, writing code, all of it. I was finally able to
> > get Mahout downloaded, installed, and running. I was assigned a project
> > at my work to try to use Mahout to extract data from websites that we
> > input. Is this possible? Can anyone help me with suggestions or
> > instructions on how to do so? I appreciate any help on this, as I have
> > only two more weeks to finish this project.
> >
> > Thanks,
> >
> > David Rose
>
