Hi Doug, Nutch isn't really meant for this kind of task. It is built for large-scale crawling and indexing, so choosing it to process a single URL would be using a very large hammer on a very small nail. :)
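For a single page, something much lighter would probably do: fetch the HTML, strip the markup, tokenize the text, and count terms and short phrases before writing them to your own DB. Below is a rough sketch of that idea using only the Python standard library. It does not use Nutch or Lucene APIs; the names `TextExtractor`, `fetch`, `html_to_text`, `tokenize`, and `keywords`, and the tiny stopword list, are all hypothetical, and "phrases" here are simply adjacent tokens, which is an assumption about what you need.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def fetch(url):
    """Download one page and decode it (UTF-8 assumed if no charset is declared)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def html_to_text(html):
    """Reduce an HTML document to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords and 1-char tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) > 1 and w not in STOPWORDS]


def keywords(html, top_n=10, phrase_len=2):
    """Return the most frequent tokens and the most frequent token phrases.

    Phrases are runs of `phrase_len` tokens that are adjacent after
    stopword removal, so they may skip over stopwords in the original text.
    """
    tokens = tokenize(html_to_text(html))
    phrases = [" ".join(tokens[i:i + phrase_len])
               for i in range(len(tokens) - phrase_len + 1)]
    return Counter(tokens).most_common(top_n), Counter(phrases).most_common(top_n)
```

Calling `keywords(fetch(url))` on a page URL would give you two ranked lists to store however you like. If you want Lucene-quality tokenization (stemming, language-specific analyzers), running the extracted text through a Lucene Analyzer directly, without Nutch, is the other route to look at.
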
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Doug Leeper <douglee...@yahoo.com>
> To: nutch-user@lucene.apache.org
> Sent: Tuesday, December 23, 2008 12:04:51 PM
> Subject: Spider a single url and get tokenzied keyword/phrases
>
> I need to spider a given URL and obtain a list of tokenized
> keywords/phrases. However, I don't want this information indexed
> into Nutch, as we have our own DB storage.
>
> Does Nutch (or Lucene) provide this functionality through its APIs? If so,
> is there a working example? Nothing jumped out at me in the test files of
> Nutch's distribution.
>
> We are currently using a product from http://www.extractor.com
> but the licensing is too restrictive to upgrade.
> Therefore we are searching for alternatives.
>
> Thanks in advance...
> - Doug
> --
> View this message in context:
> http://www.nabble.com/Spider-a-single-url-and-get-tokenzied-keyword-phrases-tp21147864p21147864.html
> Sent from the Nutch - User mailing list archive at Nabble.com.