For the Solr part, you could use any of the Solr client libraries (e.g. SolrJ) to feed data to Solr. For getting data from an NNTP server, you could either write your own threaded app or, if you go the Hadoop route, I presume it would be a MapReduce job with N tasks.
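To make the "feed data to Solr" step concrete, here is a rough sketch in Python rather than SolrJ, using only the standard library. It assumes each article arrives as plain text with the usual Usenet headers, and it builds the XML add message that Solr's update handler accepts. The field names (id, subject, author, newsgroups, body) and the update URL are my assumptions, not anything from your setup; they would have to match your schema and your Solr install.

```python
import urllib.request
import xml.etree.ElementTree as ET
from email import message_from_string

# Hypothetical endpoint: adjust host, port, and path to your Solr install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def article_to_solr_xml(raw_article: str) -> bytes:
    """Turn one raw Usenet article into a Solr <add><doc> XML message."""
    msg = message_from_string(raw_article)
    # Only plain-text articles are handled; multipart bodies are left empty.
    body = "" if msg.is_multipart() else msg.get_payload()
    # Field names are assumptions and must exist in your Solr schema.
    fields = {
        "id": msg.get("Message-ID", ""),
        "subject": msg.get("Subject", ""),
        "author": msg.get("From", ""),
        "newsgroups": msg.get("Newsgroups", ""),
        "body": body,
    }
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(xml_payload: bytes, url: str = SOLR_UPDATE_URL) -> int:
    """POST an XML update message to Solr and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=xml_payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Each fetcher thread (or MR task) would call article_to_solr_xml on every post it grabs and hand the result to post_to_solr. Documents only become searchable after a commit, so you would also send a `<commit/>` message now and then rather than after every post.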
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: John Martyniak <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Thursday, November 20, 2008 4:34:54 PM
> Subject: Re: Indexing News groups
>
> That doesn't sound like it would be to difficult to implement, might even
> be able to find a shell to use as a starting point. Do you think that this
> is something that hadoop would be good for, specifically the distributed
> nature, or would it be better to build a standard threaded Java app.
>
> So it sounds like there is an API or would this be through a REST Interface
> or some sort of webservice?
>
> -John
>
> On Nov 20, 2008, at 4:23 PM, Otis Gospodnetic wrote:
>
> > Yes, you'd have to write a mini newsgroup reader, mimic its behaviour,
> > but then once you grab a post you could send it directly to Solr for
> > indexing. No need for intermediate DB, XML files, etc.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ________________________________
> > From: John Martyniak
> > To: [email protected]
> > Sent: Thursday, November 20, 2008 4:12:10 PM
> > Subject: Re: Indexing News groups
> >
> > Yes by newsgroups I mean Usenet newsgroups. So if it was a SOLR approach
> > I would still have to write the crawler, or are you suggesting building a
> > mini newsgroup reader that would just suck in all of the recent articles
> > to a database or XML file, for SOLR to index?
> >
> > I am pretty new to Nutch so still getting used to the terminology and
> > functionality.
> >
> > I liked the flash back, isn't web.archive awesome!!
> >
> > -John
> >
> > On Nov 20, 2008, at 4:03 PM, Otis Gospodnetic wrote:
> >
> >> By newsgroups do you mean Usenet newsgroups? If so, it might be a lot
> >> simpler to use Solr, unless you want to build an "NNTP crawler"
> >>
> >> I did do something like that over a decade ago. I used it to find
> >> people and build a White Pages directory (this was big in the 90s :)
> >> called POPULUS:
> >> http://web.archive.org/web/*/http://www.populus.net/
> >>
> >> Hmm, was that really a social network? :)
> >> http://web.archive.org/web/19970430081213/www.populus.net/populus/search-by-interest.shtml
> >>
> >> Sorry for the OTism.
> >>
> >> Otis
> >> --
> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>
> >> ________________________________
> >> From: John Martyniak
> >> To: [email protected]
> >> Sent: Wednesday, November 19, 2008 6:05:15 PM
> >> Subject: Indexing News groups
> >>
> >> Does anybody know of a good way to index newsgroups using Nutch?
> >> Basically would like to build a searchable list of newsgroup content.
> >>
> >> Any help would be greatly appreciated.
> >>
> >> -John
