Yes, you'd have to write a mini newsgroup reader that mimics a regular reader's behaviour, but once you grab a post you could send it directly to Solr for indexing. No need for an intermediate DB, XML files, etc.
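A minimal sketch of the "grab a post, send it straight to Solr" step, in Python. It assumes the raw article text has already been fetched from the news server and un-dot-stuffed (Python's own `nntplib` was removed in 3.13, so the fetching side is left out). The Solr URL, the core name `newsgroups`, and the field names (`newsgroup`, `subject`, `author`, `body`, `posted`) are hypothetical and would have to match your own schema. It builds a JSON array of documents for the `/update` handler, which current Solr versions accept; a 2008-era Solr would have wanted the XML `<add><doc>` format instead.

```python
import json
import urllib.request
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical Solr endpoint: host, port and core name ("newsgroups") are placeholders.
SOLR_UPDATE_URL = "http://localhost:8983/solr/newsgroups/update?commit=true"


def article_to_solr_doc(raw_article, group):
    """Map one raw news article (headers, blank line, body) to a Solr document.

    Usenet articles share the header/body layout of email, so the stdlib
    email parser can split them. Field names are illustrative only and
    must match the schema of your Solr core.
    """
    msg = message_from_string(raw_article)
    doc = {
        "id": msg.get("Message-ID", "").strip(),
        "newsgroup": group,
        "subject": msg.get("Subject", ""),
        "author": msg.get("From", ""),
        # Assumes a plain-text (non-multipart) post, which is typical for Usenet.
        "body": msg.get_payload() if not msg.is_multipart() else "",
    }
    date_header = msg.get("Date")
    if date_header:
        posted = parsedate_to_datetime(date_header)
        if posted.tzinfo is None:
            # A "-0000" zone means "unknown"; treat it as UTC for indexing.
            posted = posted.replace(tzinfo=timezone.utc)
        # Solr date fields expect ISO 8601 in UTC, e.g. 2008-11-20T21:12:10Z.
        doc["posted"] = posted.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return doc


def build_update_request(docs):
    """Build (but do not send) a JSON POST of documents to Solr's /update handler."""
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage with a made-up article; the newsgroup name is hypothetical.
raw = (
    "From: someone@example.com\n"
    "Subject: Indexing News groups\n"
    "Message-ID: <abc123@example.com>\n"
    "Date: Wed, 19 Nov 2008 18:05:15 -0500\n"
    "\n"
    "Does anybody know of a good way to index newsgroups?\n"
)
doc = article_to_solr_doc(raw, "example.test")
req = build_update_request([doc])
# urllib.request.urlopen(req)  # uncomment to send to a running Solr instance
```

For a large backfill, batching many articles per request and committing once at the end (instead of `commit=true` on every POST) would be easier on Solr.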
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


________________________________
From: John Martyniak <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, November 20, 2008 4:12:10 PM
Subject: Re: Indexing News groups

Yes by newsgroups I mean Usenet newsgroups.

So if it was a SOLR approach I would still have to write the crawler, or are you suggesting building a mini newsgroup reader that would just suck in all of the recent articles to a database or XML file, for SOLR to index?

I am pretty new to Nutch so still getting used to the terminology and functionality.

I liked the flashback, isn't web.archive awesome!!

-John

On Nov 20, 2008, at 4:03 PM, Otis Gospodnetic wrote:

> By newsgroups do you mean Usenet newsgroups? If so, it might be a lot
> simpler to use Solr, unless you want to build an "NNTP crawler"
>
> I did do something like that over a decade ago. I used it to find people and
> build a White Pages directory (this was big in the 90s :) called POPULUS:
> http://web.archive.org/web/*/http://www.populus.net/
>
> Hmm, was that really a social network? :)
> http://web.archive.org/web/19970430081213/www.populus.net/populus/search-by-interest.shtml
>
> Sorry for the OTism.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ________________________________
> From: John Martyniak <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, November 19, 2008 6:05:15 PM
> Subject: Indexing News groups
>
> Does anybody know of a good way to index newsgroups using Nutch? Basically
> would like to build a searchable list of newsgroup content.
>
> Any help would be greatly appreciated.
>
> -John
