Yes, you'd have to write a mini newsgroup reader (or at least mimic one's 
behaviour), but once you grab a post you could send it directly to Solr for 
indexing.  No need for an intermediate DB, XML files, etc.
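
A rough sketch of the "grab a post, send it straight to Solr" step, in Python. 
Everything named here is an assumption for illustration: the field names (id, 
subject, author, newsgroups, body) would have to match your own Solr schema, 
the core name "news" in the URL is made up, and it assumes a Solr version whose 
update handler accepts JSON. Fetching the article over NNTP is left out; the 
sketch starts from the raw article text.

```python
import json
import urllib.request
from email import message_from_string


def article_to_doc(raw_article):
    """Turn one raw Usenet article (headers + body) into a flat dict.

    The field names are hypothetical and must match the Solr schema in use.
    """
    msg = message_from_string(raw_article)
    groups = (msg["Newsgroups"] or "").split(",")
    return {
        "id": msg["Message-ID"],
        "subject": msg["Subject"],
        "author": msg["From"],
        "newsgroups": [g.strip() for g in groups if g.strip()],
        "body": msg.get_payload(),
    }


def build_update_request(docs, solr_update_url):
    """Build (but do not send) an HTTP POST carrying the docs as a JSON array."""
    payload = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_update_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A made-up article, standing in for what the newsgroup reader would fetch.
SAMPLE_ARTICLE = (
    "From: someone@example.invalid\n"
    "Newsgroups: comp.lang.java, comp.text.xml\n"
    "Subject: Hello\n"
    "Message-ID: <1@example.invalid>\n"
    "\n"
    "Body line one.\n"
)

doc = article_to_doc(SAMPLE_ARTICLE)
# Hypothetical URL: assumes a local Solr with a core named "news".
request = build_update_request(
    [doc], "http://localhost:8983/solr/news/update?commit=true"
)
```

Passing the request to urllib.request.urlopen() would actually send it. Using 
the Message-ID as the document id means a re-fetched article overwrites its 
earlier copy instead of being indexed twice, provided id is the schema's 
unique key.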


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




________________________________
From: John Martyniak <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, November 20, 2008 4:12:10 PM
Subject: Re: Indexing News groups

Yes, by newsgroups I mean Usenet newsgroups.  So with a Solr approach I 
would still have to write the crawler?  Or are you suggesting building a mini 
newsgroup reader that would just pull all of the recent articles into a 
database or XML file for Solr to index?

I am pretty new to Nutch, so I am still getting used to the terminology and 
functionality.

I liked the flashback; isn't web.archive awesome!!

-John

On Nov 20, 2008, at 4:03 PM, Otis Gospodnetic wrote:

> By newsgroups do you mean Usenet newsgroups?  If so, it might be a lot 
> simpler to use Solr, unless you want to build an "NNTP crawler".
> 
> I did do something like that over a decade ago.  I used it to find people and 
> build a White Pages directory (this was big in the 90s :) called POPULUS: 
> http://web.archive.org/web/*/http://www.populus.net/
> 
> Hmm, was that really a social network? :)
> http://web.archive.org/web/19970430081213/www.populus.net/populus/search-by-interest.shtml
> 
> Sorry for the OTism.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> 
> ________________________________
> From: John Martyniak <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, November 19, 2008 6:05:15 PM
> Subject: Indexing News groups
> 
> Does anybody know of a good way to index newsgroups using Nutch?  Basically, 
> I would like to build a searchable index of newsgroup content.
> 
> Any help would be greatly appreciated.
> 
> -John
