I was also thinking that I would want to go through an intermediary in case I have to re-index, for example if the existing index was corrupted or I wanted to create a new filter of some kind.
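
As a rough illustration of that intermediary idea, the sketch below spools each raw article to a directory on disk so it can be replayed into a fresh index later. Everything here is an assumption made for illustration: the function names `spool_article` and `iter_spooled`, the `.eml` suffix, and the choice of hashing the Message-ID for the file name are hypothetical, not something Nutch or Solr provides.

```python
import hashlib
from pathlib import Path


def spool_article(raw_article, message_id, spool_dir):
    """Save one raw article to disk, keyed by a hash of its Message-ID.

    Hypothetical helper: writing the same Message-ID twice overwrites
    the same file, so the spool holds one copy per article.
    """
    spool = Path(spool_dir)
    spool.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha1(message_id.encode("utf-8")).hexdigest() + ".eml"
    path = spool / name
    path.write_text(raw_article, encoding="utf-8")
    return path


def iter_spooled(spool_dir):
    """Yield every spooled article again, e.g. to rebuild an index."""
    for path in sorted(Path(spool_dir).glob("*.eml")):
        yield path.read_text(encoding="utf-8")
```

With something like this in place, a re-index or a new filter would only need to loop over `iter_spooled(...)` instead of going back to the news server.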

-John


On Nov 20, 2008, at 4:23 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

Yes, you'd have to write a mini newsgroup reader, mimic its behaviour, but then once you grab a post you could send it directly to Solr for indexing. No need for intermediate DB, XML files, etc.
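
A minimal sketch of the "grab a post, send it to Solr" step might look like the following. It leaves out the NNTP fetching itself and starts from the raw text of one article. The assumptions are all mine: a Solr version that accepts JSON documents on its update handler, a hypothetical core named `newsgroups` on localhost, hypothetical field names (`id`, `subject`, `author`, `newsgroups`, `body`) that would have to exist in the schema, and single-part plain-text articles.

```python
import json
import urllib.request
from email import message_from_string

# Hypothetical URL: assumes a local Solr with a core called "newsgroups".
SOLR_UPDATE_URL = "http://localhost:8983/solr/newsgroups/update?commit=true"


def article_to_solr_doc(raw_article):
    """Turn one raw article (headers + body) into a dict for Solr.

    The field names are illustrative and must match your own schema.
    """
    msg = message_from_string(raw_article)
    groups = (msg["Newsgroups"] or "").split(",")
    return {
        "id": msg["Message-ID"],
        "subject": msg["Subject"],
        "author": msg["From"],
        "newsgroups": [g.strip() for g in groups if g.strip()],
        # For a single-part article get_payload() returns the body text.
        "body": msg.get_payload(),
    }


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying the documents as JSON."""
    data = json.dumps(list(docs)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of passing the request to `urllib.request.urlopen(...)` once a Solr instance is actually listening at that URL.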


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




________________________________
From: John Martyniak <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, November 20, 2008 4:12:10 PM
Subject: Re: Indexing News groups

Yes, by newsgroups I mean Usenet newsgroups. So with a Solr approach I would still have to write the crawler, or are you suggesting building a mini newsgroup reader that would just pull all of the recent articles into a database or XML file for Solr to index?

I am pretty new to Nutch so still getting used to the terminology and functionality.

I liked the flashback, isn't web.archive awesome!!

-John

On Nov 20, 2008, at 4:03 PM, Otis Gospodnetic wrote:

By newsgroups do you mean Usenet newsgroups? If so, it might be a lot simpler to use Solr, unless you want to build an "NNTP crawler".

I did do something like that over a decade ago. I used it to find people and build a White Pages directory (this was big in the 90s :) called POPULUS: http://web.archive.org/web/*/http://www.populus.net/

Hmm, was that really a social network? :)
http://web.archive.org/web/19970430081213/www.populus.net/populus/search-by-interest.shtml

Sorry for the OTism.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




________________________________
From: John Martyniak <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, November 19, 2008 6:05:15 PM
Subject: Indexing News groups

Does anybody know of a good way to index newsgroups using Nutch? Basically, I would like to build a searchable list of newsgroup content.

Any help would be greatly appreciated.

-John
