I was also thinking that I would want to go through an intermediary in
case I have to re-index, for example if the existing index was
corrupted, or if I wanted to create a new filter of some kind.
-John
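
A minimal sketch of the intermediary idea above, assuming raw articles are kept as plain files on disk so the index can be rebuilt without fetching from the news server again. The function names and the one-file-per-article layout are hypothetical and not part of Nutch or Solr.

```python
import hashlib
from email import message_from_string
from pathlib import Path


def store_article(raw_article: str, store_dir: Path) -> Path:
    """Write one raw article to store_dir, named by a hash of its Message-ID."""
    msg = message_from_string(raw_article)
    # Fall back to hashing the whole article if the Message-ID header is missing.
    key = msg.get("Message-ID") or raw_article
    name = hashlib.sha1(key.encode("utf-8")).hexdigest() + ".txt"
    store_dir.mkdir(parents=True, exist_ok=True)
    path = store_dir / name
    # Storing the same article twice overwrites the same file.
    path.write_text(raw_article, encoding="utf-8")
    return path


def iter_stored_articles(store_dir: Path):
    """Yield every stored raw article, e.g. to rebuild an index from scratch."""
    for path in sorted(store_dir.glob("*.txt")):
        yield path.read_text(encoding="utf-8")
```

Re-indexing after a corrupted index, or after adding a new filter, would then mean looping over iter_stored_articles() and feeding each article to the indexer again.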
On Nov 20, 2008, at 4:23 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:
Yes, you'd have to write a mini newsgroup reader, or at least mimic
one's behaviour, but then once you grab a post you could send it
directly to Solr for indexing. No need for intermediate DB, XML files, etc.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
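
The "grab a post, send it directly to Solr" step described above could look roughly like the sketch below. It assumes a Solr version that accepts a JSON array of documents on its update handler; the URL, core name, and field names are placeholders, and fetching the article over NNTP is left out.

```python
import json
import urllib.request
from email import message_from_string

# Assumed Solr update endpoint; host, port, and core name are placeholders.
SOLR_UPDATE_URL = "http://localhost:8983/solr/newsgroups/update?commit=true"


def article_to_doc(raw_article: str) -> dict:
    """Map the headers and body of one raw article to a flat document."""
    msg = message_from_string(raw_article)
    payload = msg.get_payload()
    groups = msg.get("Newsgroups", "")
    return {
        # Field names are hypothetical and must match the Solr schema in use.
        "id": msg.get("Message-ID", ""),
        "subject": msg.get("Subject", ""),
        "author": msg.get("From", ""),
        "newsgroups": [g.strip() for g in groups.split(",") if g.strip()],
        # Multipart articles are not handled in this sketch.
        "body": payload if isinstance(payload, str) else "",
    }


def build_update_request(docs: list, url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying docs as a JSON array."""
    data = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of passing the request to urllib.request.urlopen() once a Solr instance is actually listening at that URL.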
________________________________
From: John Martyniak <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, November 20, 2008 4:12:10 PM
Subject: Re: Indexing News groups
Yes, by newsgroups I mean Usenet newsgroups. So with a Solr
approach, would I still have to write the crawler, or are you
suggesting building a mini newsgroup reader that would just pull
all of the recent articles into a database or XML file for Solr to
index?
I am pretty new to Nutch so still getting used to the terminology
and functionality.
I liked the flashback, isn't web.archive awesome!!
-John
On Nov 20, 2008, at 4:03 PM, Otis Gospodnetic wrote:
By newsgroups do you mean Usenet newsgroups? If so, it might be a
lot simpler to use Solr, unless you want to build an "NNTP crawler".
I did do something like that over a decade ago. I used it to find
people and build a White Pages directory (this was big in the
90s :) ) called POPULUS: http://web.archive.org/web/*/http://www.populus.net/
Hmm, was that really a social network? :)
http://web.archive.org/web/19970430081213/www.populus.net/populus/search-by-interest.shtml
Sorry for the OTism.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
________________________________
From: John Martyniak <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, November 19, 2008 6:05:15 PM
Subject: Indexing News groups
Does anybody know of a good way to index newsgroups using Nutch?
Basically would like to build a searchable list of newsgroup content.
Any help would be greatly appreciated.
-John