4) According to the PyLucene author, converting Java Lucene to a
native library with gcj, then calling that library from a C
program is hopelessly hairy and not recommended. Too bad.
5) To my eye, Nutch does not look particularly rich in features or
configurability compared to HtDig.
True, but its known 200+ million document scalability can't be beat.
And it's being supported by Yahoo Labs.
6) Word on the street is Xapian is the only competition to Lucene
in terms of scalability among Free Software search
cores. Gmane uses Xapian against 20+ million documents.
Xapian is GPL; Lucene is Apache-licensed and CLucene is LGPL. Evidently the Xapian people
didn't read the 4th paragraph of http://www.gnu.org/philosophy/why-not-lgpl.html
  Using the ordinary GPL is not advantageous for every library. There are
  reasons that can make it better to use the Library GPL in certain cases.
  The most common case is when a free library's features are readily
  available for proprietary software through other alternative libraries. In that
  case, the library cannot give free software any particular advantage, so it is
  better to use the Library GPL for that library.
Of course you (Jeff) and I disagreed on this point a while back ;-)
That said, Xapian does look impressive.
Anyway, I'm delighted to hear about this HtDig/Lucene experiment.
Points #1, #2, and #3 suggest it may make sense to consider the idea
of a pure Java HtDig which can be gcj compiled to native executables. From
my perspective as a naive HtDig user I think that would rock, but
there's probably lots of stuff I'm not thinking about. If anyone wants
to try out the gcj/Lucene thing, Doug Cutting's instructions [*] work
fine provided you have gcj 3.4.x installed.
If we really wanted a pure Java HtDig, I think we'd be better off
throwing in with Nutch and adding the configurability of HtDig to it.
As I see it, the primary reason that Nutch is somewhat unattractive to
the average HtDig user is that they must know how to configure Nutch to run as
a Tomcat service, or know how to tweak the build system to build it as a
standalone server. Neither is easy for a more novice user given their
current build system and 'How-To' docs.
HtDig is still a forked CGI app, which means that our
users don't have to worry about starting/monitoring a server daemon. If
we were to throw in with Nutch at some future date, it would be nice to
make a simple option for Nutch to be built as a forked CGI app.
I've looked at attempting to go the PyLucene route: compile Java with
gcj and create the hairy wrapper libs for it. It is ugly for many
reasons.
Going with CLucene at first has the advantage that we can get the
code reorg done, and look at replacing the CLucene APIs with the
equivalent Java-Lucene+Wrapper ones, if it is even worth doing that.
Thanks.
--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485