Hi Nutch People,

I am a developer of CLucene, which is a full C++ port of Apache
Lucene. I would like to propose something to users of Nutch:

I have been working on SWIG wrappers that expose CLucene to various
higher-level languages and platforms, such as C#, Java and COM. I
started on the Java wrapper so that I could 'steal' the Java test
suites and use them to test CLucene.

I have already managed to run about half of the luceneDotNet tests
successfully using the CLucene-csharp bindings (most of the rest
cannot be run because the SWIG C# module lacks director support).
This has been useful in tracking down bugs, etc.

Without too much effort, I have managed to get the Java bindings
working. So far I have been able to get the IndexFiles demo program to
run with very few changes to the Java code (I had to change the
import statements and add a System.loadLibrary call, though both
differences could eventually be removed completely).
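
A minimal sketch of what such a System.loadLibrary call might look
like. The library name "clucene" and the class name NativeBindings are
placeholders for illustration, not the names used by the actual
bindings.

```java
// Sketch only: "clucene" and NativeBindings are hypothetical names.
final class NativeBindings {
    private NativeBindings() {}

    /**
     * Tries to load a JNI library by its platform-independent name.
     * Returns false instead of throwing when the library cannot be
     * found on java.library.path.
     */
    static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Could not load native library '"
                    + libraryName + "': " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the real library name as an argument.
        String name = args.length > 0 ? args[0] : "clucene";
        System.out.println(tryLoad(name)
                ? "native bindings loaded"
                : "native bindings unavailable");
    }
}
```

If the load were done in a static initializer inside the wrapper
classes themselves, the calling code would not need to mention the
native library at all.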

I only spent a minute looking at speeds, but indexing one directory
took 2.5 seconds with Java Lucene and 1.5 seconds with clucene-java.
Of course a single run does not say much, but it means that
clucene-java *might* be faster.
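
For a slightly more careful comparison, something like the following
timing harness could be used. This is only a sketch: IndexTimer and
its methods are hypothetical names, and the Runnable passed in would
have to wrap the real indexing call for each implementation.

```java
import java.util.concurrent.TimeUnit;

// Sketch only: IndexTimer is a hypothetical helper, not part of Lucene or CLucene.
final class IndexTimer {
    private IndexTimer() {}

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Runs the task several times and returns the fastest run, to reduce noise. */
    static long bestOf(int runs, Runnable task) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            best = Math.min(best, timeMillis(task));
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the indexing call under test.
        Runnable placeholder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        System.out.println("best of 5 runs: " + bestOf(5, placeholder) + " ms");
    }
}
```

Taking the best of several runs helps separate JVM warm-up and disk
caching effects from the actual indexing cost.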

So what I wanted to propose to the users and developers of Nutch is
this: with a bit of effort, clucene-java could be good enough to be
'dropped into' the Nutch project, thereby speeding up the Nutch
indexer. We could write directors for clucene-java which would hand
some things, such as the analysers, back to Java. This would benefit
Nutch through the added speed. If the clucene-java wrapper were
written well, there would be no need for any code change in Nutch,
aside from changing which Lucene jar file is loaded.

These are just some preliminary thoughts, and I'm sure there is still
a lot to think about. But I have shown with the demo files that the
concept could work, and I think that it could give Nutch
indexing/search a reasonable speed boost.

What do people think? I am prepared to nut this one out with whoever
is interested.

cheers,
ben


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers