Hi Nutch people,

I am a developer of CLucene, which is a full C++ port of Apache Lucene. I would like to propose something to users of Nutch:
I have been working on SWIG wrappers that expose CLucene to higher-level languages and environments such as C#, Java and COM. I started the Java wrapper so that I could 'steal' the Java test suites and use them to test CLucene. I have already managed to run about half of the luceneDotNet tests successfully using the CLucene-csharp bindings (most of the rest cannot be run because the SWIG C# module lacks director support). This has been useful for tracking down bugs.

Without too much effort, I have managed to get the Java bindings working. So far I have been able to get the IndexFiles demo program to run with very few changes to the Java code: I had to change the imports and add a System.loadLibrary call, though both differences could eventually be removed completely. I only spent a minute looking at speed, but indexing one directory took 2.5 seconds with Java Lucene and 1.5 seconds with clucene-java. Of course a single run does not say much, but it suggests that clucene-java *might* be faster.

So here is what I want to propose to the users and developers of Nutch: with a bit of effort, clucene-java could be good enough to be 'dropped into' the Nutch project, thereby speeding up the Nutch indexer. We could write directors for clucene-java that pass some things, such as the analysers, back into Java. Nutch would benefit from the added speed. If the clucene-java wrapper were written well, there would be no need for any code change in Nutch, aside from changing which Lucene jar file is loaded.

These are just preliminary thoughts, and I'm sure there is still a lot to think about. But I have shown with the demo files that the concept could work, and I think it could give Nutch indexing/search a reasonable speed boost. What do people think?
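To illustrate the System.loadLibrary change and the rough timing mentioned above, here is a minimal sketch. It is not the actual clucene-java code: the class name, the library name "clucene_java" and both helper methods are hypothetical, and the real library name would depend on how the SWIG wrapper is built. The sketch only shows how a native library can be loaded with a graceful fallback and how a single indexing run could be timed.

```java
public class NativeLuceneLoader {
    // Hypothetical library name, used for illustration only.
    static final String HYPOTHETICAL_LIBRARY = "clucene_java";

    /** Tries to load a native library; returns false instead of throwing if it cannot be loaded. */
    static boolean tryLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Library not on java.library.path, or loading is not permitted.
            return false;
        }
    }

    /** Runs a task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        boolean nativeAvailable = tryLoad(HYPOTHETICAL_LIBRARY);
        System.out.println("native backend available: " + nativeAvailable);

        // Placeholder task; a real comparison would call the indexing code here.
        long elapsed = timeMillis(() -> { });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

A single timed run like this is only a rough indication; repeated runs after JVM warm-up would be needed before drawing conclusions about relative speed.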
I am prepared to nut this one out with whoever is interested.

cheers,
ben

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
