On 10/24/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> Do you think there is anything about the Java implementation that
> could be improved in this regard so that the difference is not so
> dramatic? What is the C code optimizing that the Java is not?
> Surely we could bring the Java implementation close to C level speed
> in terms of I/O, no?

Just a quick answer because I want to tackle your OS X problem. Basically I think this is just something that C really excels at. I haven't profiled Lucene yet, but I'm guessing a lot of the time is taken in the read and write byte methods, simply because these methods are called so many times. I could be very wrong about this, as I'm no expert on the JVM, but I think that while Java is able to optimize the implementation of things like sorting and hashing, the sheer number of simple instructions is just going to be a lot quicker in C.

So that wasn't very helpful to you, but one area that could be improved is memory management. Obviously I had to keep a pretty close eye on it in C, and it was definitely the most difficult part of porting to C. Anyway, I found a few places where Lucene could be a little more frugal with memory. For example, the TermEnum creates a new Term object for each term as you skip through. One could save a lot of memory and time by doing the comparisons against the TermBuffer object instead of creating a new object. This is just an example (and it's also something I need to fix in Ferret).

> While indexing speed is certainly very important, in many (most?)
> projects the searching speed is the main concern and indexing speed
> is of much less concern.

Definitely agreed. But a lot of the search speed is also influenced by how fast the indexer can read the indexes. So by speeding up the indexing module, you'll get a lot of gains in search speed too. I wouldn't want to rewrite the search functionality in C, as that is the part of the library that people will want to extend. (Same goes for Analysis.) And it is in a lot more flux than the indexer.
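
To make the TermEnum point above a bit more concrete, here is a rough sketch of the two approaches in Java. The Term and TermBuffer classes below are simplified, hypothetical stand-ins written for this example, not Lucene's actual classes, and a sorted String array stands in for the on-disk term dictionary. The first method allocates a Term per step while skipping; the second refills one reusable buffer and compares against that.

```java
// Hypothetical stand-ins for illustration only; these are NOT Lucene's classes.
final class TermSkipSketch {

    /** Immutable term, as a caller would hold it. */
    static final class Term implements Comparable<Term> {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(Term other) {
            int c = field.compareTo(other.field);
            return c != 0 ? c : text.compareTo(other.text);
        }
    }

    /** Reusable, mutable holder for the term currently being scanned. */
    static final class TermBuffer {
        String field;
        char[] text = new char[16];
        int length;

        /** Overwrites the buffer contents, growing the array only when needed. */
        void set(String field, String text) {
            this.field = field;
            if (text.length() > this.text.length) {
                this.text = new char[Math.max(text.length(), this.text.length * 2)];
            }
            text.getChars(0, text.length(), this.text, 0);
            this.length = text.length();
        }

        /** Same ordering as Term.compareTo, without building a Term or String. */
        int compareTo(Term target) {
            int c = field.compareTo(target.field);
            if (c != 0) {
                return c;
            }
            int n = Math.min(length, target.text.length());
            for (int i = 0; i < n; i++) {
                int d = text[i] - target.text.charAt(i);
                if (d != 0) {
                    return d;
                }
            }
            return length - target.text.length();
        }
    }

    /** Allocates one Term per step. Returns index of first term >= target, or -1. */
    static int skipAllocating(String[] sortedTexts, String field, Term target) {
        for (int i = 0; i < sortedTexts.length; i++) {
            Term current = new Term(field, sortedTexts[i]); // garbage on every step
            if (current.compareTo(target) >= 0) {
                return i;
            }
        }
        return -1;
    }

    /** Reuses a single buffer for the whole scan. Same result as skipAllocating. */
    static int skipWithBuffer(String[] sortedTexts, String field, Term target) {
        TermBuffer buffer = new TermBuffer();
        for (int i = 0; i < sortedTexts.length; i++) {
            buffer.set(field, sortedTexts[i]);
            if (buffer.compareTo(target) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date"};
        Term target = new Term("body", "c");
        System.out.println(skipAllocating(terms, "body", target));
        System.out.println(skipWithBuffer(terms, "body", target));
    }
}
```

Both methods return the same position; the only difference is that the second one creates no per-term objects during the scan. Whether that matters in practice would need profiling, since I haven't measured it.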
