On 10/24/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> Do you think there is anything about the Java implementation that
> could be improved in this regard so that the difference is not so
> dramatic? What is the C code optimizing that the Java is not?
> Surely we could bring the Java implementation close to C level speed
> in terms of I/O, no?


Just a quick answer because I want to tackle your OS X problem. Basically I
think this is just something that C really excels at. I haven't profiled
Lucene yet, but I'm guessing a lot of the time is taken in the read and
write byte methods, simply because these methods are called so many
times. I could be very wrong about this, as I'm no expert on the JVM, but I
think that while Java is able to optimize the implementation of things like
sorting and hashing, the sheer number of simple instructions is just going
to be a lot quicker in C.
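
To make the point about the read and write byte methods concrete, here is a minimal Java sketch of a buffered reader whose readByte() is called once per byte. The class BufferedByteReader, its counters, and the drain() helper are hypothetical names made up for this illustration; this is not Lucene's actual input code. It only shows that the underlying stream is touched rarely while the tiny per-byte method runs for every single byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch for illustration; NOT Lucene's real input classes.
final class BufferedByteReader {
    private final InputStream in;
    private final byte[] buffer;
    private int pos = 0;    // next unread index in buffer
    private int limit = 0;  // number of valid bytes in buffer
    long readByteCalls = 0; // how often the per-byte method ran
    long refills = 0;       // how often the underlying stream supplied data

    BufferedByteReader(InputStream in, int bufferSize) {
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    // Returns the next byte as 0..255, or -1 at end of input.
    int readByte() throws IOException {
        readByteCalls++;
        if (pos == limit) {
            int n = in.read(buffer, 0, buffer.length);
            if (n <= 0) {
                return -1; // nothing more to read
            }
            refills++;
            limit = n;
            pos = 0;
        }
        return buffer[pos++] & 0xFF;
    }

    // Reads everything and returns { sum of bytes, readByte calls, refills }.
    static long[] drain(byte[] data, int bufferSize) {
        BufferedByteReader r =
            new BufferedByteReader(new ByteArrayInputStream(data), bufferSize);
        long sum = 0;
        try {
            int b;
            while ((b = r.readByte()) != -1) {
                sum += b;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { sum, r.readByteCalls, r.refills };
    }
}
```

With a realistic buffer size the refill count stays small, so most of the work is the per-call overhead of readByte() itself, which is the cost being guessed at above. Whether that overhead really dominates in Lucene would need profiling.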

So that wasn't very helpful to you, but one area that could be improved is
memory management. Obviously I had to keep a pretty close eye on it in C and
it was definitely the most difficult part of porting to C. Anyway, I found a
few places where Lucene could be a little more frugal with the memory. For
example, the TermEnum creates a new Term object for each term as you skip
through. One could save a lot of memory and time by doing the comparisons
against the TermBuffer object, instead of creating a new object. This is
just an example (and it's also something I need to fix in Ferret).
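
The TermBuffer idea can be sketched roughly as follows. This is a simplified illustration in Java, not Lucene's or Ferret's real code: the Term, TermBuffer, and SkipDemo classes here are hypothetical stand-ins, and the terms come from a plain string array rather than an index file. The point is only that skipping can compare a single reusable buffer against the target, and build a Term object only when one is actually needed.

```java
// Hypothetical stand-ins for illustration; NOT Lucene's real classes.
final class Term {
    final String field;
    final String text;

    Term(String field, String text) {
        this.field = field;
        this.text = text;
    }
}

final class TermBuffer {
    String field = "";
    char[] text = new char[16];
    int length = 0;

    // Overwrites the buffer in place; the array grows only when too small.
    // (A real reader would fill the buffer from the index, not from a String.)
    void set(String field, String newText) {
        this.field = field;
        if (newText.length() > text.length) {
            text = new char[newText.length() * 2];
        }
        newText.getChars(0, newText.length(), text, 0);
        length = newText.length();
    }

    // Compares the buffered term to target without allocating anything.
    int compareTo(Term target) {
        int c = field.compareTo(target.field);
        if (c != 0) {
            return c;
        }
        int n = Math.min(length, target.text.length());
        for (int i = 0; i < n; i++) {
            int d = text[i] - target.text.charAt(i);
            if (d != 0) {
                return d;
            }
        }
        return length - target.text.length();
    }

    // Materializes a Term only when the caller really needs one.
    Term toTerm() {
        return new Term(field, new String(text, 0, length));
    }
}

final class SkipDemo {
    // Returns the index of the first term >= target, or -1 if there is none.
    // No Term object is created while skipping.
    static int skipTo(String field, String[] sortedTexts, Term target, TermBuffer buf) {
        for (int i = 0; i < sortedTexts.length; i++) {
            buf.set(field, sortedTexts[i]);
            if (buf.compareTo(target) >= 0) {
                return i;
            }
        }
        return -1;
    }
}
```

After skipTo returns, the buffer still holds the term it stopped on, so a single call to toTerm() is the only allocation in the whole skip.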

> While indexing speed is certainly very important, in many (most?)
> projects the searching speed is the main concern and indexing speed
> is of much less concern.
>

Definitely agreed. But a lot of the search speed is also influenced by how
fast the indexer can read the indexes. So by speeding up the indexing
module, you'll get a lot of gains in search speed too. I wouldn't want to
rewrite the search functionality in C, as that is the part of the library
that people will want to extend. (The same goes for Analysis.) It is also in
a lot more flux than the indexer.
