> Since clucene isn't aiming for either UTF-16 OR UTF-32, I don't
> believe you'll be able to. A better approach would be to get the size
> of "content" and set a value based on that.

FWIW, I just did this for both searching and index creation. I
over-allocated the buffer length by 500, which is overkill. I don't
really think it needs to be over-allocated at all: the content is
UTF-8, which always uses at least one byte per character, whereas
we're converting to UCS-2 or UCS-4, which both use exactly one unit
per character, so sizing the buffer to exactly the UTF-8 length
should always work, I think. Anyway, the result is a 3-second
decrease in indexing time for the ESV, which is fairly substantial.
I'd definitely recommend doing something like this.
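
To illustrate the sizing argument, here is a minimal sketch. It is not
the actual SWORD or CLucene code: the function name utf8ToUcs4 is
hypothetical, and it assumes well-formed UTF-8 input (no validation).
The point is only that the UTF-8 byte count is a safe upper bound on
the number of output characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of SWORD or CLucene): decode UTF-8 into
// 32-bit code points, sizing the output from the input byte length.
// Assumes well-formed UTF-8; malformed sequences are not validated.
std::vector<char32_t> utf8ToUcs4(const std::string &utf8) {
    // Every character takes at least one byte in UTF-8, so the byte
    // count is an upper bound on the number of characters.
    std::vector<char32_t> out;
    out.reserve(utf8.size());

    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes to read
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (; extra > 0 && i < utf8.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        out.push_back(cp);
    }

    // The loop consumes at least one byte per character emitted, so the
    // output can never outgrow the reserved size.
    assert(out.size() <= utf8.size());
    return out;
}
```

For pure ASCII text the bound is exact; for anything else the buffer
is simply a little larger than needed. Note that code points above
U+FFFF do not fit in a single UCS-2 unit, but they take four bytes in
UTF-8, so the same bound would still hold for a two-unit encoding.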

Matthew

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
