Thanks for investigating this, Matthew. There shouldn't really be any repercussions to increasing this within reason, though I would like to find a way to remove this code if we can.
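For what it's worth, removing the fixed buffer might look something like the sketch below. It is only an illustration, not a proposed patch: utf8ToWide is a hypothetical helper (not an existing SWORD or clucene function), and it assumes clucene wants UTF-16 code units where wchar_t is 16 bits and one code point per wchar_t where it is 32 bits, which is exactly the open question raised below. It also does not reject overlong UTF-8 encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of SWORD or clucene.
// Converts UTF-8 to a wide string whose size follows the input, so no
// fixed-length buffer is involved. Malformed bytes become U+FFFD.
std::wstring utf8ToWide(const std::string &utf8) {
    const wchar_t kReplacement = static_cast<wchar_t>(0xFFFD);
    const std::size_t n = utf8.size();

    std::wstring out;
    out.reserve(n);  // output units never outnumber input bytes

    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(kReplacement);
            ++i;
            continue;
        }

        // All continuation bytes must exist and look like 10xxxxxx.
        bool ok = (i + extra < n);
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Surrogates and values past U+10FFFF are not valid code points.
        if (ok && (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))) ok = false;

        if (!ok) {
            out.push_back(kReplacement);
            ++i;  // resynchronise on the next byte
            continue;
        }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            // 32-bit wchar_t, or a code point that fits in one unit.
            out.push_back(static_cast<wchar_t>(cp));
        }
    }

    assert(out.size() <= n);  // why reserve(n) above is always enough
    return out;
}
```

A caller could then hand the result's c_str() to whatever takes wcharBuffer today, and the converted text would grow with the entry instead of being cut off at a fixed size.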

Does anyone know if clucene REALLY wants a wchar_t buffer, and if so, what EXACTLY does it want? wchar_t on Windows is 16 bits, and on Linux is typically 32 bits. This would likely mean that it expects UTF-16? Or maybe it just limits itself to 16-bit characters and doesn't support the full Unicode range (at least on Windows)?

We have methods to convert to both UTF-16 and UTF-32 in our engine, which don't need a fixed-length buffer, so I would like to replace:

    lucene_utf8towcs(wcharBuffer, content, MAX_CONV_SIZE);

with a call to our code, if we can nail down exactly what clucene wants in the resultant wcharBuffer.

Anyway, for now, upping the buffer should be fine, or dynamically allocating to, say, 2*source length should also be practically safe, but some of our module drivers support a 4-byte size, so retaining a static buffer with a fixed size would mean we'd need to make it fairly large to support the full range of data.

-Troy.

PS. I just typed my last command and looked at my history...

scr...@scribe-laptop:~/src/sword/src/modules$ svn blame swmodule.cpp > blame
scr...@scribe-laptop:~/src/sword/src/modules$ vi blame
scr...@scribe-laptop:~/src/sword/src/modules$ rm blame

... and felt an all-encompassing Love and acceptance, being reminded of what our God has done for us, when I typed: rm blame and solidly pressed return. :)

Matthew Talbert wrote:
> The problem is more universal and serious than I originally thought.
> SWORD indexed search is performing rather poorly against BT's. At any
> rate, the biggest issue is the size given to MAX_CONV_SIZE in
> swmodule.cpp.
> Here are some tests:
>
> //default value of 2047
> ./search Finney "good" | wc
> [0=================================50===============================100]
> ======================================================================
> 22 255 1549
>
>
> //MAX_CONV_SIZE = 10000
> ./search Finney "good" | wc
> [0=================================50===============================100]
> ======================================================================
> 51 576 3573
>
> //MAX_CONV_SIZE = 15000
> ./search Finney "good" | wc
> [0=================================50===============================100]
> ======================================================================
> 56 650 3985
>
> But even 15000 isn't high enough to get all occurrences of words at
> the end of long text sections. For Finney, a value of 20000 is
> probably required and it's entirely possible that other modules would
> require higher values.
>
> I don't know what the consequences are of changing this value, but
> currently we're missing a huge number of hits in genbook modules and a
> substantial number of hits in commentaries as well.
>
> If this gets fixed, I think searching and results should be added to
> the test suite. It would be simple to add; just run mkfastmod, then
> the search program (it would be nice to be able to change the search
> type without re-compiling so that different search types could be
> done).
>
> Matthew
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page