Re: [sword-devel] indexed search discrepancy

Matthew Talbert Fri, 28 Aug 2009 14:44:30 -0700

>> We have methods to convert to both UTF-16 and UTF-32 in our engine,
>> which don't need a fixed length buffer, so I would like to replace:
>>
>> lucene_utf8towcs(wcharBuffer, content, MAX_CONV_SIZE);
>>
>> with a call to our code, if we can nail down exactly what clucene wants
>> in the resultant wcharBuffer


lucene_utf8towcs calls lucene_utf8towc for every character; the
comment on lucene_utf8towc is this:

/**
 * lucene_utf8towc:
 * @p: a pointer to Unicode character encoded as UTF-8
 *
 * Converts a sequence of bytes encoded as UTF-8 to a Unicode character.
 * If @p does not point to a valid UTF-8 encoded character, results are
 * undefined. If you are not sure that the bytes are complete
 * valid Unicode characters, you should use lucene_utf8towc_validated()
 * instead.
 *
 * Return value: the resulting character
 **/

The call to doc->Add actually expects a TCHAR string, so if your UTF-8
to UTF-16 conversion can produce TCHARs, then that's all that would be
necessary, I think.
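
For illustration only, here is a rough sketch of a conversion that
doesn't need a fixed-length buffer. utf8ToWide is a made-up name, not
something in SWORD or CLucene, and the sketch assumes TCHAR is wchar_t
in the clucene build being used and that the input is valid UTF-8 (the
same assumption lucene_utf8towc makes):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SWORD or CLucene): decode UTF-8 into a
// std::wstring that grows as needed, so no MAX_CONV_SIZE-style limit applies.
// Assumes valid UTF-8 input; malformed input is not validated, the loop
// only avoids reading past the end of the string.
std::wstring utf8ToWide(const std::string &utf8) {
    std::wstring out;
    out.reserve(utf8.size());  // valid input never yields more units than bytes
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp;  // code point being assembled
        std::size_t len;   // number of bytes in this sequence
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        if (i + len > utf8.size()) break;  // truncated sequence: stop
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        i += len;
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (e.g. Windows): emit a UTF-16 surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            // 32-bit wchar_t, or a BMP character: one unit per code point
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

If that assumption about TCHAR holds, the c_str() of the result could be
passed wherever wcharBuffer is used now; whether clucene is happy with
surrogate pairs on a 16-bit wchar_t platform is something I have not
checked.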

Matthew

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
