Re: [sqlite] Odd insertion error FTS4 + ICU (E. Timothy Uy)

2012-06-20 Thread Klaas Van Be
>> >> inserting the following into my virtual table: >> >> >> >> 一日耶羅波安出 >> >> Can you post the list of codepoints in this text? Or the hex >> of the utf-16 or utf-8 encoding of the same? 00 4E E5 65 36 80 85 7F E2 6C 89 5B FA 51 Here no problem inserting this string (Mac OSX 10.6.8) sqlite>
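
For anyone who wants to reproduce the byte dump requested above, here is a minimal Python sketch. It assumes the text is exactly the seven characters quoted in the thread. The helper name `hex_dump` is made up for illustration and is not part of SQLite or ICU.

```python
# Hypothetical helper (not from the thread): dump a string's bytes as hex.
def hex_dump(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


text = "一日耶羅波安出"

# UTF-16 little-endian without a byte-order mark: two bytes per character here.
print(hex_dump(text, "utf-16-le"))

# UTF-8: three bytes per character for these CJK code points.
print(hex_dump(text, "utf-8"))
```

The first line printed should match the UTF-16 dump posted in the message (00 4E E5 65 ...), which suggests that dump is little-endian.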

Re: [sqlite] Odd insertion error FTS4 + ICU

2012-06-19 Thread Dan Kennedy
On 06/19/2012 04:28 AM, E. Timothy Uy wrote: > Dear Dan, > > With the change from U8_NEXT to U16_NEXT, I am able to insert 一日耶羅波安出. I > was also able to insert the rest of the data set (about 31000 more rows > containing both traditional and simplified Chinese). Is this an ICU error? > Seems like

Re: [sqlite] Odd insertion error FTS4 + ICU

2012-06-18 Thread E. Timothy Uy
Dear Dan, With the change from U8_NEXT to U16_NEXT, I am able to insert 一日耶羅波安出. I was also able to insert the rest of the data set (about 31000 more rows containing both traditional and simplified Chinese). Is this an ICU error? Seems like everything should be using U8_ in the tokenizer. Thank

Re: [sqlite] Odd insertion error FTS4 + ICU

2012-06-18 Thread E. Timothy Uy
I'll take a look right now. Though my first thought was if you change U8_NEXT to U16_NEXT, wouldn't you have to change it everywhere else? I recompiled ICU with U_CHARSET_IS_UTF8 earlier and this did not help. On Mon, Jun 18, 2012 at 2:06 PM, Dan Kennedy wrote: > On

Re: [sqlite] Odd insertion error FTS4 + ICU

2012-06-18 Thread Dan Kennedy
On 06/19/2012 03:39 AM, E. Timothy Uy wrote: > If anyone can unravel this mystery, it would be much appreciated. For now, > I inserted a comma - 一日、耶羅波安出 and it works. I suspect it must be somehow > that the sequence of bytes encodes another character, which throws the > tokenizer out of whack or

Re: [sqlite] Odd insertion error FTS4 + ICU

2012-06-18 Thread E. Timothy Uy
If anyone can unravel this mystery, it would be much appreciated. For now, I inserted a comma - 一日、耶羅波安出 and it works. I suspect it must be somehow that the sequence of bytes encodes another character, which throws the tokenizer out of whack or maybe the fts4aux table. 一 19968 %E4%B8%80 日 26085

Re: [sqlite] Odd insertion error FTS4 + ICU

2012-06-18 Thread E. Timothy Uy
Thanks for writing back Dan. Using charCodeAt() in Javascript, I have the following for 一日耶羅波安出: 19968 26085 32822 32645 27874 23433 20986 I tried entering subsets of the data: 一日耶羅波安出 - Error: SQL logic error or missing database <-- target 一日耶羅波安 - Ok 日耶羅波安出 - Ok 耶羅波安出 - Ok 一日耶羅波安出x - Error:
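
The charCodeAt() values listed above can be cross-checked outside the browser. This is a small sketch, assuming the same seven-character string; all of its characters are in the Basic Multilingual Plane, so each code point equals the single UTF-16 code unit that charCodeAt() reports.

```python
text = "一日耶羅波安出"

# ord() returns the Unicode code point of each character. For BMP characters
# this is the same number JavaScript's charCodeAt() gives.
code_points = [ord(ch) for ch in text]
print(code_points)

# The same values in the usual U+XXXX notation.
print(" ".join(f"U+{cp:04X}" for cp in code_points))
```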

Re: [sqlite] Odd insertion error FTS4 + ICU

2012-06-18 Thread Dan Kennedy
On 06/19/2012 02:11 AM, E. Timothy Uy wrote: > I recompiled ICU using U_CHARSET_IS_UTF8 and the error persists. > > On Mon, Jun 18, 2012 at 11:45 AM, E. Timothy Uy wrote: > >> Hopefully someone has some insight on this. I am using FTS4 with >> tokenize=icu (and PRAGMA

Re: [sqlite] Odd insertion error FTS4 + ICU

2012-06-18 Thread E. Timothy Uy
I recompiled ICU using U_CHARSET_IS_UTF8 and the error persists. On Mon, Jun 18, 2012 at 11:45 AM, E. Timothy Uy wrote: > Hopefully someone has some insight on this. I am using FTS4 with > tokenize=icu (and PRAGMA encoding="UTF-8"). I'm getting an error > inserting the