> > I have tested with a locale-enabled environment and found a bug.
> > Included is the new version of the patches.
>
> Your patch causes a crash in tsearch2's installcheck with 'initdb -E UTF8
> --locale C'. A simple way to reproduce it:
>
> # select to_tsquery('default', '''New York''');
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
It seems to be a bug in the original tsearch2. Here is the patch; it
allocates one extra wchar_t for the buffer that char2wchar() fills:

------------------------------------------------------------------
*** wordparser/parser.c~	2007-01-07 09:54:39.000000000 +0900
--- wordparser/parser.c	2007-01-11 10:33:41.000000000 +0900
***************
*** 51,57 ****
      if (prs->charmaxlen > 1)
      {
          prs->usewide = true;
!         prs->wstr = (wchar_t *) palloc(sizeof(wchar_t) * prs->lenstr);
          prs->lenwstr = char2wchar(prs->wstr, prs->str, prs->lenstr);
      }
      else
--- 51,57 ----
      if (prs->charmaxlen > 1)
      {
          prs->usewide = true;
!         prs->wstr = (wchar_t *) palloc(sizeof(wchar_t) * (prs->lenstr+1));
          prs->lenwstr = char2wchar(prs->wstr, prs->str, prs->lenstr);
      }
      else
------------------------------------------------------------------

> >> ! static int p_isalnum(TParser *prs) {
> ...
> >> !     if (lc_ctype_is_c())
> >> !     {
> >> !         if (c > 0x7f)
> >> !             return 1;
>
> I have some doubts that any character greater than 0x7f is an alpha
> symbol. Is it a simple assumption or a workaround?

Yeah, it's a workaround. Since tsearch2 has no character classes other
than alpha, numeric, and latin, Asian characters have to fall into one
of them.

--
Tatsuo Ishii
SRA OSS, Inc. Japan