[HACKERS] Request for review: tsearch2 patch
Hi,

Here are patches against tsearch2 in CVS HEAD. Currently tsearch2 does not work with multibyte encodings when the C locale is used. These patches are intended to solve the problem by using PostgreSQL's in-house multibyte conversion function instead of mbstowcs(), which does not work with the C locale. Also, iswalpha() and friends are no longer called under the C locale, since they do not work with it either.

Tested with the EUC_JP encoding (it should work with any multibyte encoding). Existing single-byte encodings should not be broken by the patches, though I did not test them.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Index: ts_locale.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/ts_locale.c,v
retrieving revision 1.7
diff -c -r1.7 ts_locale.c
*** ts_locale.c    20 Nov 2006 14:03:30 -    1.7
--- ts_locale.c    1 Jan 2007 12:22:50 -
***************
*** 63,68 ****
--- 63,101 ----
      return mbstowcs(to, from, len);
  }
  
+ 
+ #else /* WIN32 */
+ 
+ size_t
+ char2wchar(wchar_t *to, const char *from, size_t len)
+ {
+     wchar_t *result;
+     size_t n;
+ 
+     if (to == NULL)
+         return 0;
+ 
+     if (lc_ctype_is_c)
+     {
+         /* allocate necessary memory for to including NULL terminate */
+         result = (wchar_t *)palloc((len+1)*sizeof(wchar_t));
+ 
+         /* do the conversion */
+         n = (size_t)pg_mb2wchar_with_len(from, (pg_wchar *)result, len);
+         if (n > 0)
+         {
+             /* store the result */
+             if (n > len)
+                 n = len;
+             memcpy(to, result, n*sizeof(wchar_t));
+             pfree(result);
+             *(to + n) = '\0';
+         }
+         return n;
+     }
+     return mbstowcs(to, from, len);
+ }
+ 
  #endif /* WIN32 */
  
  int
***************
*** 70,75 ****
--- 103,113 ----
  {
      wchar_t character;
  
+     if (lc_ctype_is_c)
+     {
+         return isalpha(TOUCHAR(ptr));
+     }
+ 
      char2wchar(&character, ptr, 1);
  
      return iswalpha((wint_t) character);
***************
*** 80,85 ****
--- 118,128 ----
  {
      wchar_t character;
  
+     if (lc_ctype_is_c)
+     {
+         return isprint(TOUCHAR(ptr));
+     }
+ 
      char2wchar(&character, ptr, 1);
  
      return iswprint((wint_t) character);
***************
*** 126,132 ****
      if ( wlen < 0 )
          ereport(ERROR,
                  (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                  errmsg("transalation failed from server encoding to wchar_t")));
  
      Assert(wlen<=len);
      wstr[wlen] = 0;
--- 169,175 ----
      if ( wlen < 0 )
          ereport(ERROR,
                  (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                  errmsg("translation failed from server encoding to wchar_t")));
  
      Assert(wlen<=len);
      wstr[wlen] = 0;
***************
*** 152,158 ****
      if ( wlen < 0 )
          ereport(ERROR,
                  (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                  errmsg("transalation failed from wchar_t to server encoding %d", errno)));
      Assert(wlen<=len);
      out[wlen]='\0';
  }
--- 195,201 ----
      if ( wlen < 0 )
          ereport(ERROR,
                  (errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
!                  errmsg("translation failed from wchar_t to server encoding %d", errno)));
      Assert(wlen<=len);
      out[wlen]='\0';
  }
Index: ts_locale.h
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/ts_locale.h,v
retrieving revision 1.7
diff -c -r1.7 ts_locale.h
*** ts_locale.h    4 Oct 2006 00:29:47 -    1.7
--- ts_locale.h    1 Jan 2007 12:22:50 -
***************
*** 38,45 ****
  #else /* WIN32 */
  
  /* correct mbstowcs */
- #define char2wchar mbstowcs
  #define wchar2char wcstombs
  #endif /* WIN32 */
  
  #define t_isdigit(x) ( pg_mblen(x)==1 && isdigit( TOUCHAR(x) ) )
--- 38,46 ----
  #else /* WIN32 */
  
  /* correct mbstowcs */
  #define wchar2char wcstombs
+ size_t char2wchar(wchar_t *to, const char *from, size_t len);
+ 
  #endif /* WIN32 */
  
  #define t_isdigit(x) ( pg_mblen(x)==1 && isdigit( TOUCHAR(x) ) )
***************
*** 54,59 ****
--- 55,61 ----
   * t_iseq() should be called only for ASCII symbols
   */
  #define t_iseq(x,c) ( (pg_mblen(x)==1) ? ( TOUCHAR(x) == ((unsigned char)(c)) ) :
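The C-locale branch of the patched char2wchar() hands the conversion to pg_mb2wchar_with_len() instead of mbstowcs(). To show the idea without PostgreSQL headers, here is a minimal standalone sketch for EUC_JP only. It assumes each multibyte character is represented by packing its bytes into one integer, lead byte in the high bits; the names euc_jp_to_codes and code_point_t are hypothetical, and this is not PostgreSQL's implementation. Like the patch, it stops after len characters and then writes a terminating zero.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int code_point_t;  /* hypothetical stand-in for pg_wchar */

#define EUC_SS3 0x8F    /* single-shift 3: a JIS X 0212 character follows */

/*
 * Decode at most len characters of NUL-terminated EUC_JP text into to[],
 * which must have room for len + 1 entries. Returns the number of
 * characters stored; to[result] is always set to 0.
 */
size_t
euc_jp_to_codes(const char *from, code_point_t *to, size_t len)
{
    const unsigned char *p = (const unsigned char *) from;
    size_t      n = 0;

    assert(from != NULL && to != NULL);

    while (n < len && *p)
    {
        size_t      clen;
        size_t      i;
        code_point_t code = 0;

        /* SS3 sequences are 3 bytes; other high-bit lead bytes (incl. SS2) are 2. */
        if (p[0] == EUC_SS3)
            clen = 3;
        else if (p[0] >= 0x80)
            clen = 2;
        else
            clen = 1;

        /* A sequence cut short by the terminator is taken one byte at a time. */
        for (i = 1; i < clen; i++)
        {
            if (p[i] == 0)
            {
                clen = 1;
                break;
            }
        }

        /* Pack the bytes into one value, lead byte first. */
        for (i = 0; i < clen; i++)
            code = (code << 8) | p[i];

        to[n++] = code;
        p += clen;
    }
    to[n] = 0;
    return n;
}
```

Because the terminator is written after up to len characters, callers must size the output for len + 1 entries; the patched char2wchar() has the same property, since it stores *(to + n) after copying n <= len characters. Note also that PostgreSQL declares lc_ctype_is_c() as a function taking no arguments, so a C-locale check written against it needs the call parentheses.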
Re: [HACKERS] A possible TODO item
Gurjeet Singh [EMAIL PROTECTED] writes: I thought that if the author of the code (which, now I see, is you) No, it was Jan IIRC. And surely we are never going to make VACUUM force a complete REINDEX as the comment suggests. In that case, can the comment be changed! Even though it's a poor implementation suggestion, at least it's an implementation suggestion. I'm disinclined to remove it when I don't have a better idea to put in its place ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] effective_cache_size vs units
TL == Tom Lane [EMAIL PROTECTED] writes:

TL> Personally I don't find the argument about someday we might want
TL> to support measurements in millibits to be convincing at all, and
TL> certainly it seems weaker than the argument that units should be
TL> case insensitive because everything else in this file is. The SQL
TL> spec has to be considered a more relevant controlling precedent
TL> for us than the SI units spec, and there are no case-sensitive
TL> keywords in SQL.

Units simply are not case sensitive. They are just a more or less random collection of preexisting symbols, because that was easier than drawing up entirely new ones. Not all are English letters, for one µ is not. If you upper case a text with units in, the units do not change with the rest of the text.

/Benny
Re: [HACKERS] effective_cache_size vs units
Benny Amorsen [EMAIL PROTECTED] writes:
> TL == Tom Lane [EMAIL PROTECTED] writes:
> TL> Personally I don't find the argument about someday we might want
> TL> to support measurements in millibits to be convincing at all, and
> TL> certainly it seems weaker than the argument that units should be
> TL> case insensitive because everything else in this file is. The SQL
> TL> spec has to be considered a more relevant controlling precedent
> TL> for us than the SI units spec, and there are no case-sensitive
> TL> keywords in SQL.
>
> Units simply are not case sensitive. They are just a more or less random collection of preexisting symbols, because that was easier than drawing up entirely new ones. Not all are English letters, for one µ is not.

You mean "are case sensitive", right? This is not news.

The point I'm basically making is that it's not going to hurt us to restrict GUC to supporting a subset of all-possible-units that can be treated case-insensitively. We're already going to restrict the allowed character set: I can guarantee you that µ, or anything else outside 7-bit ASCII, will never be accepted. It's just not worth the trouble of dealing with multiple possible encodings.

regards, tom lane
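As a minimal illustration of that restriction, here is a standalone C sketch of a case-insensitive unit lookup that can only ever match 7-bit ASCII. The unit table (kB, MB, GB expressed as multiples of a kilobyte) and the function names are hypothetical, chosen for the example rather than taken from guc.c. Lowercasing is done by hand so the result does not depend on the server locale.

```c
#include <stddef.h>

/* Hypothetical unit table: memory sizes expressed in kilobytes. */
struct unit_entry
{
    const char *name;
    long        multiplier;
};

static const struct unit_entry memory_units[] = {
    {"kB", 1L},
    {"MB", 1024L},
    {"GB", 1024L * 1024L},
};

/* ASCII-only lowercasing, independent of the current locale. */
static int
ascii_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/*
 * Case-insensitive comparison that rejects any byte outside 7-bit ASCII,
 * so a multibyte symbol such as "µ" can never match.
 */
static int
unit_name_matches(const char *input, const char *name)
{
    while (*input && *name)
    {
        unsigned char c = (unsigned char) *input;

        if (c > 0x7F)
            return 0;
        if (ascii_tolower(c) != ascii_tolower((unsigned char) *name))
            return 0;
        input++;
        name++;
    }
    /* Both strings must end together for a match. */
    return *input == '\0' && *name == '\0';
}

/* Returns the multiplier for a unit string, or -1 if it is not accepted. */
long
lookup_memory_unit(const char *unit)
{
    size_t      i;

    for (i = 0; i < sizeof(memory_units) / sizeof(memory_units[0]); i++)
    {
        if (unit_name_matches(unit, memory_units[i].name))
            return memory_units[i].multiplier;
    }
    return -1;
}
```

Because the comparison folds case, a spelling such as "mb" can only mean megabytes in this scheme; that is the trade-off under discussion, since a millibit unit could not coexist with it.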
[HACKERS] Status of Fix Domain Casting TODO
I'm wondering whether Gevik has had any time for further work on this:

http://archives.postgresql.org/pgsql-hackers/2006-09/msg01738.php

FWIW, I'm running into this trying to create a 'raw' domain that would automagically convert hex strings into actual binary data for storage in a bytea. My intention was to use that as the basis for an 'md5data' domain (unfortunately, calling the domain simply 'md5' results in a conflict with the built-in md5 function). So one case to consider for domain casting is casting from domain A to domain B to a base type.
--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
[HACKERS] Reverse-sort indexes and NULLS FIRST/LAST sorting
The SQL2003 spec adds optional NULLS FIRST and NULLS LAST modifiers for ORDER BY clauses. Teodor proposed an implementation here:
http://archives.postgresql.org/pgsql-patches/2006-12/msg00019.php
which I didn't care for at all:
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00133.php

Doing this right is going to require introducing the nulls-first-or-last concept into all the system's handling of sort ordering. Messy as that sounds, I think it will end up logically cleaner than what we have now, because it will let us fix some issues involving descending-order index opclasses and backwards-sort mergejoins. Neither of those can really work correctly right now, the reason being exactly that we lack a framework for dealing with variable sort positioning of NULLs.

I'm hoping to fix this as a consequence of the work I'm doing with operator families for 8.3. What I'd like to come out of it is support for both NULLS FIRST/LAST and reverse-sort index columns. Reverse-sort indexes are already in the TODO list, the application being to create an index whose sort order matches a query like ORDER BY x ASC, y DESC. There are some user-visible decisions to be made first, so this message is to start a discussion about what we want.

One way we could handle this is to say that reverse-sort indexes are implemented by adding explicit catalog entries for reverse-sort opclasses, with no additions to the underlying btree index mechanisms. So you might make an index using a command like

    CREATE INDEX fooi ON foo (x, y reverse_int4_ops);

btree indexes would always sort by the given opclass with NULLS LAST.
So the two possible orderings that could be derived from this index (using forward or backward scan respectively) are

    ORDER BY x ASC NULLS LAST, y DESC NULLS LAST
    ORDER BY x DESC NULLS FIRST, y ASC NULLS FIRST

The other way that seems like it could win acceptance is to make REVERSE an explicit optional property of an index column; and if we do that we might as well allow NULLS FIRST/LAST to be an optional property as well. Then you could say something like

    CREATE INDEX fooi ON foo (x, y REVERSE NULLS FIRST);

(Or maybe use DESC instead of REVERSE as the keyword --- not very important at this point.) This index would support scans with these two sort orderings:

    ORDER BY x ASC NULLS LAST, y DESC NULLS FIRST
    ORDER BY x DESC NULLS FIRST, y ASC NULLS LAST

This second way is more flexible in that it allows indexes to support mixed null orderings; another attraction is that we'd not have to create explicit reverse-sort opclasses, which would be a tedious bit of extra work for every datatype. On the other hand, adding these extra flag bits to indexes seems like a horribly ugly wart, mainly because they're irrelevant to anything except a btree index. (Or at least irrelevant to anything that doesn't support ordered scans, but in practice that's only btree for the foreseeable future.) Also, having to account for these options in the btree code would make it more complex and perhaps slower.

Comments? I've got mixed feelings about which way to jump myself.

regards, tom lane
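The second proposal can be modeled in a few lines of C. In this hypothetical sketch (the names SortDatum and ColumnOrder are invented, and values are plain ints rather than arbitrary datatypes), each index column carries a descending flag and a nulls-first flag. A backward scan corresponds to flipping both flags on every column, which yields exactly the reverse of the forward ordering. The first proposal is the special case where a forward scan always uses nulls_first = false and the descending flag is supplied by a reverse-sort opclass.

```c
#include <stdbool.h>
#include <stddef.h>

/* One sort-key value, which may be NULL. */
typedef struct
{
    bool        isnull;
    int         value;
} SortDatum;

/* Hypothetical per-column ordering properties. */
typedef struct
{
    bool        descending;     /* REVERSE / DESC */
    bool        nulls_first;    /* NULLS FIRST vs NULLS LAST */
} ColumnOrder;

/* Compare one column; returns <0, 0 or >0. */
static int
compare_column(SortDatum a, SortDatum b, ColumnOrder ord)
{
    int         cmp;

    /* NULL placement is governed only by nulls_first, not by direction. */
    if (a.isnull && b.isnull)
        return 0;
    if (a.isnull)
        return ord.nulls_first ? -1 : 1;
    if (b.isnull)
        return ord.nulls_first ? 1 : -1;

    cmp = (a.value > b.value) - (a.value < b.value);
    return ord.descending ? -cmp : cmp;
}

/* Lexicographic comparison over ncols columns. */
int
compare_keys(const SortDatum *a, const SortDatum *b,
             const ColumnOrder *ord, size_t ncols)
{
    size_t      i;

    for (i = 0; i < ncols; i++)
    {
        int         c = compare_column(a[i], b[i], ord[i]);

        if (c != 0)
            return c;
    }
    return 0;
}

/* Ordering produced by scanning the same index backward: flip both flags. */
ColumnOrder
backward_scan_order(ColumnOrder ord)
{
    ColumnOrder r;

    r.descending = !ord.descending;
    r.nulls_first = !ord.nulls_first;
    return r;
}
```

With the index (x, y REVERSE NULLS FIRST), the forward order is {ASC, NULLS LAST} for x and {DESC, NULLS FIRST} for y; applying backward_scan_order() to each column gives x DESC NULLS FIRST, y ASC NULLS LAST, matching the second pair of orderings above.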
Re: [HACKERS] TODO: GNU TLS
[EMAIL PROTECTED] (Joshua D. Drake) writes:
>> The reason I wanted to use PGP is that I already have a PGP key. X.509 certificates are far too complicated (a certificate authority is a useless extra step in my case).
>
> Complete side note, but one feature that I brought up to my team as potentially useful would be to allow the use of ssh keys for authentication. SSH keys are far more prevalent, and they are understood even at the medium corporate level.

I haven't discussed this with Afilias folk, but that sure sounds like an excellent thing to me. ssh keys are already in widespread use for other forms of authentication; this seems an excellent re-use. X.509 might be nice, too, eventually; ssh keys would be immediately useful.
--
cbbrowne,@,linuxfinances.info
http://cbbrowne.com/info/sap.html
Evil Overlords tend to get overthrown due to overly baroque plans with obvious fatal errors. Follow the Rules of the Evil Overlord, and you need not fear heroic opposition, whether that hero be James Bond, Flash Gordon, or a little hobbit named Frodo.
Re: [HACKERS] Status of Fix Domain Casting TODO
Jim C. Nasby [EMAIL PROTECTED] writes:
> FWIW, I'm running into this trying to create a 'raw' domain that would automagically convert hex strings into actual binary data for storage in a bytea.

I think you've got 0 chance of implementing that as a domain rather than an independent type. With or without revisions in the casting rules, a domain has not got its own I/O functions, and never will.

regards, tom lane
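Since a domain cannot supply its own input function, the hex conversion would have to live in the input function of an independent type. Below is a minimal standalone sketch of just the decoding step; the name hex_decode_raw and its error convention (returning -1) are hypothetical, and no PostgreSQL type-registration code is shown.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Decode len hex characters into out, which must hold at least len / 2
 * bytes. Returns the number of bytes written, or -1 for an odd length or
 * a character that is not a hex digit.
 */
long
hex_decode_raw(const char *hex, size_t len, unsigned char *out)
{
    size_t      i;

    if (len % 2 != 0)
        return -1;

    for (i = 0; i < len; i += 2)
    {
        int         hi = hex_value(hex[i]);
        int         lo = hex_value(hex[i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        out[i / 2] = (unsigned char) ((hi << 4) | lo);
    }
    return (long) (len / 2);
}
```

In a real independent type, logic like this would sit inside the type's C input function, paired with an output function that re-encodes the bytes as hex; that wiring is deliberately left out here.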
[HACKERS] 8.3 pending patch queue
I will start processing the patches held for 8.3 this week or next, now that the holiday break is over:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold
--
Bruce Momjian [EMAIL PROTECTED]
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] 8.3 pending patch queue
Bruce Momjian wrote:
> I will start processing the patches held for 8.3 this week or next, now that the holiday break is over:
>
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Some of these look obsolete. Also:

. the plperl out params patch needs substantial rework by its author, IMHO.
. there is a new version of the enums patch that has been submitted.

cheers

andrew
Re: [HACKERS] Status of Fix Domain Casting TODO
Tom Lane wrote:
> Jim C. Nasby [EMAIL PROTECTED] writes:
>> FWIW, I'm running into this trying to create a 'raw' domain that would automagically convert hex strings into actual binary data for storage in a bytea.
>
> I think you've got 0 chance of implementing that as a domain rather than an independent type. With or without revisions in the casting rules, a domain has not got its own I/O functions, and never will.

This might be less of an issue if we allowed such I/O functions to be written in a loadable PL rather than in C.

cheers

andrew
[HACKERS] float8 width_bucket function
I came across this when looking through the patches_hold queue link that Bruce sent out:

http://momjian.us/mhonarc/patches_hold/msg00162.html

There is no patch or anything associated with it, just the suggestion that it be put in when 8.3 development starts up. I just thought I'd put this back out there now that 8.3 development has started, since I had just about forgotten about it until seeing it on that list...
--
Putt's Law: Technology is dominated by two types of people: Those who understand what they do not manage. Those who manage what they do not understand.
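For reference, here is a minimal C sketch of WIDTH_BUCKET semantics for double-precision arguments, following the SQL-standard definition: the range between the two bounds is split into count equal-width buckets numbered 1..count, operands below the range fall into bucket 0, and operands at or beyond the upper bound fall into bucket count + 1. A reversed range (low > high) is also handled. The name width_bucket_float8 and the -1 error return are hypothetical; a real SQL function would raise an error instead.

```c
#include <limits.h>
#include <math.h>

int
width_bucket_float8(double operand, double low, double high, int count)
{
    double      fraction;
    int         bucket;

    /* Hypothetical error convention: -1 for arguments SQL would reject. */
    if (count <= 0 || count == INT_MAX || isnan(operand) ||
        !isfinite(low) || !isfinite(high) || low == high)
        return -1;

    if (low < high)
    {
        if (operand < low)
            return 0;
        if (operand >= high)
            return count + 1;
        fraction = (operand - low) / (high - low);
    }
    else
    {
        /* Descending range: the lower bucket edge is the larger bound. */
        if (operand > low)
            return 0;
        if (operand <= high)
            return count + 1;
        fraction = (low - operand) / (low - high);
    }

    /* fraction is non-negative here, so truncation equals floor. */
    bucket = (int) (fraction * count) + 1;

    /* Guard against rounding pushing a value just inside the range out of it. */
    return bucket > count ? count : bucket;
}
```

For example, width_bucket_float8(5.35, 0.024, 10.06, 5) returns 3, and an operand exactly equal to the upper bound lands in bucket count + 1.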
Re: [HACKERS] 8.3 pending patch queue
Andrew Dunstan wrote:
> Bruce Momjian wrote:
>> I will start processing the patches held for 8.3 this week or next, now that the holiday break is over:
>>
>> http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> Some of these look obsolete. Also:
>
> . the plperl out params patch needs substantial rework by its author, IMHO.
> . there is a new version of the enums patch that has been submitted.

Yes, I will have to go through it, find the valuable ones, and get comments.
--
Bruce Momjian [EMAIL PROTECTED]
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] New version of money type
D'Arcy J.M. Cain darcy@druid.net writes:
> I changed this and a few other things. I didn't see any response to my question though. Shall I go ahead and commit now so that we can test in a wider setting? I haven't committed anything in years and I am hesitant to do so now without consensus.

FWIW, as long as you responded to my coding-style criticisms I don't have any problem with your committing it. Perhaps someone else does...

regards, tom lane