Bruce Momjian said:
>> >Yes, my worry is that someone will use a multibyte character that the
>> >system sees as several bytes and enters CSV mode.
>> >
>>
>> How about if we specify it explicitly, like BINARY, instead of it
>> being implied by the length of DELIMITER?
>>
>> COPY a FROM stdin CSV DELIMITER ',"';
>>
>> That would make the patch somewhat more extensive, but maybe not
>> hugely more invasive (I tried to keep it as uninvasive as possible).
>> I could do that, I think.
>
> That's what I was wondering.  Is triggering CSV for multi-character
> delimiters a little too clever?  This reminds me of the use of LIMIT
> X,Y with no indication which is limit and which is offset.
>
> We certainly could code to prevent the multibyte problem I mentioned,
> but should we?

I confess that in my anglocentric world I have remained lamentably ignorant
of how MBCS works. Just reading up a little, and looking over some of our
code (e.g. the scanner), it looks like the simple solution would be to check
that the delimiter was 8-bit clean. (I assume that ASCII is a subset of
every MBCS we support - is that correct?) However ...

>
> I am thinking just:
>
>> COPY a FROM stdin WITH CSV ',"';
>
> or
>
>> COPY a FROM stdin WITH DELIMITER "," QUOTE '"' EQUOTE '"';
>
> EQUOTE for embedded quote.  These are used in very limited situations
> and don't have to be reserved words or anything.
>
> I can help with these changes if folks like them.
>

I prefer the first, because it ensures things are specified together. If
you want to do that I will work on some regression tests.

cheers

andrew
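
For illustration only, here is one way the delimiter check mentioned above might look in C. It reads "8-bit clean" as "no byte has its high bit set", i.e. the delimiter is plain ASCII. The function name delimiter_is_ascii is hypothetical and does not come from the patch. The sketch also relies on the assumption questioned above, namely that ASCII bytes never occur inside a multibyte character in any supported encoding.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch: returns true only when
 * the delimiter string is non-empty and every byte has its high bit clear
 * (plain 7-bit ASCII).
 */
bool
delimiter_is_ascii(const char *delim)
{
    const unsigned char *p = (const unsigned char *) delim;

    /* Assumption: a missing or empty delimiter is treated as invalid. */
    if (p == NULL || *p == '\0')
        return false;

    for (; *p != '\0'; p++)
    {
        /* A set high bit means the byte is outside the ASCII range. */
        if (*p & 0x80)
            return false;
    }
    return true;
}
```

With this check a two-character specification such as ',"' would pass, while a single multibyte character (several bytes, all with the high bit set in UTF-8) would be rejected instead of being mistaken for a multi-character delimiter.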