John Darrington <[email protected]> writes:

> On Sat, Oct 16, 2010 at 04:27:27PM -0700, Ben Pfaff wrote:
> > John Darrington <[email protected]> writes:
> >
> > > I've no doubt this patch is an improvement.
> > > However, I'm worried about how this is going to work with non-ASCII
> > > encodings.
> > > For example, some recent syntax files that I've seen have UTF-8 "hard"
> > > spaces (0xc2 0xa0) instead of the normal ' '.
> >
> > Does SPSS actually treat a "hard space" as white space?  Looking
> > at the C, Java, and XML standards, none of them appear to treat
> > hard spaces as white space; it appears to be rejected as invalid.
>
> I can only really answer that with a question: "What do you mean by `treat as
> whitespace'"?
>
> Based upon syntax file examples that I have found on the web, it certainly
> appears to be true that a hard space is interpreted as a keyword separator in
> syntax.

OK.  I guess we can presume that SPSS accepts code points 0xa0
and 0x20 as equivalent in syntax, then.

That's too bad--all of the C, Java, and XML white space
characters are code points below 0x80.  Since the other
characters that are valid parts of command names are also below
0x80, this code could have essentially ignored UTF-8 encoding if
SPSS did not treat 0xa0 as white space.  Oh well.

> However, other questions remain.  For example, some commands (e.g.
> AUTORECODE /MISSING) alter their behaviour when "blank" string
> values are encountered.  I don't know exactly what it means for
> a string value to be "blank".  If anyone can do some
> experiments with SPSS and report the results, it would be very
> much appreciated.

I've always assumed that 0x20 was the sole code point accepted as
"blank" in these situations.
-- 
Ben Pfaff
http://benpfaff.org

_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
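
The encoding point in the reply above can be made concrete with a small sketch. This is Python for illustration only, not PSPP's actual lexer; the names `split_keywords`, `ASCII_SPACE_BYTES`, and `NBSP_UTF8` are hypothetical, and the set of ASCII white-space bytes is an assumption modeled on C's `isspace()` in the "C" locale. Every ASCII separator is a single byte below 0x80, so a byte-wise scan would work unchanged on UTF-8 input. Code point 0xa0, however, is encoded in UTF-8 as the two bytes 0xc2 0xa0, so the scanner has to recognize that sequence explicitly.

```python
# Hypothetical sketch: split UTF-8 syntax into words, treating U+00A0
# (the "hard" space) as equivalent to U+0020.  Not PSPP's real code.

# Assumed separator set: the bytes C's isspace() accepts in the "C" locale.
ASCII_SPACE_BYTES = b" \t\n\r\f\v"

# U+00A0 encoded as UTF-8 is the two-byte sequence 0xc2 0xa0.
NBSP_UTF8 = "\u00a0".encode("utf-8")


def split_keywords(syntax: bytes) -> list[bytes]:
    """Return the words in UTF-8 encoded `syntax`, split on white space."""
    words = []
    current = bytearray()
    i = 0
    while i < len(syntax):
        if syntax.startswith(NBSP_UTF8, i):
            # In valid UTF-8, 0xc2 is only ever a lead byte, so this
            # two-byte match cannot start in the middle of a character.
            sep_len = 2
        elif syntax[i] in ASCII_SPACE_BYTES:
            sep_len = 1
        else:
            sep_len = 0

        if sep_len:
            if current:
                words.append(bytes(current))
                current.clear()
            i += sep_len
        else:
            current.append(syntax[i])
            i += 1

    if current:
        words.append(bytes(current))
    return words
```

For example, `split_keywords("AUTORECODE\u00a0/MISSING".encode("utf-8"))` returns `[b'AUTORECODE', b'/MISSING']`. Without the `NBSP_UTF8` branch the same input would come back as one word, which is the reason the scanner cannot simply ignore the encoding.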
