John,

OK, we will try it with your latest changes and then reply back and let
you know how it goes.

Thanks much,
Rob

On Oct 26, 2010, at 8:46 AM, John Darrington wrote:

> On Tue, Oct 26, 2010 at 10:40:29AM +0000, John Darrington wrote:
>     On Mon, Oct 25, 2010 at 07:51:56PM -0700, Ben Pfaff wrote:
>         Rob Messer <rmes...@intellisurvey.com> writes:
>
>         > What is the current status of support for including UTF-8
>         > characters in PSPP output?  My company is using the Perl
>         > interface to import survey data into PSPP, and generally it
>         > works very well.  However, we've never been able to use it
>         > when our dataset includes labels and records in languages
>         > like Japanese and Chinese.  I know there have been some
>         > recent updates to PSPP, so last week we upgraded to 0.7.5
>         > and tried that, but it still didn't seem to work for our
>         > test Japanese and Chinese data.  Is it supposed to be
>         > supported?  And if not in 0.7.5, perhaps in the latest
>         > development snapshot?  Thanks,
>
>         John Darrington and I talked about this briefly in IRC this
>         morning.  We didn't know a reason that UTF-8 shouldn't work.
>
>     I had another look today and have to modify my opinion.  Currently,
>     non-ASCII characters will not work with the Perl module.  :(
>
> OK.  I've just pushed a quick fix which should address this problem.
> I tested this new version writing UTF-8 strings in:
>
>     Variable Names;
>     Variable Labels;
>     Value Labels (both the key and the value);
>     Values of string variables.
>
> So now, assuming you have a string variable defined, you can write a
> string value using a literal UTF-8 string like:
>
>     # German word for "Cylindrical concrete billboard"
>     $sysfile->append_case (["Litfaßsäule"]);
>
> or using escape sequences like:
>
>     # The Chinese representation of the name of the city of Taipei
>     $sysfile->append_case (["\x{53F0}\x{5317}"]);
>
> However, in most real-life uses, I imagine you will not be using string
> literals, but will be receiving the data from some other Perl module.
> In this case, what needs to be done is:
>
>     use Encode;
>
>     $s = get_string_data_from_some_source ();
>     $enc = get_encoding_of_string_data ();
>
>     $sysfile->append_case ([decode ($enc, $s)]);
>
> As always with i18n, things are never without caveats.  In particular:
>
> * You must remember that a variable's "width" is the maximum number of
>   BYTES (not characters).
>
> * For rather convoluted reasons, which you need to read "man Encode" in
>   order to understand, the code ...
>
>       use utf8;
>       use Encode;
>
>       $sysfile->append_case ([decode ('UTF-8', "some-utf8-encoded-string")]);
>
>   ... won't work.  Instead, you would have to write:
>
>       $sysfile->append_case ([decode ('UTF-8', encode ('UTF-8',
>                                       "some-utf8-encoded-string"))]);
>
> I haven't had a chance to look at reading non-ASCII from a .sav file
> into Perl.
>
> J'
>
> --
> PGP Public key ID: 1024D/2DE827B3
> fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
> See http://pgp.mit.edu or any PGP keyserver for public key.
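
For reference, the byte-versus-character caveat in John's message can be checked with nothing but the core Encode module. The following is only a minimal sketch: it does not touch PSPP at all, the helper name utf8_byte_length is hypothetical (it is not part of the PSPP Perl module), and the raw bytes stand in for data that would normally come from a file or another module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of the PSPP Perl module: the number of
# bytes a Perl character string occupies once encoded as UTF-8.
sub utf8_byte_length {
    my ($chars) = @_;
    return length(encode('UTF-8', $chars));
}

# Raw UTF-8 bytes, standing in for data received from an external source.
# These six bytes encode the two characters U+53F0 U+5317.
my $raw = "\xE5\x8F\xB0\xE5\x8C\x97";

# decode() turns the bytes into a Perl character string.
my $chars = decode('UTF-8', $raw);

printf "characters: %d, bytes: %d\n",
    length($chars), utf8_byte_length($chars);
# prints "characters: 2, bytes: 6"
```

Under these assumptions, a string variable holding this two-character value would need a width of at least 6, not 2.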