On Mon, Feb 05, 2018 at 10:41:04AM +0100, John Darrington wrote:
> On Sun, Feb 04, 2018 at 01:57:53PM -0800, Ben Pfaff wrote:
> Well, in the end the file shows two things:
>
> (1) PSPP should not write a file that it cannot read later: if
> you look at the raw file, it contains question marks. This
> means that PSPP output routines should be more careful about
> insisting on writing the file in a character set that is
> acceptable for later reading.
>
> (2) PSPP should be able to read files that do contain bad
> variable names, probably by replacing unacceptable bytes by some
> kind of placeholder like X.
>
> Here is my guess at what happened:
>
> The charset of the machine used to create the file is something other
> than UTF-8, and either it has been (inadvertently) set to something
> incapable of encoding the umlauts he was trying to use, or the iconv
> library on that machine is broken.
>
> So the try_recode routine in libpspp/i18n.c failed, and the fallback char,
> which is '?', see line 1167, was substituted.
>
> What I don't understand is how the user could not have noticed something
> amiss at the time of data entry. The variable names should have been
> rejected when entered, or at least looked very weird.

PSPP generally maintains variable names, etc. internally in UTF-8, and I
wonder whether we're not checking that against the charset at all the
appropriate times. I could especially see that happening in the GUI, where
there's an extra layer of indirection. But in any case I think the reader
and writer should be more robust. I'll work on that.

_______________________________________________
Pspp-users mailing list
Pspp-users@gnu.org
https://lists.gnu.org/mailman/listinfo/pspp-users