Dear R-devel list,

This is to confirm Prof. Ripley's analysis of the read.xport issue.
The section on missing data in TS140 is pertinent to numeric variables only. In SAS, character variables are of fixed length (between 1 and 200 for the xport format). Shorter strings are padded with trailing blanks when assigned to a variable. An uninitialized character variable is stored as all blanks in the xport format file. This is the only representation of 'missing' data for SAS character variables. 'Special missing' codes (.A to .Z and ._) are available for numeric variables only.

Please find enclosed a patch to the R-2.0.1/src/library/Recommended/foreign/SASxport.c file and an xport file that I used for testing. The xport file was created by SAS V8.2 on Linux, but should be platform and version independent (except for the header information).

I have simply commented out the code lines that try to detect missing character values. The code in SASxport.c already does a good job of removing trailing blanks from character values. For missing character data (all blanks) the result is the empty string (""), which is fine for me. There is no equivalent to the R missing character representation in SAS (as far as I know).

The enclosed gzipped tar file contains:

diff_SASxport_c.txt                 diff for SASxport.c
xptchar1.xpt                        test file in xport format
xptchar.sas                         trivial SAS program used to generate xptchar1.xpt
xptchar_SAS_System_Viewer9_1.csv    xptchar1.xpt converted to a comma-separated file
                                    using SAS System Viewer 9.1 (on Win XP)

With the patch applied, read.xport produces the same data frame from xptchar1.xpt as read.csv does from xptchar_SAS_System_Viewer9_1.csv (tested on i386 Linux with R Version 2.0.1), except that read.csv converts empty strings to NAs. As explained above, the empty string is closer to the meaning of an all-blanks value in SAS.

There is renewed interest in this old data format in the pharmaceutical industry, because the US Food and Drug Administration requests clinical and pre-clinical data to be submitted in this format.
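
To illustrate the blank-padding behaviour described above, here is a minimal sketch in C. It is not the code in SASxport.c; the function name trim_trailing_blanks and its signature are hypothetical and chosen only for this example. It assumes a fixed-width field padded with ASCII blanks and shows that an all-blank field naturally comes out as the empty string, with no separate 'missing' test needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from SASxport.c): copy a fixed-width,
 * blank-padded character field into `out` and drop the trailing blanks.
 * `out` must have room for width + 1 bytes. Returns the trimmed length.
 * An all-blank field, the only 'missing' character value in SAS,
 * yields the empty string "". */
size_t trim_trailing_blanks(const char *field, size_t width, char *out)
{
    assert(field != NULL && out != NULL);

    /* Walk back from the end of the field over the padding blanks. */
    size_t len = width;
    while (len > 0 && field[len - 1] == ' ')
        len--;

    /* Copy what is left and terminate it; embedded blanks are kept. */
    memcpy(out, field, len);
    out[len] = '\0';
    return len;
}
```

Called on an 8-byte field holding "abc" followed by five blanks, this sketch leaves "abc" in the output buffer; called on eight blanks, it leaves "".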
I spent some time analyzing the xport file format to be sure of what is actually submitted to the FDA with these files.

Thank you for considering this patch (and for the great R system, of course)!

Best regards,
Werner Engl

_____________________________________
Werner Engl, PhD, CStat
Senior Manager, Biostatistics
Baxter AG, Vienna, Austria
e-mail: [EMAIL PROTECTED]

PR7389_we20041209.tar.gz
Description: GNU Zip compressed data
______________________________________________ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel