Even though the FDA has no policies at all that limit our choices of statistical
software, there is one de facto standard in place: reliance on the SAS transport file
format for data submission (even though this format is deficient for this purpose,
e.g., it does not even document value labels or units of measurement in a
self-contained way). Because of the widespread use of SAS transport files in the
pharmaceutical industry, clinical trial data analyses done by statistical centers like
ours who receive data from companies often begin with SAS transport files. I have not
had SAS on my machines in about 12 years so it would be nice to be able to read binary
transport files instead of having to run the slower sas.get function in the Hmisc
library. sas.get has to launch SAS to do its work.
The foreign package implements a quick way to read such files in its read.xport
function. This function has some significant problems, which I reported to the
developers some time ago, but fixes do not seem to be forthcoming and the bug report
has not been acknowledged. The developers have done great work in writing
the foreign package (and many other awesome contributions to the community) so I don't
fault them at all for being creative, busy people. I am writing this note to see if
any C language-savvy R users have done their own fixes or would be willing to help the
developers with these particular fixes. The specific problems I have found are (1) a
worrisome one in which plausible-looking but invalid data result from importing SAS
numeric variables of length 3 bytes; and (2) corrupted data when the SAS transport
file contains multiple SAS datasets. In addition, it would be great to have
lookup.xport retrieve all SAS variable attributes including PROC FORMAT VALUE
names, so that factor variables could be created as is done automatically with
read.spss in foreign. Note there is also a problem with lookup.xport when a file
contains multiple datasets. The documentation states that a list with a major element
for each dataset will be created. read.xport is supposed to create a list of data
frames for this case.
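To make the factor request concrete, here is a minimal R sketch of what could be
done once the PROC FORMAT CNTLOUT dataset can be read. The fmt data frame is typed in
by hand to mimic the FMTNAME, START and LABEL columns of a CNTLOUT dataset for the
race format in the test file, and apply_sas_format is a hypothetical helper, not a
function in foreign or Hmisc.

```r
# Hand-typed stand-in for the CNTLOUT dataset (assumption: only the
# FMTNAME, START and LABEL columns are needed for simple VALUE formats).
fmt <- data.frame(FMTNAME = rep("RACE", 3),
                  START   = c("1", "2", "3"),
                  LABEL   = c("green", "blue", "purple"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: turn a numeric vector into a factor using one format.
apply_sas_format <- function(x, fmtname, fmt) {
  f <- fmt[fmt$FMTNAME == toupper(fmtname), ]
  factor(x, levels = as.numeric(f$START), labels = f$LABEL)
}

race   <- c(2, 4)                       # the two RACE values in the test data
race_f <- apply_sas_format(race, "race", fmt)
print(race_f)                           # 4 has no label, so it becomes NA
```

Note that SAS itself would display the unlabeled value 4 unformatted rather than as
missing, so a real implementation would need a policy for values outside the format.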
Here is SAS code I used to create test files, followed by R output.
libname x SASV5XPT "test.xpt";
libname y SASV5XPT "test2.xpt";
PROC FORMAT; VALUE race 1=green 2=blue 3=purple; RUN;
PROC FORMAT CNTLOUT=format;RUN;
data test;
LENGTH race 3 age 4;
age=30; label age="Age at Beginning of Study";
race=2;
d1='3mar2002'd ;
dt1='3mar2002 9:31:02'dt;
t1='11:13:45't;
output;
age=31;
race=4;
d1='3jun2002'd ;
dt1='3jun2002 9:42:07'dt;
t1='11:14:13't;
output;
format d1 mmddyy10. dt1 datetime. t1 time. race race.;
run;
/* PROC CPORT LIB=work FILE='test.xpt';run; * no; */
PROC COPY IN=work OUT=x;SELECT test;RUN;
PROC COPY IN=work OUT=y;SELECT test format;RUN;
> lookup.xport('test.xpt')
$TEST
$TEST$headpad
[1] 1200
$TEST$type
[1] "numeric" "numeric" "numeric" "numeric" "numeric"
$TEST$width
[1] 3 4 8 8 8
$TEST$index
[1] 1 2 3 4 5
$TEST$position
[1] 0 3 7 15 23
$TEST$name
[1] "RACE" "AGE" "D1" "DT1" "T1"
$TEST$sexptype
[1] 14 14 14 14 14
$TEST$tailpad
[1] 18
$TEST$length
[1] 2
> lookup.xport('test2.xpt')
Same output except tailpad=76, length=124, second dataset ignored.
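One way to check independently how many datasets a transport file holds is to count
member header records in the raw bytes. The sketch below assumes the version 5
transport layout as commonly documented: 80-byte records, with each dataset introduced
by a record beginning "HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!".
count_xport_members is a hypothetical helper and has not been checked against
test2.xpt.

```r
# Count member (dataset) headers in a SAS transport file.
# Assumption: headers always start on an 80-byte record boundary.
count_xport_members <- function(path) {
  bytes  <- readBin(path, what = "raw", n = file.size(path))
  marker <- charToRaw("HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!")
  if (length(bytes) < 80) return(0L)
  starts <- seq(1, length(bytes) - 79, by = 80)   # first byte of each record
  is_member <- vapply(starts, function(s) {
    identical(bytes[s:(s + length(marker) - 1)], marker)
  }, logical(1))
  sum(is_member)
}

# Usage (file name from the post):
# count_xport_members("test2.xpt")
```

If the layout assumptions hold, the count for test2.xpt should match the number of
datasets selected in the PROC COPY step.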
> read.xport('test.xpt')
RACE AGE D1 DT1 T1
1 2.000063 30.00000 15402 1330767062 40425
2 4.000063 31.00000 15494 1338716527 40453
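Both RACE values above are off by the same amount, which is a useful clue. Transport
files store numerics in IBM hexadecimal floating point, truncated to the variable's
length. The rough R sketch below decodes that format and suggests that 2.000063 is
what results if the first byte of the adjacent AGE field is read as a fourth byte of
the 3-byte RACE field. The byte values are worked out by hand from the test data, not
dumped from test.xpt, and ibm_to_double is a hypothetical helper. This is consistent
with the output but is not a diagnosis of the C code.

```r
# Decode an IBM hexadecimal float given as 2 to 8 raw bytes.
# SAS missing-value codes are not handled in this sketch.
ibm_to_double <- function(bytes) {
  stopifnot(length(bytes) >= 2, length(bytes) <= 8)
  b <- as.integer(bytes)
  b <- c(b, rep(0L, 8L - length(b)))      # right-pad truncated fields with zeros
  sign     <- if (b[1] >= 128L) -1 else 1
  exponent <- (b[1] %% 128L) - 64L        # excess-64 power of 16
  fraction <- sum(b[2:8] / 256^(1:7))     # 56-bit fraction in [0, 1)
  sign * fraction * 16^exponent
}

# Hand-built start of the first record: RACE = 2 (3 bytes), AGE = 30 (4 bytes).
record <- as.raw(c(0x41, 0x20, 0x00,  0x42, 0x1e, 0x00, 0x00))

race_ok  <- ibm_to_double(record[1:3])    # 2
race_bad <- ibm_to_double(record[1:4])    # about 2.000063
age      <- ibm_to_double(record[4:7])    # 30
print(c(race_ok, race_bad, age), digits = 7)
```

The same stray 0x42 byte added to RACE = 4 gives about 4.000063, matching the second
row, since AGE = 31 also begins with 0x42.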
> read.xport('test2.xpt')
RACE AGE D1 DT1 T1
1 2.000063e+00 3.000000e+01 1.540200e+04 1.330767e+09 4.042500e+04
2 4.000063e+00 3.100000e+01 1.549400e+04 1.338717e+09 4.045300e+04
. . . .
122 3.687825e-40 3.687825e-40 3.687825e-40 5.868918e-40 3.687825e-40
123 5.904941e-40 2.942346e+63 9.068390e+43 NA -5.524256e-48
124 3.835229e-93 6.434447e-86 NA 3.687825e-40 3.687825e-40
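The D1, DT1 and T1 values in the first two rows above are raw SAS numbers, since the
formats are not carried over. A short R sketch of converting them, assuming the usual
SAS conventions (dates as days since 1 January 1960, datetimes as seconds since then,
times as seconds since midnight) and treating datetimes as UTC purely for
illustration:

```r
sas_origin <- "1960-01-01"

# Values copied from the first two rows of the read.xport output.
d1   <- as.Date(c(15402, 15494), origin = sas_origin)
dt1  <- as.POSIXct(c(1330767062, 1338716527), origin = sas_origin, tz = "UTC")
secs <- c(40425, 40453)

# Time of day as HH:MM:SS text (base R has no plain time-of-day class).
t1 <- sprintf("%02d:%02d:%02d",
              as.integer(secs %/% 3600),
              as.integer((secs %% 3600) %/% 60),
              as.integer(secs %% 60))

print(d1)
print(format(dt1, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
print(t1)
```

The converted values agree with the dates and times assigned in the SAS data step, so
in the first two rows only the 3-byte RACE variable is read incorrectly.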
test.xpt and test2.xpt may be retrieved from
http://hesweb1.med.virginia.edu/biostat/tmp
They were created on an IBM AIX machine running SAS 8.
Thanks very much for any assistance. -Frank
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat