On Mon, 16 Jan 2006, Dimitri Joe wrote:
>
> (i) I get a big R file (for example, a 15Mb Stata file became a 42Mb R
> file; after cleanup.import() from the Hmisc package, it dropped to 35Mb,
> but that's still more than 2x the original Stata file) which, in turn, I
> suspect is due to the fact that
>
> (ii) factors are created using Stata labels as levels.
Your suspicion is wrong.
A more likely explanation is that Stata uses single-precision floating
point by default and can use 1-byte and 2-byte integers. R uses double
precision floating point and four-byte integers.
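The size difference can be checked directly in R. This is only a rough sketch: the vector length n is an arbitrary choice, and the Stata widths in the comments are the storage types mentioned above, not something measured here.

```r
# Compare the memory used by R's two basic numeric storage modes.
n <- 1e6                     # arbitrary length, chosen for illustration

x_int <- integer(n)          # R integers: 4 bytes each
x_dbl <- numeric(n)          # R doubles:  8 bytes each

size_int <- as.numeric(object.size(x_int))
size_dbl <- as.numeric(object.size(x_dbl))

# Approximate bytes per element (a small fixed header is included).
print(size_int / n)
print(size_dbl / n)

# A Stata 1-byte integer read into an R integer grows about 4x;
# a Stata 4-byte float read into an R double grows about 2x.
```

So a file made up mostly of small integers and single-precision floats can easily more than double in size once loaded, with no factors involved.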
> I wonder if
>
> (i) there isn't a way of forcing each variable to be numeric or integer,
> maintaining its original values (instead of "Stata labels" as "R
> levels"). Or,
Yes. If you read the help page for read.dta() it tells you how.
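For illustration, the relevant argument is convert.factors. The sketch below is an assumption-laden example rather than the poster's data: the data frame df, its columns and the temporary file are made up, and write.dta() is used only so that the example has a Stata file to read back.

```r
library(foreign)

# Hypothetical data: one labelled (factor) variable and one numeric one.
df <- data.frame(sex = factor(c("male", "female", "male")),
                 age = c(31, 45, 27))

# Write a Stata file so there is something to read back.
tmp <- tempfile(fileext = ".dta")
write.dta(df, tmp)

# Default: Stata value labels become factor levels.
labelled <- read.dta(tmp)

# convert.factors = FALSE keeps the underlying numeric codes instead.
raw <- read.dta(tmp, convert.factors = FALSE)

str(labelled$sex)
str(raw$sex)
```

With convert.factors = FALSE the labelled variable comes back as its numeric codes, which can then be kept as integers or converted as needed.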
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
[EMAIL PROTECTED] University of Washington, Seattle