To add to an already clear explanation, a comment on precision in Stata. By default Stata does store numbers as floats (also known as single precision or 4-byte reals), although a variable can also be held as a 1-byte, 2-byte or 4-byte integer or as an 8-byte double. One way to check this is to save a small subset of your data in Stata with every numeric variable stored as a double, and then compare the size of that new Stata file with the size of the file you create in R.
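The size argument can be sketched in R itself. This is a rough illustration only: the Stata byte counts are the documented storage-type widths, the vector length `n` is arbitrary, and names such as `stata_bytes` and `growth` are made up for the example.

```r
# Bytes per value for each Stata storage type.
stata_bytes <- c(byte = 1, int = 2, long = 4, float = 4, double = 8)

# R has only two numeric storage modes: integer (4 bytes) and double (8 bytes).
n <- 1e6                   # arbitrary length, large enough to hide the header
x_double  <- numeric(n)    # what a Stata float or double becomes in R
x_integer <- integer(n)    # what a Stata byte, int or long becomes in R

# Approximate per-value cost in R (the small vector header is negligible here).
per_value_double  <- as.numeric(object.size(x_double))  / n
per_value_integer <- as.numeric(object.size(x_integer)) / n

# Growth factor for a column when it moves from Stata to R.
growth <- c(
  byte_to_integer = 4 / stata_bytes[["byte"]],
  int_to_integer  = 4 / stata_bytes[["int"]],
  float_to_double = 8 / stata_bytes[["float"]]
)

print(round(c(double = per_value_double, integer = per_value_integer), 2))
print(growth)
```

A file made up mostly of floats and small integers would therefore be expected to grow by a factor of two or more on import, whatever happens to the labels. To keep the original codes rather than factors, the help page for read.dta() in the foreign package documents a `convert.factors` argument.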
(A section on this can be found in the Stata user manual, section 13.10.)

Thomas Lumley wrote:
> On Mon, 16 Jan 2006, Dimitri Joe wrote:
>
>> (i) I get a big R file (for example, a 15Mb Stata file became a 42Mb R
>> file; after cleanup.import() from the Hmisc package, it dropped to 35Mb,
>> but that's still more than 2x the original Stata file) which, in turn, I
>> suspect is due to the fact that
>>
>> (ii) factors are created using Stata labels as levels.
>
> Your suspicion is wrong.
>
> A more likely explanation is that Stata uses single-precision floating
> point by default and can use 1-byte and 2-byte integers. R uses double
> precision floating point and four-byte integers.
>
>> I wonder if
>>
>> (i) there isn't a way of forcing each variable to be numeric or integer,
>> maintaining its original values (instead of "Stata labels" as "R
>> levels"). Or,
>
> Yes. If you read the help page for read.dta() it tells you how.
>
> -thomas
>
> Thomas Lumley Assoc. Professor, Biostatistics
> [EMAIL PROTECTED] University of Washington, Seattle
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
