Hi Mike,

The ff package has some facilities for storing and manipulating small (2-bit) integers. See here:

http://cran.r-project.org/web/packages/ff/index.html

-Matt

On 04/14/2011 01:20 PM, Mike Miller wrote:
I note that "current implementations of R use 32-bit integers for
integer vectors," but I am working with large arrays that contain
integers from 0 to 3, so they could be stored as unsigned 8-bit
integers. Can R do this? (FYI -- This is for storing minor-allele counts
for genetic studies. There are 0, 1 or 2 minor alleles and 3 would
represent missing.)

It is theoretically possible to store such data with four integers per
byte. This is what PLINK (GPL-licensed) does in its binary (.bed)
pedigree format:

http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped

That might be too much to hope for. ;-)
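
For illustration, here is a minimal sketch of the four-values-per-byte idea mentioned above. It is written in Python only to show the bit arithmetic; the function names pack_2bit and unpack_2bit are made up for this example, and the bit order (lowest two bits first) is an arbitrary choice, not necessarily the layout PLINK uses in its .bed files.

```python
def pack_2bit(values):
    """Pack integers in 0..3 into bytes, four values per byte.

    The first value goes into the two lowest bits of each byte
    (an assumed, arbitrary ordering).
    """
    out = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        if not 0 <= v <= 3:
            raise ValueError(f"value {v} at index {i} does not fit in 2 bits")
        out[i // 4] |= v << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed, n):
    """Recover the first n 2-bit values from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


# Minor-allele counts, with 3 standing for "missing" as in the message above.
counts = [0, 1, 2, 3, 2, 1]
packed = pack_2bit(counts)
assert unpack_2bit(packed, len(counts)) == counts
print(len(counts), "values stored in", len(packed), "bytes")
```

The number of values has to be kept separately, because the last byte may be only partly used.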

I think that the R system uses double-precision floating point numbers
by default. When I impute minor-allele counts, I get posterior expected
values ranging from 0 to 2 (called dosages). The imputation isn't very
precise, so it would be fine to store such data using one or two bytes.
(The values are used as regressors and small changes would have minimal
impact on results.) I could use unsigned 8-bit integers (0 to 255),
probably using only 0 to 254 so that 1 and 2 could be represented with
perfect precision as 127/127 and 254/127 (but I would do regression on
the integer values). Or I could use 16 bits, doubling memory load and
improving precision. It would be convenient if R could work with
half-precision floating-point numbers (binary16):

http://en.wikipedia.org/wiki/Half_precision_floating-point_format

Can R do that?
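
To make the precision trade-off concrete, here is a rough sketch comparing the two storage schemes described above: dosages scaled by 127 into unsigned 8-bit codes (0 to 254), and dosages rounded to binary16. It is in Python purely as an illustration; the helper names and the sample dosages are invented for the example, and the half-precision round trip relies on the "e" format code of Python's standard struct module.

```python
import struct

SCALE = 127  # dosage 1 -> code 127, dosage 2 -> code 254


def encode_dosage(d):
    """Map a dosage in [0, 2] to an unsigned 8-bit code in 0..254."""
    if not 0.0 <= d <= 2.0:
        raise ValueError("dosage must lie in [0, 2]")
    return round(d * SCALE)


def decode_dosage(code):
    """Map an 8-bit code back to the dosage scale."""
    return code / SCALE


def to_half(d):
    """Round a value to IEEE 754 binary16 and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", d))[0]


# Hypothetical posterior expected minor-allele counts.
dosages = [0.0, 0.37, 1.0, 1.834, 2.0]

codes = bytes(encode_dosage(d) for d in dosages)  # one byte per dosage
err8 = max(abs(decode_dosage(c) - d) for c, d in zip(codes, dosages))
err16 = max(abs(to_half(d) - d) for d in dosages)

print("max abs error, 8-bit codes:", err8)
print("max abs error, binary16:   ", err16)
```

With this scaling the 8-bit codes are off by at most 0.5/127 (about 0.004) on the dosage scale, while binary16 is off by at most 2^-11 (about 0.0005) for values between 1 and 2, at twice the storage.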

If not, is anyone interested in working on developing some of these
features in R? We have GPL code from PLINK and Octave that might help a
lot.

http://www.gnu.org/software/octave/doc/interpreter/Integer-Data-Types.html

Best,

Mike

--
Michael B. Miller, Ph.D.
Bioinformatics Specialist
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Matthew S Shotwell   Assistant Professor           School of Medicine
                     Department of Biostatistics   Vanderbilt University
