Hello,
 
I have a large data set with 320,000 rows and 1000 columns. All the data have 
the values 0, 1 or 2.
I wrote a script to remove all the rows with more than 46 missing values. It 
works perfectly on a smaller data set, but when I run it on the larger data set 
I get the error "cannot allocate vector of size 1240 Kb". 
I've searched through previous posts and found that this might be because I'm 
running R 2.1.0 on a Linux cluster with 32-bit processors, but I could not find 
a solution for this problem. The cluster is a really fast one and should be 
able to cope with this amount of data; the system configuration is: speed 
3.4 GHz, memory 4 GB. Is there a way to change the settings or processor under 
R? I want to run the randomForest function on my large data set, and it should 
be able to cope with that amount. Perhaps someone has tried this before in R, 
or is Fortran a better choice? I have added my R script below.
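
For scale, here is a rough estimate of the memory the full table needs. This is 
only a back-of-the-envelope sketch: it assumes the values are held as 4-byte 
integers or 8-byte doubles, and it ignores any extra copies R makes while 
reading and subsetting the data.

```r
# Dimensions taken from the description above
n_rows <- 320000
n_cols <- 1000
cells  <- n_rows * n_cols          # total number of values

# Assumed storage sizes: 4 bytes per integer, 8 bytes per double
gib_int <- cells * 4 / 2^30        # size in GiB if stored as integers
gib_dbl <- cells * 8 / 2^30        # size in GiB if stored as doubles

round(c(integer = gib_int, double = gib_dbl), 2)
```

Under these assumptions a single copy of the data already takes roughly 1.2 to 
2.4 GiB, which is a large share of the 4 GB of memory before any working 
copies are made.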
 
Best regards,
 
Iris Kolder
 
library(randomForest)   # provides na.roughfix() and randomForest()

# read in data file
SNP <- read.table("file.txt", header=FALSE, sep="")
# recode missing values from 9 to NA
SNP[SNP==9] <- NA
# count the NAs per row and add the count as a column
SNP$total.NAs <- rowSums(is.na(SNP))
# keep only the rows with no more than 5% (46) NAs
SNP <- SNP[SNP$total.NAs <= 46, ]
# remove the added column with the NA counts
SNP$total.NAs <- NULL
# transpose rows and columns
SNP <- t(as.matrix(SNP))
set.seed(1)
snp.na <- SNP
snp.roughfix <- na.roughfix(snp.na)
# assign a factor to the case/control status
fSNP <- factor(snp.roughfix[, 1])
 
snp.narf<- randomForest(snp.roughfix[,-1], fSNP, na.action=na.roughfix, 
ntree=500, mtry=10, importance=TRUE, keep.forest=FALSE, do.trace=100)
 
print(snp.narf)
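
The row-filtering step can be checked on a small toy matrix. This is a minimal 
sketch only: the values in toy and the threshold max_na = 1 are made up for 
illustration (the real script uses 46), and 9 is again taken as the 
missing-value code.

```r
# Hypothetical toy data: 4 rows, 5 columns, 9 marks a missing value
toy <- matrix(c(0, 1, 2, 9, 0,
                9, 9, 1, 0, 2,
                0, 0, 0, 0, 0,
                9, 9, 9, 1, 2),
              nrow = 4, byrow = TRUE)

toy[toy == 9] <- NA                      # recode 9 as NA
na_per_row <- rowSums(is.na(toy))        # number of NAs in each row

max_na <- 1                              # illustrative threshold, not the real 46
kept <- toy[na_per_row <= max_na, , drop = FALSE]   # drop rows with more NAs

na_per_row    # 1 2 0 3
nrow(kept)    # 2
```

Working on a matrix instead of a data frame avoids adding and removing a 
helper column, and drop = FALSE keeps the result a matrix even when only one 
row is left.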

