Re: [R] missing handling
Hi, Jim:

I tried your code and got the following error:

    trn1 <- read.table('trn1.svm', header=F, na.string='.', sep='|')
    Med <- apply(trn1, 2, median, na.rm=T)
    Ind <- which(is.na(trn1), arr.ind=T)
    trn1[Ind] <- Med[Ind[,'col']]
    Error in [<-.data.frame(`*tmp*`, Ind, value = c(1.00802124455, 1.00802124455,  :
      only logical matrix subscripts are allowed in replacement

I cannot figure out why.

Thanks for the help,

On 9/27/05, jim holtman [EMAIL PROTECTED] wrote:
> Use 'which(...arr.ind=T)'
>
>     Med <- apply(x.1, 2, median, na.rm=T)  # get median
>     Ind <- which(is.na(x.1), arr.ind=T)    # determine which are NA
>     x.1[Ind] <- Med[Ind[,'col']]           # replace with median

--
Weiwei Shi, Ph.D

Did you always know?
No, I did not. But I believed...
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
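The error arises because `trn1` is a data frame, not a matrix. Suppose every column is numeric, as the original question assumes for the columns that contain NAs. In that case, one possible workaround is to run the same matrix-style replacement on `as.matrix(trn1)` and then convert the result back.

This is a minimal sketch on a small made-up data frame (`df`, `m` and `df2` are hypothetical names). It only makes sense for all-numeric data. With any character or factor column, `as.matrix()` returns a character matrix and the median step no longer applies.

```r
# Hypothetical toy data frame standing in for trn1; all columns are numeric,
# as the original question assumes for the columns that contain NAs
df <- data.frame(a = c(1, NA, 3, 4), b = c(NA, 2, 2, 8))

m   <- as.matrix(df)                        # numeric matrix, so matrix subscripts work
Med <- apply(m, 2, median, na.rm = TRUE)    # column medians (a = 3, b = 2)
Ind <- which(is.na(m), arr.ind = TRUE)      # two-column matrix of (row, col) positions
m[Ind] <- Med[Ind[, "col"]]                 # fill each NA with its column's median
df2 <- as.data.frame(m)                     # convert back to a data frame
print(df2)
```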
Re: [R] missing handling
On Tue, 4 Oct 2005, Weiwei Shi wrote:
>     trn1[Ind] <- Med[Ind[,'col']]
>     Error in [<-.data.frame(`*tmp*`, Ind, value = c(1.00802124455, 1.00802124455,  :
>       only logical matrix subscripts are allowed in replacement
>
> I cannot figure out why.

Read the help for [<-.data.frame to be told the answer. A data frame (as given by read.table) is not a matrix, as the example presumably was. Indexing whole matrices at once is efficient, but it hides loops for data frames.

You will not do better than looping over columns for a data frame, but you certainly do not need to loop over rows, which is very inefficient. Something like

    trn2 <- trn1
    for(i in names(trn2)) {
        Med <- median(trn2[[i]], na.rm = TRUE)
        trn2[i, is.na(trn2[[i]])] <- Med
    }

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
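The column-by-column idea above can also be written without an explicit `for` loop, by applying a small function to each column with `lapply()`. This is a sketch under the same assumption as the original question, that every column is numeric. `impute_median` is a hypothetical helper name, and `trn1` here is a toy stand-in for the data read with `read.table()`.

```r
# Hypothetical helper: replace the NAs in one numeric vector by its median
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Hypothetical toy data standing in for trn1
trn1 <- data.frame(a = c(1, NA, 3, 4), b = c(NA, 2, 2, 8))

trn2 <- trn1
trn2[] <- lapply(trn2, impute_median)  # one call per column; no loop over rows
print(trn2)
```

Assigning into `trn2[]` rather than `trn2` keeps the result a data frame with the original column names and row names.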
Re: [R] missing handling
At 8:35 PM +0100 10/4/05, Prof Brian Ripley wrote:
> Something like
>
>     trn2 <- trn1
>     for(i in names(trn2)) {
>         Med <- median(trn2[[i]], na.rm = TRUE)
>         trn2[i, is.na(trn2[[i]])] <- Med
>     }

But exchange the indices (data frame indexing is [rows, columns]):

    trn2[ is.na(trn2[[i]]) , i] <- Med

--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
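Below is a self-contained run of the column loop with the indices exchanged as suggested. The logical row selector comes first and the column name second. `trn1` is a hypothetical toy data frame standing in for the file read with `read.table()`.

```r
# Hypothetical toy data standing in for trn1
trn1 <- data.frame(a = c(1, NA, 3, 4), b = c(NA, 2, 2, 8))

trn2 <- trn1
for (i in names(trn2)) {
  Med <- median(trn2[[i]], na.rm = TRUE)
  # data frame indexing is [rows, columns]: logical row selector first,
  # then the column name
  trn2[is.na(trn2[[i]]), i] <- Med
}
print(trn2)
```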
Re: [R] missing handling
Use 'which(...arr.ind=T)'

    x.1
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]    6   10    3    4   10    7    9    8    4    10
     [2,]    8    7    4    7    4    8    3   NA    3     4
     [3,]    7    7   10   10    3    5    3    2    2     2
     [4,]    3    4    5   10   10    2    6    9    4     5
     [5,]    3    5    9    5    6   NA    3   NA    6     7
     [6,]    9    6   10    5   10    4    2   10   NA     5
     [7,]    5    2    5   10    3    7    6    4    6     8
     [8,]    2    6    1    8    9    2    7    8    3     8
     [9,]    9    1    4    9    8   10    2   NA    1     7
    [10,]    2    4    8    7   NA    4    3   NA    5     5
    x.4
     [1] 5.5 5.5 5.0 7.5 8.0 5.0 3.0 8.0 4.0 6.0
    Med <- apply(x.1, 2, median, na.rm=T)  # get median
    Ind <- which(is.na(x.1), arr.ind=T)    # determine which are NA
    x.1[Ind] <- Med[Ind[,'col']]           # replace with median
    x.1
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]    6   10    3    4   10    7    9    8    4    10
     [2,]    8    7    4    7    4    8    3    8    3     4
     [3,]    7    7   10   10    3    5    3    2    2     2
     [4,]    3    4    5   10   10    2    6    9    4     5
     [5,]    3    5    9    5    6    5    3    8    6     7
     [6,]    9    6   10    5   10    4    2   10    4     5
     [7,]    5    2    5   10    3    7    6    4    6     8
     [8,]    2    6    1    8    9    2    7    8    3     8
     [9,]    9    1    4    9    8   10    2    8    1     7
    [10,]    2    4    8    7    8    4    3    8    5     5

On 9/27/05, Weiwei Shi [EMAIL PROTECTED] wrote:
> Hi,
>
> I have the following code to replace missing values with the median, assuming missing values only occur in continuous variables:
>
>     trn1 <- read.table('trn1.fv', header=F, na.string='.', sep='|')
>     # median
>     m.trn1 <- sapply(1:ncol(trn1), function(i) median(trn1[,i], na.rm=T))
>     # replace
>     trn2 <- trn1
>     for (each in 1:nrow(trn1)) {
>         index.missing = which(is.na(trn1[each,]))
>         trn2[each,] <- replace(trn1[each,], index.missing, m.trn1[index.missing])
>     }
>
> Can anyone suggest ways to improve it? Replacing just 10 rows takes 1.5 sec:
>
>     system.time(for (each in 1:10) {
>         index.missing = which(is.na(trn1[each,]))
>         trn2[each,] <- replace(trn1[each,], index.missing, m.trn1[index.missing])
>     })
>     [1] 1.53 0.00 1.53 0.00 0.00
>
> Another general question: are there packages in R for handling missing values?
>
> Thanks,

--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?
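To check the matrix approach end to end, the sketch below re-enters the `x.1` values printed above and runs the same three lines. The column medians it computes should equal the printed `x.4` vector, and no NAs should remain. This works because `x.1` is a plain matrix. As discussed elsewhere in this thread, the same two-column subscript is rejected by `[<-.data.frame` when the object is a data frame from `read.table()`.

```r
# The 10 x 10 matrix x.1 from the example above, entered row by row
x.1 <- matrix(c(
  6, 10,  3,  4, 10,  7,  9,  8,  4, 10,
  8,  7,  4,  7,  4,  8,  3, NA,  3,  4,
  7,  7, 10, 10,  3,  5,  3,  2,  2,  2,
  3,  4,  5, 10, 10,  2,  6,  9,  4,  5,
  3,  5,  9,  5,  6, NA,  3, NA,  6,  7,
  9,  6, 10,  5, 10,  4,  2, 10, NA,  5,
  5,  2,  5, 10,  3,  7,  6,  4,  6,  8,
  2,  6,  1,  8,  9,  2,  7,  8,  3,  8,
  9,  1,  4,  9,  8, 10,  2, NA,  1,  7,
  2,  4,  8,  7, NA,  4,  3, NA,  5,  5),
  nrow = 10, byrow = TRUE)

Med <- apply(x.1, 2, median, na.rm = TRUE)   # column medians, ignoring NAs
Ind <- which(is.na(x.1), arr.ind = TRUE)     # two-column matrix with "row", "col"
x.1[Ind] <- Med[Ind[, "col"]]                # one vectorised replacement
print(Med)
print(x.1)
```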