Hello,

I have a large data frame with a column containing a factor:

```
> str(df)
'data.frame': 5000000 obs. of 4 variables:
 $ MR   : num 0.000809 0.001236 0.001663 0.002089 0.002516 ...
 $ FCN  : num 2 2 2 2 2 2 2 2 2 2 ...
 $ Class: Factor w/ 3 levels "negative","positive",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Set  : int 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "out.attrs")=List of 2
  ..$ dim     : Named int [1:2] 1000 1000
  .. ..- attr(*, "names")= chr [1:2] "X1" "X2"
  ..$ dimnames:List of 2
  .. ..$ X1: chr [1:1000] "X1=0.0008094667" "X1=0.0012360955" "X1=0.0016627243" "X1=0.0020893531" ...
  .. ..$ X2: chr [1:1000] "X2= 2.000000" "X2= 2.048048" "X2= 2.096096" "X2= 2.144144" ...
```

I would like to run `prop.test` on `df$Class`, but:

```
> prop.test(x=point$Class, n=length(point$Class),
+ conf.level=.95, correct=FALSE)
Error in prop.test(x = point$Class, n = length(point$Class), conf.level = 0.95, :
  'x' and 'n' must have the same length
```

`x` is described as "a vector of counts of successes, a one-dimensional table with two entries, or a two-dimensional table (or matrix) with 2 columns, giving the counts of successes and failures, respectively", so I provided `point$Class`. The total number of trials is `length(point$Class)`.

There are three levels:

```
> unique(df$Class)
[1] negative positive uncertain
Levels: negative positive uncertain
```

I tried to remove the levels to check whether they were interfering with the test:

```
> df$Class = levels(droplevels(df$Class))
Error in `$<-.data.frame`(`*tmp*`, Class, value = c("negative", "positive", :
  replacement has 3 rows, data has 5000000
```

What would be the syntax for this test? The idea is to get the most common value of `Class` for each `unique(df$Set)`, with `prop.test` also providing the 95% CI for the estimate.

Thanks
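For reference, here is a minimal sketch of one way the call might be set up. It is not a definitive answer. It assumes the first error arises because `x` was the raw 5,000,000-element factor rather than a count, while `n` had length 1. The second error arises because `levels()` returns only the three level names. The data frame `toy` and its values are hypothetical stand-ins for `df`, and `ci_by_set` and `result` are illustrative names.

```r
# Hypothetical toy data standing in for the real df (which has 5,000,000 rows)
set.seed(1)
toy <- data.frame(
  Class = factor(sample(c("negative", "positive", "uncertain"), 300,
                        replace = TRUE, prob = c(0.6, 0.3, 0.1)),
                 levels = c("negative", "positive", "uncertain")),
  Set = rep(1:3, each = 100)
)

# prop.test() expects counts, not the raw factor, so tabulate Class
# within each Set and test the count of the most common level
ci_by_set <- lapply(split(toy$Class, toy$Set), function(cls) {
  counts <- table(cls)                    # counts per level in this Set
  top <- names(counts)[which.max(counts)] # most common level
  res <- prop.test(x = max(counts), n = length(cls),
                   conf.level = 0.95, correct = FALSE)
  data.frame(Class = top, n = length(cls),
             prop = unname(res$estimate),
             lower = res$conf.int[1], upper = res$conf.int[2])
})

# One row per Set: most common Class, its proportion, and the 95% CI
result <- do.call(rbind, ci_by_set)
result$Set <- names(ci_by_set)
print(result)
```

With a single count and `correct=FALSE`, the interval reported by `prop.test` is the Wilson score interval. The p-value it prints refers to the default null of p = 0.5, which may not be of interest here.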
______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.