Checking the code that read the frame, I see that the problem was indeed caused by missing value codes at the read.table() stage. However, I did not want to revisit the reading stage with these frames. (To show why not, I include the code that read them, which you may recognise from an earlier thread in which I got some help from Andy Liaw.)
Murray
nam.vec <- c("min.pkt.sz","pkt.count","bytes","duration",
             "m1.psz","m1.count","m2.psz","m2.count","m3.psz","m3.count",
             "iat.min","iat.max","m1.iat","m1.iat.count","m2.iat","m2.iat.count",
             "m3.iat","m3.iat.count","port","ip.address",
             "min.pkt.sz2","pkt.count2","bytes2",
             "m1.psz2","m1.count2","m2.psz2","m2.count2","m3.psz2","m3.count2",
             "iat.min2","iat.max2","m1.iat2","m1.iat.count2","m2.iat2","m2.iat.count2",
             "m3.iat2","m3.iat.count2","port2","ip.address2",
             "diff.min.psz","diff.max.psz")
flines <- 107165
slines <- 3000
sel6 <- sample(flines,3000*6)
selected1 <- sort(sel6[1:3000])
selected2 <- sort(sel6[3001:6000])
selected3 <- sort(sel6[6001:9000])
selected4 <- sort(sel6[9001:12000])
selected5 <- sort(sel6[12001:15000])
selected6 <- sort(sel6[15001:18000])
select.frame <- function(selected) {
  strvec <- rep("", length(selected))
  ## use the line numbers passed in; re-sampling here would ignore them
  selected <- sort(selected)
  ## number of lines to discard before each selected line
  skip <- c(selected[1] - 1, diff(selected) - 1)
  fcon <- file("c:/data/perry/data.csv", open="r")
  for (i in seq_along(skip)) {
    ## skip to the selected line
    readLines(fcon, n=skip[i])
    strvec[i] <- readLines(fcon, n=1)
  }
  close(fcon)
  sel.flows <- read.table(textConnection(strvec), header=FALSE, sep=",")
  names(sel.flows) <- nam.vec
  sel.flows
}

Thomas W Blackwell wrote:
Michael -
Because these columns are factors to begin with, using as.numeric() alone will have unexpected results. See the section "Warning:" in help("factor").
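A minimal sketch of the pitfall that warning describes, using a made-up factor `f` and a made-up frame `flows` (neither is Murray's data). It assumes the factor levels are numbers stored as text, which is what read.table() leaves behind when a column fails numeric conversion.

```r
# Hypothetical factor whose levels are numbers stored as text
f <- factor(c("10", "20", "5", "20"))

# as.numeric() alone returns the internal integer codes, not the values
codes <- as.numeric(f)

# Going through the level labels recovers the original numbers
values <- as.numeric(as.character(f))

# Equivalent form recommended in help("factor")
values2 <- as.numeric(levels(f))[f]

print(codes)
print(values)

# The same conversion applied to chosen columns of an existing data frame,
# without re-reading the file (column names here are illustrative only)
flows <- data.frame(bytes = factor(c("100", "2500", "40")),
                    port  = factor(c("80", "443", "80")))
num.cols <- c("bytes", "port")
flows[num.cols] <- lapply(flows[num.cols],
                          function(col) as.numeric(as.character(col)))
str(flows)
```

Any level that is not a valid number (a stray missing value code, say) becomes NA with a warning during the conversion, which is usually the desired result.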
However, it is worth Murray asking himself WHY these columns are factors to start with, rather than the expected numeric values. One frequent source of this is using read.table() on a file which contains column headers without setting header=T. Then, the character string in the first row of each column prevents numeric conversion of all of the other rows. Another possible difficulty is an unusual missing value code, or commas in place of decimal points, or anything else, somewhere in the file that does not convert automatically to numeric. Maybe it's worth editing the original data file before Murray reads it in.
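As a rough illustration of the missing value code case, the sketch below reads a made-up three-line CSV in which "?" stands in for an unusual missing value code (the real code in Murray's file is not stated in this thread). `stringsAsFactors = TRUE` is set explicitly because its default differs between R versions.

```r
# Made-up CSV text; "?" is an assumed stand-in for the odd missing value code
csv.text <- "12,0.5\n?,1.5\n7,2.5"

# Without na.strings, the "?" keeps column V1 from converting to numeric,
# so it is read as a factor
bad <- read.table(text = csv.text, header = FALSE, sep = ",",
                  stringsAsFactors = TRUE)

# Declaring the code in na.strings lets read.table() convert the column
good <- read.table(text = csv.text, header = FALSE, sep = ",",
                   na.strings = c("NA", "?"))

print(sapply(bad, class))
print(sapply(good, class))
```

Passing `na.strings` at read time is an alternative to editing the data file by hand, provided the missing value code is known.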
Hmmm. I think I ought to have offered these many cents worth with my earlier reply.
- tom blackwell - u michigan medical school - ann arbor -
On Mon, 8 Sep 2003, Michael A. Miller wrote:
"Murray" == Murray Jorgensen <[EMAIL PROTECTED]> writes:
> I have just noticed that quite a few columns of a data
> frame that I am working on are numeric factors. For
> summary() purposes I want them to be vectors.
Do you want them to be vectors or do you want numeric values? If the latter, try as.numeric instead of as.vector:
as.vector(factor(rep(seq(4),3)))
[1] "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4"
as.numeric(factor(rep(seq(4),3)))
[1] 1 2 3 4 1 2 3 4 1 2 3 4
summary(as.vector(factor(rep(seq(4),3))))
   Length     Class      Mode
       12 character character
summary(as.numeric(factor(rep(seq(4),3))))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1.00    1.75    2.50    2.50    3.25    4.00
Mike
--
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]                                Fax 7 838 4155
Phone +64 7 838 4773 wk    +64 7 849 6486 home    Mobile 021 1395 862
______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
