>>>>> "Spencer" == Spencer Graves <[EMAIL PROTECTED]>
>>>>>     on Thu, 14 Oct 2004 13:41:24 -0700 writes:

    Spencer> It looks like you have several non-printing
    Spencer> characters.  "nchar" will give you the total number
    Spencer> of characters in each character string.
    Spencer> "strsplit" can break character strings into single
    Spencer> characters, and "%in%" can be used to classify
    Spencer> them.

and you give nice coding examples:

    Spencer> Consider the following:

    >> x <- "Draszt 0%/1€Œiso8859-15³"
    >> nx <- nchar(x)
    >> x. <- strsplit(x, "")
    >> length(x.[[1]])
    Spencer> [1] 29
    >> 
    >> namechars <- c(letters, LETTERS, as.character(0:9), ".")

Just to be precise:
If 'namechars' is supposed to mean ``characters valid in R object names'',
then you should have added "_" as well:

  namechars <- c(letters, LETTERS, as.character(0:9), ".", "_")

    >> punctuation <- c(",", "!", "+", "*", "&", "|")
    >> legalchars <- c(namechars, punctuation)

and 'legalchars' would have to contain quite a bit more, I presume,
e.g. "$", "@", ....
(but that wouldn't have been a reason to write this e-mail..)

    >> legalx <- lapply(x., function(y)(y %in% legalchars))
    >> x.[[1]][!legalx[[1]]]
    Spencer> [1] " " "" "%" "/" "Â" "€" "Â" "Œ" "-" "" "Â" "³"
    >> 
    >> sapply(legalx, sum)
    Spencer> [1] 17

    Spencer> Will this give you ideas about how to do what you want?

    Spencer> hope this helps.  spencer graves

(and this too)
Martin Maechler, ETH Zurich

    Spencer> Gabor Grothendieck wrote:
    >> Assuming that the problem is that your input file has
    >> additional embedded characters added by the database
    >> program, you could try extracting just the text using
    >> the UNIX strings program:
    >> 
    >> strings myfile.csv > myfile.txt
    >> 
    >> and see if myfile.txt works with R and, if not, check out
    >> what the differences are between it and the .csv file.
    >> 
    >> Date: Thu, 14 Oct 2004 11:31:33 -0700
    >> From: Scott Waichler <[EMAIL PROTECTED]>
    >> To: <[EMAIL PROTECTED]>
    >> Subject: [R] Problem with number characters
    >> 
    >> 
    >> I am trying to process text fields scanned in from a csv file that is
    >> output from the Windows database program FileMakerPro.
    >> The characters onscreen look like regular text, but R does not like
    >> their underlying binary form.
    >> For example, one of the text fields contains a name and a number, but
    >> R recognizes the number as something other than what it appears
    >> to be in plain text.  The character string "Draszt 03" after being
    >> read into R using scan and ="" becomes "Draszt 03" where the 3 is
    >> displayed in my R session as a superscript.  Here is the result pasted
    >> into this email I'm composing in emacs: "Draszt 0%/1€Œiso8859-15³"
    >> Another clue for the knowledgeable: when I try to display the vector
    >> element causing trouble, I get
    >> <CHARSXP: "Draszt 0%/1€Œiso8859-15³">
    >> where again the superscript part is just "3" in my R session.  I'm
    >> working in Linux, R version 1.9.1, 2004-06-21.  Your help will be much
    >> appreciated.
    >> 
    >> Scott Waichler
    >> Pacific Northwest National Laboratory
    >> [EMAIL PROTECTED]

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
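A minimal sketch, written for a current R rather than the 1.9.1 used in this thread, of the two ideas discussed here. The first is Spencer's approach of splitting a string with strsplit() and classifying the characters with %in%, using a whitelist extended with the "_", "$" and "@" that Martin mentions. The second drops non-printable bytes, which is only loosely analogous to Gabor's suggestion of running the file through `strings`. The helper names find_illegal_chars() and strip_non_ascii() are hypothetical. Treating only bytes 0x20 to 0x7E as "printable" is an assumption, and it would also discard legitimate accented characters.

```r
# Hypothetical helper (not from the thread): split ONE string into single
# characters and return those not in the whitelist 'allowed', i.e. the
# strsplit() / %in% classification Spencer describes.
find_illegal_chars <- function(x, allowed) {
  chars <- strsplit(x, "")[[1]]
  chars[!(chars %in% allowed)]
}

# Assumed whitelist: characters valid in R names (including "_"), a space,
# and some punctuation; extend it for your own data.
allowed <- c(letters, LETTERS, as.character(0:9), ".", "_",
             " ", ",", "!", "+", "*", "&", "|", "$", "@")

find_illegal_chars("Draszt 03", allowed)   # character(0): all allowed

# Hypothetical helper: keep only bytes in the printable ASCII range
# 0x20 (space) to 0x7E ("~"), dropping control characters such as
# escapes and any byte with the high bit set.
strip_non_ascii <- function(x) {
  codes <- as.integer(charToRaw(x))
  keep <- codes >= 32L & codes <= 126L
  rawToChar(charToRaw(x)[keep])
}

strip_non_ascii("a\033b")   # "ab": the ESC control character is removed
```

For a string like Scott's, strip_non_ascii() would remove the control and high-bit bytes, but any ASCII residue of the escape sequence (such as "%/1" or "iso8859-15") would likely survive, so a whitelist check with find_illegal_chars() is still useful afterwards.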