Duncan Murdoch wrote:
> On 07/08/2007 5:06 PM, Herve Pages wrote:
>> Hi,
>>
>> ?rawToChar
>>   'rawToChar' converts raw bytes either to a single character string
>>   or a character vector of single bytes.  (Note that a single
>>   character string could contain embedded nuls.)
>>
>> Allowing embedded nuls in a string might be an interesting experiment
>> but it seems to cause some troubles to most of the string manipulation
>> functions.
>>
>> A string with an embedded 0:
>>
>>   raw0 <- as.raw(c(65:68, 0 , 70))
>>   string0 <- rawToChar(raw0)
>>
>> > string0
>> [1] "ABCD\0F"
>>
>> nchar() should return 6:
>> > nchar(string0)
>> [1] 4
>
> You don't state your R version.  The default type of counting in nchar()
> has recently changed from "bytes" (where 6 is correct) to "chars" (where
> 4 is correct).

Oops, sorry:

  > sessionInfo()
  R version 2.6.0 Under development (unstable) (2007-07-02 r42107)
  x86_64-unknown-linux-gnu

  locale:
  LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base

  loaded via a namespace (and not attached):
  [1] rcompgen_0.1-15

And indeed:

  raw0 <- as.raw(c(65:68, 0 , 70))
  string0 <- rawToChar(raw0)

  > nchar(string0, type="chars")
  [1] 4
  > nchar(string0, type="bytes")
  [1] 6

In addition to the string functions already mentioned, it's worth
noting that 'paste' doesn't seem to be "embedded nul aware" either:

  > paste(string0, "G", sep="")
  [1] "ABCDG"

The same goes for serialization:

  > save(string0, file="string0.rda")
  > load("string0.rda")
  > string0
  [1] "ABCD"

One comment about the nchar man page:

  'chars' The number of human-readable characters.

"human-readable" seems to be used here to mean "everything but a nul",
which can be confusing. For example, one would generally think of ASCII
codes 1 to 31 as not "human-readable", but nchar() seems to disagree:

  > string1 <- rawToChar(as.raw(1:31))
  > string1
  [1] "\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037"
  > nchar(string1, type="chars")
  [1] 31

Cheers,
H.

>
> Duncan Murdoch
>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
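
For anyone who wants to locate the nul without relying on how character
strings treat it, the raw vector itself can be inspected. This is only a
minimal sketch reusing the bytes of raw0 from the message above; the names
n_bytes, nul_pos and pieces are introduced here purely for illustration.

```r
# Same bytes as in the message: "ABCD", a nul byte, then "F".
raw0 <- as.raw(c(65:68, 0, 70))

# Work on the raw vector, so no single string with an embedded nul is built.
n_bytes <- length(raw0)               # total number of bytes
nul_pos <- which(raw0 == as.raw(0))   # positions of nul bytes (illustrative name)

# multiple = TRUE yields one element per byte instead of a single string.
pieces <- rawToChar(raw0, multiple = TRUE)

print(n_bytes)
print(nul_pos)
print(pieces)
```

Counting and searching on the raw vector sidesteps the "bytes" versus
"chars" question for nchar(), since length() on a raw vector always counts
bytes.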