On Tue, May 24, 2016 at 9:30 AM, Jeroen Ooms <jeroen.o...@stat.ucla.edu> wrote:
> On Tue, May 24, 2016 at 5:59 PM, Gabriel Becker <gmbec...@ucdavis.edu> wrote:
> > Shouldn't Rf_mkString(NULL) return (the c-level equivalent of) character()
> > rather than the NA_character_?
>
> No. It should still be safe to assume that mkString() always returns a
> character vector of exactly length one. Anything else could lead to
> type errors.

Well, the thing is that you're passing an invalid pointer, one that doesn't point to a C string, to a constructor expecting a valid const char *. I'm fine with the contract being that mkString always returns a character vector of length one, but that doesn't necessarily mean that the function needs to accept NULL pointers. The contract as I understand it is that if you give it a C string, it will create a length-one character vector holding a CHARSXP for that string. In this light, Bill's suggestion that it throw an error seems the most principled response. I would think it would need to emit a warning at the very least.

> > An empty string and NULL aren't the same.
>
> Exactly! So if you pass in an empty C string, you get an empty R
> string, and if you pass in a null pointer you get NA.
>
>   Rf_mkString(NULL) <--> NA
>   Rf_mkString("")   <--> ""
>
> There is no ambiguity, and much better than segfaulting.

Well, "better than segfaulting" is not really relevant here. No one is arguing that it should segfault. The question is what behavior it should have when it doesn't segfault.

It's true that a C empty string is not the same as NULL, but NULL isn't the same as NA either. Semantically, for your use case (which I gather arose from interactions we had :) ), NULL means there is no version, while NA indicates there is a version but we don't know what it is.

Imagine an object class that represents a person's name (first, middle, last). Now take two people: one has no middle name (and we know that when creating the object), and another for whom we don't have any information about the middle name because only first and last were reported.
I would expect the first one to have a middle name of either NULL or (in a data.frame context) "", while the second would have NA_character_. In this light, mkString should arguably generate "". I don't think the fact that there is another way to get "" is a particularly large problem.

On the other hand, and in support of your position, it came up while Michael Lawrence and I were talking about this that asChar from utils.c will give you NA_STRING when you give it R_NilValue. That is a coercion, though, whereas arguably mkString is not. That said, consistency would probably be good.

~G

-- 
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research
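
To make the middle-name example concrete, here is a minimal sketch in plain C. It is not R's API: every name in it (middle_state, person_name, NA_MARKER, middle_for_column, is_na) is made up for illustration, and NA_MARKER is only a stand-in for NA_character_. It shows "known to have none" mapping to "" and "not reported" mapping to NA, which is the distinction argued for above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-state model; none of these names exist in R's C API. */
typedef enum {
    MIDDLE_NONE,    /* we know the person has no middle name */
    MIDDLE_UNKNOWN, /* a middle name may exist, but it was not reported */
    MIDDLE_KNOWN    /* a middle name was reported */
} middle_state;

typedef struct {
    const char *first;
    const char *last;
    middle_state state;
    const char *middle; /* only meaningful when state == MIDDLE_KNOWN */
} person_name;

/* Stand-in for NA_character_. It is compared by address, so a person whose
   reported middle name is literally "NA" is not mistaken for a missing one. */
static const char NA_MARKER[] = "NA";

/* Value the middle name would take in a data.frame-like column:
   "" for "has none", the NA marker for "not reported". */
static const char *middle_for_column(const person_name *p)
{
    assert(p != NULL);
    switch (p->state) {
    case MIDDLE_NONE:    return "";
    case MIDDLE_UNKNOWN: return NA_MARKER;
    case MIDDLE_KNOWN:   return p->middle;
    }
    return NA_MARKER; /* not reached for valid states */
}

/* True only for the NA marker itself (pointer identity, not string content). */
static int is_na(const char *s)
{
    return s == NA_MARKER;
}
```

Calling middle_for_column on a person with MIDDLE_NONE gives an empty string, while MIDDLE_UNKNOWN gives a value for which is_na is true. The three-state enum is just one way to keep "no value" and "unknown value" apart; it is not a claim about how mkString is or should be implemented.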