Re: [Rd] stringsAsFactors

peter dalgaard Mon, 11 Feb 2013 14:47:24 -0800

On Feb 11, 2013, at 18:50 , Duncan Murdoch wrote:

> 
> I do think that it's unfortunate that we don't get the same result in both 
> cases, and I'd like to have gotten the predictions you suggested, but I don't 
> think that's going to happen.  The reason for the difference is that the 
> subsetting is done before the conversion to a factor, but I think that is 
> unavoidable without really big changes.


It's logically impossible I'd say. If you want to do conversion from character 
to factor on an as-needed basis, you _will_ have issues with subsetting 
operations affecting the set of levels. 
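
A minimal sketch of that problem, using made-up data (the data frame `d` and its columns `g` and `y` are illustrative, not taken from the thread): when the column stays character and is converted only after subsetting, any level absent from the subset silently disappears.

```r
# Hypothetical data: the grouping column is kept as character.
d <- data.frame(g = c("a", "b", "c", "a"), y = 1:4,
                stringsAsFactors = FALSE)

# Subset first, convert to factor afterwards (the "as-needed" route).
sub <- d[d$g != "c", ]
f_late <- factor(sub$g)

# The level "c" never appears in the subset, so it is not a level at all.
levels(f_late)
```

Anything that later relies on the full level set, such as prediction from a fitted model, then sees a different factor than the one built from the complete data.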

The logical way out is to define factors before subsetting. As far as possible, 
create them up front. Doing it automagically in read.table is far from 
infallible, but at least has some chance of getting it roughly right. In my 
view, this is actually a pretty strong argument for keeping 
stringsAsFactors==TRUE. 
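
As a rough illustration of defining the factor up front (again with made-up data; `d`, `g` and `y` are illustrative names), converting before subsetting keeps the full level set even when a level has no rows left:

```r
# Hypothetical data, read in as character.
d <- data.frame(g = c("a", "b", "c", "a"), y = 1:4,
                stringsAsFactors = FALSE)

# Define the factor before any subsetting takes place.
d$g <- factor(d$g)

# Row subsetting keeps the levels attribute intact.
sub <- d[d$g != "c", ]
kept <- levels(sub$g)       # still includes "c"
counts <- table(sub$g)      # "c" is present with a count of zero
```

If the unused level really should go, droplevels(sub) removes it explicitly, which makes the change of level set a deliberate step rather than a side effect.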

(Praeterea censeo: The real issue is that plain-text data file formats contain 
insufficient metadata, so what we probably should do is to start thinking about 
ways to encode type and level set information in the files themselves.) 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [email protected]  Priv: [email protected]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
