On Wed, Feb 15, 2012 at 7:05 PM, Jeroen Ooms <jeroen.o...@stat.ucla.edu> wrote:
> The second problem is that the spss dataformat allows to specify
> 'duplicate labels', whereas this is not allowed for factors. read.spss
> does not deal with this and creates a bad factor
>
> x <- read.spss("http://www.stat.ucla.edu/~jeroen/spss/duplicate_labels.sav",
> use.value.labels=T);
> levels(x$opinion);
>
> which causes issues downstream. I am not sure if this is an issue in
> read.spss() or as.factor(), but I guess it might be wise to try to
> detect duplicate levels and assign them all with one and the same
> integer value when converting to a factor.

I think this one would be better dealt with by giving an error. SPSS
value labels are just labels, so they don't map very well onto R
factors, which are enumerated types. Rather than force them and lose
data, I would prefer to make the user decide what to do.

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
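
As an illustration of the "give an error" approach suggested in the reply, here is a minimal sketch in R. It is not the code in read.spss(): labels_to_factor() is a hypothetical helper, the data are made up, and it assumes the value labels arrive as a named vector whose names are the label text and whose values are the codes.

```r
# Hypothetical helper, not part of the foreign package.
# 'codes' holds the raw values; 'value_labels' is assumed to be a named
# vector with names = label text and values = the corresponding codes.
labels_to_factor <- function(codes, value_labels) {
  # Collect every label text that is attached to more than one code.
  dup <- unique(names(value_labels)[duplicated(names(value_labels))])
  if (length(dup) > 0) {
    # Refuse to guess: let the user decide how to recode or relabel.
    stop("duplicate value labels: ", paste(dup, collapse = ", "),
         "; recode or relabel before converting to a factor")
  }
  factor(codes, levels = unname(value_labels), labels = names(value_labels))
}

# Made-up data: in 'bad', codes 1 and 2 share the label "agree".
opinion <- c(1, 2, 3, 1)
bad  <- c(agree = 1, agree = 2, disagree = 3)
good <- c("strongly agree" = 1, agree = 2, disagree = 3)

# Unique labels convert cleanly.
ok <- labels_to_factor(opinion, good)

# Duplicate labels raise an error; capture its message for inspection.
res <- tryCatch(labels_to_factor(opinion, bad),
                error = function(e) conditionMessage(e))
```

Stopping here keeps the distinct codes intact instead of silently collapsing them onto one level, so no data are lost before the user has chosen how to handle the clash.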