"Stefan Th. Gries" <[EMAIL PROTECTED]> writes: > Dear all > > I have a problem with splitting up a data frame called ReVerb: > > » str(ReVerb) > `data.frame': 92713 obs. of 16 variables: > $ CHILD : Factor w/ 7 levels "ABE","ADA","EVE",..: 1 1 1 1 1 1 1 1 1 1 ... > $ AGE : Factor w/ 484 levels "1;06.00","1;06.16",..: 43 43 43 99 99 99 > 99 99 99 99 ... > $ AGE_Q : num 2.0 2.0 2.0 2.4 2.4 ... > $ INTERVALS: num 2 2 2 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ... > $ RND : int 34368 38311 14949 20586 72516 27186 88019 10767 114448 > 86146 ... > $ SYNTAX : Factor w/ 17 levels "Acmp","Amats",..: 15 12 8 15 7 16 7 7 16 7 > ... > $ LEXICAL : Factor w/ 1643 levels "$ACHE","$ACT",..: 194 803 803 294 299 > 803 1562 299 679 1562 ... > $ MORPH : Factor w/ 337 levels "$","$ =inf","$ =prs",..: 9 20 9 39 184 > 231 57 67 231 39 ... > $ COMPLEM : Factor w/ 1989 levels "$","$ V PR=Lp [1.2]",..: 203 547 220 203 > 1101 368 1834 1667 368 1834 ... > $ MATRIX : Factor w/ 906 levels "$ ???","$ be PR=Aen",..: 5 5 5 308 5 856 > 5 5 856 308 ... > $ SITUATION: Factor w/ 9 levels "[imitation of Mom: you know what I > said]",..: 2 2 2 2 2 2 2 2 2 2 ... > $ V_ANN : int 1 1 1 4 4 4 4 3 3 3 ... > $ QUEST : int 0 0 0 0 0 0 0 0 0 0 ... > $ EXCL : int 0 0 0 1 1 1 1 0 0 0 ... > $ U_LEN : int 3 4 5 13 13 13 13 8 8 8 ... > $ UTTERANCE: Factor w/ 55113 levels "","# (be)cause he wanted to .",..: 5696 > 39091 52180 2262 2262 2262 2262 3593 3593 3593 ... > > The level causing the problem is SYNTAX: > > » as.data.frame(sort(table(SYNTAX))) > sort(table(SYNTAX)) > Particles 100 > PR=N1 144 > Amats 271 > Trans_PR=A2 787 > Ditrans 1181 > Intrans_PR=A1 1399 > Acmp 2402 > Trans_PR=V2 2433 > CPcmps 2769 > Vpreps 4896 > Intrans_V0 5182 > Trans_PR=L2 7653 > Trans_V02 8117 > Intrans_PR=L1 8457 > Intrans_V1 9643 > Intrans_PR=V1 14987 > Trans_V12 22288 > > > I would like to extract all cases where SYNTAX=="Ditrans" from ReVerb, store > that in a file, and then generate ReVerb again without these cases and factor > levels. 
> My problem is probably obvious from the following lines of code:
>
> » ditrans<-which(SYNTAX=="Ditrans")
> » ReVerb1<-ReVerb[-c(ditrans),]; dim(ReVerb1)
> [1] 91532    16
> »
> » # ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been removed, but ...
> »
> » ReVerb1<-subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
> [1] 91528    16
> »
> » # ... so why don't I get 91532 again as the number of rows?
> »
> Any ideas??

The SYNTAX variable is not necessarily the same in the two cases: a bare
SYNTAX inside which() is looked up on the search path (for instance an
attached or global copy), whereas subset() evaluates SYNTAX inside ReVerb.
Could you retry the first case with

  ditrans <- which(ReVerb$SYNTAX=="Ditrans")

? Otherwise, try doing a setdiff() on the rownames of the two discrepant
results and see which are the four cases that differ.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
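
A minimal sketch of the setdiff() check suggested above, run on a small made-up data frame (`toy` and its values are hypothetical stand-ins, since ReVerb itself is not available here). It also illustrates one possible cause that is not confirmed anywhere in this thread: the table() counts in the question sum to 92709, four fewer than the 92713 observations, which would be consistent with four NA values in SYNTAX. Negative indexing via which() keeps rows where SYNTAX is NA, while subset() drops rows whose condition evaluates to NA.

```r
# Hypothetical toy data standing in for ReVerb (rows 3 and 6 have SYNTAX = NA)
toy <- data.frame(
  SYNTAX = factor(c("Ditrans", "Acmp", NA, "Ditrans", "Vpreps", NA)),
  U_LEN  = c(3, 4, 5, 13, 8, 8)
)

# Approach 1: which() plus negative indexing.
# which() ignores NA, so only the "Ditrans" rows are removed; NA rows stay.
ditrans  <- which(toy$SYNTAX == "Ditrans")
by_index <- toy[-ditrans, ]

# Approach 2: subset().
# Rows for which the condition is NA are treated as FALSE and dropped.
by_subset <- subset(toy, SYNTAX != "Ditrans")

# Row names present in the first result but missing from the second
differing <- setdiff(rownames(by_index), rownames(by_subset))
print(differing)

# Inspect the discrepant rows in the original data
print(toy[differing, ])

# Direct count of missing values in the column
print(sum(is.na(toy$SYNTAX)))
```

On the real data, the same three steps (build both results from ReVerb$SYNTAX, setdiff() the row names, look up those rows in ReVerb) should show which four rows differ and whether their SYNTAX is in fact NA.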