[R] Is this an artifact of using which?

2008-04-14 Thread Tania Oh
Dear all,

I used which() to obtain a subset of values from my data.frame.
However, I find that there is a trace of the values I have removed.
Any suggestions would be greatly appreciated.

Below is my data:

d <- data.frame( val   = 1:10,
                 group = sample(LETTERS[1:5], 10, replace = TRUE) )

> d
   val group
1    1     B
2    2     E
3    3     B
4    4     C
5    5     A
6    6     B
7    7     A
8    8     E
9    9     E
10  10     A

## selecting everything that is not group A
> d <- d[which(d$group != "A"), ]

> d
  val group
1   1     B
2   2     E
3   3     B
4   4     C
6   6     B
8   8     E
9   9     E

> levels(d$group)
[1] "A" "B" "C" "E"

## why is group A still reflected here?

Many thanks in advance,
tania

D.phil student
Department of Physiology, Anatomy and Genetics
Oxford University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is this an artifact of using which?

2008-04-14 Thread Uwe Ligges


Tania Oh wrote:
> > levels(d$group)
> [1] "A" "B" "C" "E"
>
> ## why is group A still reflected here?


Because you have removed elements from a factor object that has
particular levels. You remove elements (= observations), but the factor
still knows that all levels are possible (stored in the attributes of the
object).

If you want to remove all levels without corresponding observations, use 
explicit drop=TRUE as the help page suggests, e.g.:


d <- d[d$group != "A", ]
d$group <- d$group[ , drop = TRUE]
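
For reference, here is a self-contained sketch of the two lines above. It hard-codes the groups shown in the original post instead of sampling them, and it passes stringsAsFactors = TRUE explicitly because R 4.0.0 and later no longer convert strings to factors by default. Both choices are assumptions made for reproducibility, not part of the original code.

```r
# Groups copied from the original post so the result is reproducible.
d <- data.frame(val = 1:10,
                group = c("B", "E", "B", "C", "A", "B", "A", "E", "E", "A"),
                stringsAsFactors = TRUE)

# Subsetting the rows removes the observations but keeps the level "A".
d <- d[d$group != "A", ]
levels_before <- levels(d$group)

# drop = TRUE discards the levels that no longer occur in the data.
d$group <- d$group[ , drop = TRUE]
levels_after <- levels(d$group)

print(levels_before)
print(levels_after)
```

Comparing levels_before with levels_after shows that only the second step removes "A" from the levels.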

Uwe Ligges





Re: [R] Is this an artifact of using which?

2008-04-14 Thread Tania Oh
Dear Uwe,
Thank you very much for this.
After reading your solution below, I searched the help pages for
data.frame, which, and factor, but I didn't see the drop option in
them.

I googled and found drop associated with the function subset. Is
this the help page you were alluding to?


Sorry if I've missed something.
Thanks so much in advance again.
tania

On 14 Apr 2008, at 12:39, Uwe Ligges wrote:




> If you want to remove all levels without corresponding observations,
> use explicit drop=TRUE as the help page suggests, e.g.:
>
> d <- d[d$group != "A", ]
> d$group <- d$group[ , drop = TRUE]



Re: [R] Is this an artifact of using which?

2008-04-14 Thread Peter Dalgaard
[EMAIL PROTECTED] wrote:
 

> The (imho) unintuitive behaviour is to do with the subsetting function
> [.factor, not which.  There are a couple of workarounds:
   
In that case, your intuition needs readjustment.

There are other systems which (de facto) drop unused levels by default,
and it is a real pain to work around, especially for subgroup analyses.
E.g. there is no way to get PROC FREQ in SAS to include a count of zero,
and barplots of ratings from 0 to 10 lose columns randomly in SPSS
(this _can_ be worked around, though).

Anyways, dropping unused levels is illogical: There's no reason that a
tabulation of gender distribution for (say) tenured CS professors should
suddenly pretend that the female gender does not exist!
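
A small sketch of what keeping the levels buys you in a tabulation. The ratings below are made up for illustration, and the variable names are hypothetical.

```r
# Hypothetical ratings on a 0-10 scale; not every score was observed.
ratings <- factor(c(3, 7, 7, 10, 0, 3), levels = 0:10)

# All eleven levels are tabulated, with zeros for the unobserved scores.
counts_all <- table(ratings)

# After subsetting to the high scores, the full set of levels is still kept,
# so a barplot of counts_high would still show all eleven columns.
high <- ratings[as.integer(as.character(ratings)) >= 7]
counts_high <- table(high)

print(counts_all)
print(counts_high)
```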

> 1. Call factor to recreate the levels, and get rid of A
> factor(d$group)
>
> 2. Redefine [.factor; see dropUnusedLevels in the Hmisc package.
>
> Regards,
> Richie.
>
> Mathematical Sciences Unit
> HSL


 


Re: [R] Is this an artifact of using which?

2008-04-14 Thread Richard . Cotton

I didn't mean to be a troll, and I can certainly see the virtue in
preserving levels for the cases you described, but it was something
that caught me out when I first learned R.  Treating the levels of a
factor as the values that my categorical data takes, rather than the
_possible_ values it could take, was more natural to me.
The important thing is that it is possible to include or drop the unused
levels easily as required.
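
As a rough sketch of both options side by side (the factor below is a made-up example; droplevels() is not mentioned elsewhere in this thread and is only available in R releases newer than this discussion):

```r
# Hypothetical factor with one level, "A", that never occurs in the data.
g <- factor(c("B", "E", "B", "C"), levels = c("A", "B", "C", "E"))

# Keep the unused level: it is tabulated with a count of zero.
kept <- table(g)

# Drop it by rebuilding the factor from the values actually present.
dropped <- table(factor(g))

# droplevels() gives the same levels (assumes a newer R than this thread).
dropped_too <- table(droplevels(g))

print(kept)
print(dropped)
```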

Btw, has the behaviour of the drop argument to '[' changed recently?  I 
seem to remember that drop=TRUE didn't remove unused factor levels in 
older versions, though my memory may be mistaken.

Regards,
Richie.

Mathematical Sciences Unit
HSL





Re: [R] Is this an artifact of using which?

2008-04-14 Thread Uwe Ligges


Tania Oh wrote:
> Dear Uwe,
> thank you very much for this.
> After reading your solution below, I searched the help pages for
> data.frame, which, factor but I didn't see the option for drop in them.


In fact, ?factor links to ?"[.factor", which explains it.
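
If it helps, the drop argument can also be seen by inspecting the method itself. This is only a sketch; the variable names are made up for illustration.

```r
# The subsetting method for factors is an ordinary function in base R.
subset_method <- get("[.factor", envir = baseenv())

# Its formal arguments include `drop`.
arg_names <- names(formals(subset_method))
print(arg_names)

# The help page needs a quoted topic name because of the bracket:
# ?"[.factor"
# help("[.factor")
```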

Uwe

