[R] new user question on dataframe comparisons and plots

2007-08-01 Thread Conor Robinson
I'm coming from the scipy community and have been using R on and off for
the past week or so.  I'm still feeling out the language structure,
but so far so good.  I apologize in advance if I ask anything obvious;
I don't yet have the vocabulary to search for my issue, or to
recognize the answer if I did see it.

Question 1, plots:

I have a data frame with four factor columns and a single logical
column holding the response data (T or F).  I would like to plot a 4*4
grid showing all the two-way attribute interactions, as with
plot(data.frame) or pairs(data.frame, panel=panel.smooth), but with the
response's TRUE and FALSE values shown in different colors.  Any other
built-in graphical analysis that might be relevant here would also be
welcome.  I'm sure this is simple since it is a common procedure;
thanks in advance for humoring me.  Also, what is the correct term for
this type of plot?


Question 2, data frame analysis:

I have two sub data frames split by whether my logical column is T or
F.  I want to compare the same factor column between the two sub data
frames (there are a few hundred unique possible values for this factor
column eg  -  enumerated).  I've used table() on the attribute columns
from each sub frame to get counts.

pos <- data.frame(table(df.true$CAT))

  10
BASD  0
ZAQM 4
...

neg <- data.frame(table(df.false$CAT))

 1000
BASD  3
ZAQM  9
PPWS 10
...

The TRUE sub frame has fewer unique factor levels than the FALSE sub
frame.  I would like an output data frame whose first column holds all
the factor levels from the TRUE sub frame and whose second column holds
the counts from the TRUE attributes / counts from the corresponding
FALSE attributes, ie %response for each represented factor.  It's fine
(better even) if all factors are included and there is just a zero for
the attributes with no TRUEs.

I've started writing my own function but keep running into trouble
(the data frame not being a vector, etc.).  I have a feeling there is
a *much* better way, ie a built-in function, but I've hit the limit of
my current R understanding.

Thank you,
Conor

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] new user question on dataframe comparisons and plots

2007-08-01 Thread Stephen Tucker
Hi Conor,

I hope I interpreted your question correctly. I think for the first one you
are looking for a conditioning plot? I am going to create and use some
nonsensical data - 'iris' comes with R so this should be reproducible on your
machine:

library(lattice)
data(iris)
x <- iris
# make some factors using cut()
x[,1:2] <- lapply(x[,1:2],cut,3)
# add column of TRUE FALSE
x <- cbind(x,TF=sample(c(TRUE,FALSE),nrow(x),replace=TRUE))
xyplot(Petal.Width~Petal.Length | ## these are numeric
       Sepal.Width*Sepal.Length,  ## these are factors
       groups=TF,                 ## TRUE or FALSE
       panel=function(x,y,...) {
         panel.xyplot(x,y,...)    ## points, colored by group
         panel.loess(x,y,...)     ## smooth through all points in the panel
       },
       data=x,auto.key=TRUE)
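
If what you want is closer to the pairs() display from your question, base
graphics can color the points by the response. This is only a sketch under
assumptions: iris stands in for your data, and the random `resp` vector is a
hypothetical stand-in for your logical response column. Note that pairs()
converts factor columns to their integer codes, so with real factor columns
many points will overlap.

```r
# Hypothetical stand-in data: four iris columns plus a random logical response
set.seed(1)
d <- iris[, 1:4]
resp <- sample(c(TRUE, FALSE), nrow(d), replace = TRUE)

# One color per row: blue where the response is TRUE, red where it is FALSE
cols <- ifelse(resp, "blue", "red")

# 4*4 scatterplot matrix with points colored by the response
pairs(d, col = cols, pch = 19)
```

The lattice function splom() with a groups= argument should give a similar
display with an automatic legend, if you prefer to stay within lattice.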


For the second question, merge() should work even when the two tables have
different factor levels, as long as you specify all=TRUE.

## get counts for TRUE and FALSE
> y <- tapply(x$Species,INDEX=x$TF,
+             function(x) as.data.frame(table(x)))
## merge results
> (z <- `names<-`(merge(y$`TRUE`,y$`FALSE`,by="x",all=TRUE),
+                 c("factor","true","false")))
      factor true false
1 versicolor   29    21
2  virginica   23    27
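
To get from the merged counts to the TRUE/FALSE ratio you asked about, here is
a minimal sketch. The `pos` and `neg` data frames are hypothetical, with values
loosely modeled on the ones in your message; the column names `true`, `false`
and `ratio` are my own choices. Levels missing from the TRUE side come back
from merge() as NA, which I replace with zero.

```r
# Hypothetical count tables for the TRUE and FALSE sub frames
pos <- data.frame(CAT = c("BASD", "ZAQM"), true = c(0, 4))
neg <- data.frame(CAT = c("BASD", "ZAQM", "PPWS"), false = c(3, 9, 10))

# all = TRUE keeps levels that appear in only one of the two tables
m <- merge(pos, neg, by = "CAT", all = TRUE)

# Levels with no TRUE rows are NA after the merge; treat them as zero counts
m$true[is.na(m$true)] <- 0

# TRUE count divided by the corresponding FALSE count, per level
m$ratio <- m$true / m$false
```

If a level could be missing from the FALSE side as well, the same NA
replacement would be needed for `m$false`, and the ratio would then be Inf or
NaN for that level.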

## reshape the data frame
> library(reshape)
> melt(z,id=1)
      factor variable value
1 versicolor     true    29
2  virginica     true    23
3 versicolor    false    21
4  virginica    false    27
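
If you still have the unsplit data frame, it may be simpler to skip the split
and the merge altogether and cross-tabulate the factor against the logical
column in one call to table(). Again a sketch only: `df`, `CAT` and `resp` are
hypothetical names and the data are made up. Every level gets a row, with a
zero where there are no TRUEs.

```r
# Hypothetical unsplit data: one factor column and one logical response
df <- data.frame(
  CAT  = c("BASD", "BASD", "BASD", "ZAQM", "ZAQM", "ZAQM", "ZAQM", "PPWS", "PPWS"),
  resp = c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Fixing the levels guarantees both a FALSE and a TRUE column in the table
tab <- table(df$CAT, factor(df$resp, levels = c(FALSE, TRUE)))

# One row per level, with the counts and the share of TRUE responses
out <- data.frame(
  factor = rownames(tab),
  true   = as.vector(tab[, "TRUE"]),
  false  = as.vector(tab[, "FALSE"])
)
out$prop_true <- out$true / (out$true + out$false)
```

Here `prop_true` is TRUE / (TRUE + FALSE), which is my reading of "%response";
use `out$true / out$false` instead if you really want the TRUE/FALSE ratio.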

Hope this helps. If it doesn't you can post a small (reproducible) piece of
data and we can maybe help you out a little better...

Best regards,

ST


