Dear R users,
I am trying to apply the analysis described in a paper to the data I am
working with.
The data are: 80 patients for whom I have survival data (time in days, and a
binary event indicator), and microarray expression data for 200 genes
(continuous predictor variables).
My data matrix "data.test" has 202 columns (time, event, and the 200 genes)
and 80 rows.
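For anyone who wants to reproduce the layout, here is a minimal sketch that simulates a data frame of the same shape. The values, the gene1 ... gene200 column names, and the coding of event (1 = event observed, 0 = censored) are assumptions for illustration only, not my real data.

```r
set.seed(1)
n.patients <- 80
n.genes <- 200

# Simulated expression values; the gene names are placeholders
expr <- matrix(rnorm(n.patients * n.genes), nrow = n.patients,
               dimnames = list(NULL, paste("gene", seq_len(n.genes), sep = "")))

data.test <- data.frame(
  time  = round(rexp(n.patients, rate = 1 / 500)) + 1,  # follow-up time in days
  event = rbinom(n.patients, size = 1, prob = 0.4),     # assumed: 1 = event, 0 = censored
  expr                                                  # one column per gene
)

dim(data.test)           # rows = patients, columns = time + event + genes
str(data.test[, 1:4])    # first few columns only
```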
What I want to do is:
- run recursive partitioning on this data to get groups of patients that are
homogeneous in terms of survival/prognosis.
- extract the "correlation" of each single gene's expression (for each of the
200 genes) with recurrence-free survival (time and event): I want to know which
variables best predict a poor/good prognosis based on the survival data.
I am using the function "ctree" from the "party" package.
I came up with this command:
test <- ctree(Surv(time, event) ~ .,
              data = data.test,
              controls = ctree_control(teststat = "max", testtype = "Bonferroni",
                                       mincriterion = 0.95, savesplitstats = TRUE),
              ytrafo = function(data) trafo(data, numeric_trafo = rank),
              xtrafo = function(data) trafo(data, surv_trafo = logrank_trafo(data,
                                            ties.method = "logrank")))
which runs without error, but as I am not a statistician I find it quite
confusing and I may not be using it properly.
My technical problem is that I would like to extract the test statistics from
my "test" object (class BinaryTree), i.e. the p-value of each of the 200
comparisons (survival data versus each gene): I would like to know which of
them are really associated with each node of the tree.
I tried:
test@tree$criterion$statistic
but the maximum value is 16, so I assume these are not p-values as such:
what are they?
I also tried:
test@tree$criterion$criterion
Here the maximum value is 0.96 and the minimum is 0; only one value is > 0.95.
str(test) gives a lot of information, but at the moment it confuses me more
than it helps.
I want to know:
- whether my "ctree" command makes sense to people who have more experience
than me with this kind of data...
- which elements of "test" represent which statistics, and how to interpret
them: as I understand it, setting "mincriterion" to 0.95 is equivalent to
setting a p-value threshold of 0.05 (ctree help: "when 'mincriterion = 0.95',
the p-value must be smaller than 0.05 in order to split this node.")
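To make the question concrete, here is a self-contained sketch of how I am reading these values out of the root node, on a small simulated data set (5 made-up genes instead of 200, default transformations). It rests on an assumption I would like confirmed: that with testtype = "Bonferroni" the "criterion" element holds 1 minus the adjusted p-value for each input variable, so that 1 - criterion can be read as a p-value. The names toy, fit and res are just placeholders.

```r
library(survival)  # Surv()
library(party)     # ctree(), ctree_control(), nodes()

set.seed(1)
n <- 80
toy <- data.frame(
  time  = round(rexp(n, rate = 1 / 500)) + 1,   # simulated follow-up in days
  event = rbinom(n, size = 1, prob = 0.4),      # assumed: 1 = event, 0 = censored
  matrix(rnorm(n * 5), nrow = n,
         dimnames = list(NULL, paste("gene", 1:5, sep = "")))
)

fit <- ctree(Surv(time, event) ~ ., data = toy,
             controls = ctree_control(teststat = "max", testtype = "Bonferroni",
                                      mincriterion = 0.95))

# Node 1 is the root; fit@tree should refer to the same node
root <- nodes(fit, 1)[[1]]
stat <- root$criterion$statistic   # one test statistic per input variable
crit <- root$criterion$criterion   # assumed: 1 - adjusted p-value per variable

res <- data.frame(gene = names(stat),
                  statistic = as.numeric(stat),
                  p.adjusted = 1 - as.numeric(crit))
res[order(res$p.adjusted), ]       # most strongly associated genes first
```

If that reading is right, a node is only split when the largest criterion value exceeds mincriterion, and the same extraction could be repeated for any other node ID returned by nodes().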
I hope my explanation is clear. I might be completely mistaken: any tips or
guidance are more than welcome...
Thanks!
Sarah
sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
 [1] stats4 grid splines stats graphics grDevices utils datasets methods
[10] base
other attached packages:
 [1] biomaRt_2.10.0 party_1.0-2 vcd_1.2-13 colorspace_1.1-1 MASS_7.3-20
 [6] strucchange_1.4-7 sandwich_2.2-9 zoo_1.7-7 coin_1.0-21 mvtnorm_0.9-9992
[11] modeltools_0.2-19 survival_2.36-14
loaded via a namespace (and not attached):
[1] lattice_0.20-6 RCurl_1.91-1.1 tools_2.14.2 XML_3.9-4.1
------------------
Sarah Bonnin
Bioinformatician
Genomics Unit - Office 439.01
Centre for Genomic Regulation
C/ Dr. Aiguader, 88
08003 Barcelona, Spain
Tel. +34 93-316-0373
www.crg.eu