Hello all,

I posted a question to this list last week and received no response. I am
unsure whether this means no-one knows the answer or I posed the question badly.
I'm going to assume the latter and try again. I am new to R, so it is quite
likely a very naive question. If there is something blindingly obvious that I
am missing, or another resource I should consult that I haven't seen, would
someone be kind enough to point it out? Although my data is from biological
experiments, I think my problem is with R rather than the nature of the data,
but I may be wrong.

I am attempting to use the pvclust package to do some hierarchical clustering 
on some CGH data I have downloaded from the Progenetix database 
(http://www.progenetix.de/~pgscripts/progenetix/Aboutprogenetix.html). The data 
is in tab-delimited format: each column is a single sample and each row is a
chromosome band. Some example dummy data is shown below.

band       sample1  sample2  sample3  sample4
1p36_33       1        0        0        1
1p36_32      -1        0       -1        0
1p36_31       0        1        1        1
1p36_22       0       -1       -1       -1
etc., where 0 = no change, 1 = gain, -1 = loss

I have read this file into R using:
> ProgenetixCRC.all.noXY <-
+   read.table("/home/marraydb/Progenetix/Data/CRCall_noXY.txt", header=TRUE,
+              sep="\t", row.names="band")
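The layout above can be reproduced without the original file. The sketch below reads the four dummy rows from an inline string rather than from disk. The object name `dummy` is made up for illustration, and whitespace separation is used here only because tabs do not survive email reliably; the real file is tab delimited.

```r
# Dummy data from the table above, read from an inline string.
# 'dummy' is a hypothetical name used only for this illustration.
dummy <- read.table(text = "
band     sample1 sample2 sample3 sample4
1p36_33   1       0       0       1
1p36_32  -1       0      -1       0
1p36_31   0       1       1       1
1p36_22   0      -1      -1      -1
", header = TRUE, row.names = "band")

dim(dummy)             # rows are bands, columns are samples
sapply(dummy, class)   # every column should be integer or numeric
```

Checking the column classes is a quick way to confirm that nothing was read in as text or as a factor.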

Based on the pvclust documentation I came up with this:
> ProgenetixCRC.all.pvclust <- pvclust(ProgenetixCRC.all, method.dist="cor",
+   method.hclust="average", use.cor="pairwise.complete.obs", nboot=1000)

This results in an error:
Error in hclust(distance, method = method.hclust) :
        NA/NaN/Inf in foreign function call (arg 11)
Digging through the mailing list archives, I've discovered this means that my
dataset has missing values. This is very confusing because I have checked and
there are no missing values: running is.na() over the data matrix returns
FALSE for every element, which I take to mean none of the values are NA. I tried
various options for the use.cor argument, all with the same result.
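One case worth ruling out is a sample with no variation at all, for example a column that is 0 for every band. This is offered as an assumption about how the correlation-based distance is built (from the sample-by-sample correlation matrix), not as a confirmed diagnosis. A minimal sketch with made-up data, where `x`, `r` and `constant` are hypothetical names:

```r
# Hypothetical example: sample2 is 0 for every band, so its variance is zero.
x <- data.frame(
  sample1 = c(1, -1, 0, 0),
  sample2 = c(0, 0, 0, 0),
  sample3 = c(0, -1, 1, -1),
  row.names = c("1p36_33", "1p36_32", "1p36_31", "1p36_22")
)

# The raw data contain no missing values ...
any(is.na(x))

# ... but correlations involving the constant column are NA
# (R also warns that the standard deviation is zero).
r <- suppressWarnings(cor(x, use = "pairwise.complete.obs"))
any(is.na(r))

# Columns with zero variance, i.e. the candidates to inspect:
constant <- sapply(x, function(col) var(col) == 0)
names(x)[constant]
```

If the real data contain such columns, is.na() on the data would still return FALSE everywhere, while any distance derived from the correlations would contain NA.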

Since I originally posted this question I have tried changing method.dist to
"euclidean"; in this form the function executes without any errors. This is not
to say the results actually mean anything, of course. I am at a loss as to how
to proceed, and any input from someone more experienced would be gratefully
received. If there is some reason why I should not be doing this analysis
this way in the first place, then I'd appreciate having that pointed out too.
I've tried not to put excess information in here, but if more is needed then let
me know what and I'll post it.

I suspect the problem is me. However, if it really is the case that no-one knows
how to answer this, could anyone suggest another mailing list where I might
get a better response? Would Bioconductor be a better option, for example?

Apologies for any offence caused by posting the same question but it's 
difficult for me to proceed until I get some kind of response, even if it is 
that this list is not the right place for this question.

Thanks for your patience,
Richard

Dr Richard Birnie
Scientific Officer
Section of Pathology and Tumour Biology
Welcome Brenner Building, LIMM
St James University Hospital
Beckett St, Leeds, LS9 7TF
Tel:0113 3438624
e-mail: [EMAIL PROTECTED]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
