Re: [R] Question about subsetting S4 object in ROCR

2013-05-28 Thread Guido Leoni
Yes, sorry.
Of course, I'm interested in the area ranging from (0, 0) to (0.4, 0.8).
Thank you
Guido
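
The region described above, from (0, 0) to (0.4, 0.8), can be sketched in base R as follows. This is only an illustrative sketch and not an answer given in the thread: the fpr and tpr vectors are copied from the x.values and y.values slots printed in the original post, the names fpr_sub, tpr_sub and partial_auc are made up for illustration, and the trapezoidal rule is one assumed way of computing the area.

```r
# FPR/TPR values copied from the slots printed in the original post.
# In a live session they would come from the S4 object instead:
#   fpr <- perf@x.values[[1]]; tpr <- perf@y.values[[1]]
fpr <- c(0.00, 0.00, 0.05, 0.10, 0.10, 0.10, 0.10, 0.10, 0.15, 0.15, 0.15,
         0.20, 0.25, 0.25, 0.25, 0.25, 0.25, 0.30, 0.35, 0.35, 0.35, 0.40,
         0.40, 0.45, 0.50, 0.50, 0.55, 0.55, 0.60, 0.65, 0.65, 0.70, 0.70,
         0.75, 0.80, 0.85, 0.90, 0.90, 0.95, 1.00, 1.00)
tpr <- c(0.00, 0.05, 0.05, 0.05, 0.10, 0.15, 0.20, 0.25, 0.25, 0.30, 0.35,
         0.35, 0.35, 0.40, 0.45, 0.50, 0.55, 0.55, 0.55, 0.60, 0.65, 0.65,
         0.70, 0.70, 0.70, 0.75, 0.75, 0.80, 0.80, 0.80, 0.85, 0.85, 0.90,
         0.90, 0.90, 0.90, 0.90, 0.95, 0.95, 0.95, 1.00)

# Keep only the points inside the rectangle (0, 0) to (0.4, 0.8)
keep    <- fpr <= 0.4 & tpr <= 0.8
fpr_sub <- fpr[keep]
tpr_sub <- tpr[keep]

# Area under the retained part of the curve (trapezoidal rule)
partial_auc <- sum(diff(fpr_sub) * (head(tpr_sub, -1) + tail(tpr_sub, -1)) / 2)
print(partial_auc)

# Plot only the retained part of the curve
plot(fpr_sub, tpr_sub, type = "l", xlim = c(0, 0.4), ylim = c(0, 0.8),
     xlab = "False positive rate", ylab = "True positive rate")
```

ROCR itself documents an fpr.stop argument for the "auc" measure, e.g. performance(pred, measure = "auc", fpr.stop = 0.4), which may be a simpler route to the partial area when the pred object is at hand.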


2013/5/27 Uwe Ligges lig...@statistik.tu-dortmund.de



 On 27.05.2013 16:18, Guido Leoni wrote:

 Dear list
 I'm testing a predictor and I produced nice performance plots with the ROCR
 package using the three standard commands:

 pred <- prediction(predictions, labels)
 perf <- performance(pred, measure = "tpr", x.measure = "fpr")
 plot(perf, col = rainbow(10))

 The pred object and the perf object are S4 objects
 with the following slots:

 An object of class "performance"
 Slot "x.name":
 [1] "False positive rate"

 Slot "y.name":
 [1] "True positive rate"

 Slot "alpha.name":
 [1] "Cutoff"

 Slot "x.values":
 [[1]]
  [1] 0.00 0.00 0.05 0.10 0.10 0.10 0.10 0.10 0.15 0.15 0.15 0.20 0.25 0.25
 [15] 0.25 0.25 0.25 0.30 0.35 0.35 0.35 0.40 0.40 0.45 0.50 0.50 0.55 0.55
 [29] 0.60 0.65 0.65 0.70 0.70 0.75 0.80 0.85 0.90 0.90 0.95 1.00 1.00


 Slot "y.values":
 [[1]]
  [1] 0.00 0.05 0.05 0.05 0.10 0.15 0.20 0.25 0.25 0.30 0.35 0.35 0.35 0.40
 [15] 0.45 0.50 0.55 0.55 0.55 0.60 0.65 0.65 0.70 0.70 0.70 0.75 0.75 0.80
 [29] 0.80 0.80 0.85 0.85 0.90 0.90 0.90 0.90 0.90 0.95 0.95 0.95 1.00


 Slot "alpha.values":
 [[1]]
  [1]   Inf 33309 32968 31688 31648 31355 31122 31047 30777 30589 30460 30395
 [13] 30305 30159 29841 29101 28734 28657 28393 28196 27740 27662 27373 27078
 [25] 26763 26303 25573 25416 25364 25357 24993 23834 23789 23616 22357 20669
 [37] 20092 18720 18136 17323 16665


 Now I'd like to make a plot (and also compute the AUC) only of the area
 corresponding to y.values up to 0.80 and x.values up to 0.40.
 In your experience, is it possible to subset the perf object to
 the aforementioned values?


 But x=0.4 and y=0.8 is just a point, so I don't get which plot and area
 you are talking about now?

 Best,
 Uwe Ligges






Thanks
 Guido




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question about subsetting S4 object in ROCR

2013-05-27 Thread Guido Leoni
Dear list
I'm testing a predictor and I produced nice performance plots with the ROCR
package using the three standard commands:

pred <- prediction(predictions, labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, col = rainbow(10))

The pred object and the perf object are S4 objects
with the following slots:

An object of class "performance"
Slot "x.name":
[1] "False positive rate"

Slot "y.name":
[1] "True positive rate"

Slot "alpha.name":
[1] "Cutoff"

Slot "x.values":
[[1]]
 [1] 0.00 0.00 0.05 0.10 0.10 0.10 0.10 0.10 0.15 0.15 0.15 0.20 0.25 0.25
[15] 0.25 0.25 0.25 0.30 0.35 0.35 0.35 0.40 0.40 0.45 0.50 0.50 0.55 0.55
[29] 0.60 0.65 0.65 0.70 0.70 0.75 0.80 0.85 0.90 0.90 0.95 1.00 1.00


Slot "y.values":
[[1]]
 [1] 0.00 0.05 0.05 0.05 0.10 0.15 0.20 0.25 0.25 0.30 0.35 0.35 0.35 0.40
[15] 0.45 0.50 0.55 0.55 0.55 0.60 0.65 0.65 0.70 0.70 0.70 0.75 0.75 0.80
[29] 0.80 0.80 0.85 0.85 0.90 0.90 0.90 0.90 0.90 0.95 0.95 0.95 1.00


Slot "alpha.values":
[[1]]
 [1]   Inf 33309 32968 31688 31648 31355 31122 31047 30777 30589 30460 30395
[13] 30305 30159 29841 29101 28734 28657 28393 28196 27740 27662 27373 27078
[25] 26763 26303 25573 25416 25364 25357 24993 23834 23789 23616 22357 20669
[37] 20092 18720 18136 17323 16665


Now I'd like to make a plot (and also compute the AUC) only of the area
corresponding to y.values up to 0.80 and x.values up to 0.40.
In your experience, is it possible to subset the perf object to
the aforementioned values?
 Thanks
Guido



[R] generic question about differences between PCA and DMFA

2012-11-07 Thread Guido Leoni
Dear list, I'd like to have your opinion about my case study.
I'm analyzing a dataset of 9 experiments and 15 variables with the aim of
highlighting the variables that best explain the variance between the
experiments.
This is an example with only 3 rows and 5 variables:

          var1   var2   var3   var4   var5
sample5  0,067  0,005  0,008  0,100  0,005
sample6  0,069  0,001  0,011  0,084  0,005
sample7     -7     -5     -1     34      4

My problem is that in some experiments (like sample7) the variables
are measured as delta values (initial condition -
final condition). In the other cases the variables are measured considering
only the absolute values at the final condition.

After PCA the model looks strongly influenced by this difference
(even though my data are centered to 0 and scaled to 1): in the score
plot the first PC mainly separates the experiments with
positive values from those with negative values, and the second PC does not give me
any further information.
In your opinion, is there a way to compare experiments measured in
these different ways?
Alternatively, do you think that the Dual Multiple Factor Analysis available
in the FactoMineR package could be a better way to analyze these data?

Thank you for any suggestion
Guido



-- 
Guido Leoni
National Research Institute on Food and Nutrition
(I.N.R.A.N.)
via Ardeatina 546
00178 Rome
Italy

tel + 39 06 51 49 41 (operator)
+ 39 06 51 49 4498 (direct)



[R] question about A2R

2012-08-16 Thread Guido Leoni
Dear List
I'm trying to install a package that is not available on CRAN, named A2R (
http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=79).
After running the demo script I get the following error:
cannot change value of locked binding for '._a2r_counter'

Could someone please give me a tip about my error?
Thank you very much
Here is my sessionInfo():

R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=it_IT.UTF-8
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=it_IT.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] splines   stats4    grid      stats     graphics  grDevices utils
 [8] datasets  methods   base

other attached packages:
 [1] fpc_2.0-3         flexmix_2.3-8     multcomp_1.2-12   survival_2.36-14
 [5] mvtnorm_0.9-9992  modeltools_0.2-19 lattice_0.20-6    mclust_3.5
 [9] cluster_1.14.2    MASS_7.3-19       A2R_0.0-4

loaded via a namespace (and not attached):
[1] tcltk_2.15.0 tools_2.15.0



[R] Sugeestion about tuning of SVM

2012-06-15 Thread Guido Leoni
Dear list
I have a generic question about how to tune an SVM.
I'm trying to classify, with the caret package, some population data from a
case-control study. Each column of my matrix holds the SNP
genotypes and each row is an individual.
I split my total dataset into training (132 individuals) and test
(50 individuals) sets, preserving the overall observed genotypic frequencies and
the percentage of cases and controls.
After training (with a radial basis function kernel) the best
model has an accuracy of 76%, but when I apply the model to my test dataset the accuracy
drops to 52%.
Obviously I expected a decrease, but this one appears quite large in my
opinion.
I manually checked the predictions for my test dataset, and some cases that
have no risk allele are not classified correctly. Similar cases in my training
dataset are recognized correctly.
Could you please suggest which parameters to modify in order to improve
the classification of the test dataset? Or, better, what could be the
causes of this large discrepancy?
I know that my question is very generic, but I'm new to this kind of
analysis, so any suggestion is welcome.
Thank you very much
Guido



[R] Question about sampling

2012-06-14 Thread Guido Leoni
Dear list, I wish to extract from a population genotyped for 10 SNPs a
subsample of the same population of size n with similar allele frequencies.
Essentially I have a matrix of 200 rows (df) like this:
Name,Condition,rs1385699_X,rs6625163_X,rs962458_X,Rs4658627_1,
sample01,Case,1,1,1,-1
sample02,Control,1,1,1,1
sample06,Control,1,-1,1,0
sample10,Case,1,1,1,0
sample11,Control,1,1,1,1
sample24,Control,-1,-1,1,0
sample29,Control,1,-1,1,0
sample42,Case,-1,-1,1,0
sample64,Case,-1,1,1,0

I'm interested in maintaining in my subsample the same frequencies as those
observed for the value 1 in each column.
I approached the problem with the sample() function:

mysample <- df[sample(1:nrow(df), 100, replace = FALSE), ]

Then I checked with prop.test that the frequencies of each allele in mysample are not
statistically different from those in the initial dataset.
This seems to work, but do you know if there is a package that can do the
same thing while allowing, for example, stricter control?
Thank you very much
Guido



Re: [R] Question about sampling

2012-06-14 Thread Guido Leoni
Sorry, I'm not sure that prob is suitable for my purposes (but I'm quite
new to R).
If I understand correctly, prob allows setting a weight for each row in the
original dataset, so that rows are included on the basis of their
weights... I'm not sure I understand correctly ;-)
In my case all the rows are equally important. I simply need my
subset to have in each column the same frequency of 1 as in the original
dataset.
Thank you again
Guido

2012/6/14 R. Michael Weylandt michael.weyla...@gmail.com

 sample() takes a prob = argument which lets you supply weights, which
 need not sum to one so, if I understand you, you could just pass TRUEs
 and FALSEs for those rows you want. If I'm wrong about that last bit,
 I'm still pretty confident sample(prob = ) is the way to go.

 Best,
 Michael



Re: [R] Question about sampling

2012-06-14 Thread Guido Leoni
Just to make the archives more complete and to simplify life for
later readers:
I think I have solved my problem using the caret package.
This package has a function named createDataPartition that, given
a column of interest in a data.frame, splits the dataset into
subsets that try to preserve the original class distribution.
Here is the link to a tutorial:
  http://www.jstatsoft.org/v28/i05/paper

thank you again
Guido
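
The idea described above (splitting while preserving the class distribution of one column) can be sketched in base R, without caret, as follows. This is a hedged illustration and not the caret implementation: the data frame is simulated, the column rs1 and the helper stratified_rows() are hypothetical names, and only one column (Condition) is stratified, so the SNP frequencies are preserved only approximately.

```r
set.seed(123)

# Simulated stand-in for the 200-row genotype table (hypothetical values)
df <- data.frame(
  Name      = sprintf("sample%03d", 1:200),
  Condition = rep(c("Case", "Control"), times = c(80, 120)),
  rs1       = sample(c(-1, 0, 1), 200, replace = TRUE)
)

# Hypothetical helper: draw the same fraction of rows from every group,
# without replacement, so the group proportions are preserved
stratified_rows <- function(groups, frac) {
  idx_by_group <- split(seq_along(groups), groups)
  picked <- lapply(idx_by_group, function(idx) {
    idx[sample.int(length(idx), round(length(idx) * frac))]
  })
  sort(unlist(picked, use.names = FALSE))
}

rows     <- stratified_rows(df$Condition, frac = 0.5)
mysample <- df[rows, ]

# Case/Control proportions match the full data by construction
print(table(mysample$Condition))

# Genotype frequencies can then be compared by eye (or with prop.test)
print(prop.table(table(df$rs1)))
print(prop.table(table(mysample$rs1)))
```

If the SNP columns themselves must be balanced, one option is to stratify on a combined key such as interaction(Condition, rs1), at the cost of small groups when many columns are combined.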

2012/6/14 R. Michael Weylandt michael.weyla...@gmail.com

 I think you're right -- prob probably isn't quite what you need (at
 least, directly): constrained sampling like this is a little trickier
 -- I'll leave this to someone who knows more than me.

 Michael



[R] Problem with sample function

2012-06-08 Thread Guido Leoni
Dear list
Is there a way to extract a random sample without duplicated rows from a
data frame?
a=c(1,2,3,1,1,1,2,1)
b=c(1,2,3,1,2,1,2,1)
c=c(1,1,1,1,1,1,1,1)
d=c(1,2,3,1,1,1,2,1)
prov <- data.frame(a,b,c,d)
prov2 <- prov[sample(1:nrow(prov),5,replace=T),]
prov2
    a b c d
3   3 3 1 3
6   1 1 1 1
3.1 3 3 1 3
5   1 2 1 1
8   1 1 1 1

I tried the above code, but as you can see the sample function also includes
duplicates.
Thank you for any tip
Guido
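
For reference, a minimal sketch of sampling rows without repeats, using the same toy data as above. The replace = FALSE call is standard base R; the extra unique() step is an assumption about what "duplicated row" might mean (rows with identical contents rather than the same row drawn twice), and the names prov_unique and prov3 are illustrative only.

```r
a <- c(1, 2, 3, 1, 1, 1, 2, 1)
b <- c(1, 2, 3, 1, 2, 1, 2, 1)
c <- c(1, 1, 1, 1, 1, 1, 1, 1)
d <- c(1, 2, 3, 1, 1, 1, 2, 1)
prov <- data.frame(a, b, c, d)

set.seed(42)

# replace = FALSE (the default) never draws the same row index twice
prov2 <- prov[sample(nrow(prov), 5, replace = FALSE), ]
print(prov2)

# If rows with identical contents should also appear only once,
# drop duplicated rows first and sample from what is left
prov_unique <- unique(prov)
prov3 <- prov_unique[sample(nrow(prov_unique), 3), ]
print(prov3)
```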



Re: [R] Problem with sample function

2012-06-08 Thread Guido Leoni
Yes, you are right, it was my inattention. It is Friday and my head needs
to start the weekend, sorry ;-)

2012/6/8 David Winsemius dwinsem...@comcast.net


 On Jun 8, 2012, at 12:33 PM, Guido Leoni wrote:

  Dear list
 Is there a way to extract a random sample without duplicated rows from a
 data frame?
 a=c(1,2,3,1,1,1,2,1)
 b=c(1,2,3,1,2,1,2,1)
 c=c(1,1,1,1,1,1,1,1)
 d=c(1,2,3,1,1,1,2,1)
 prov <- data.frame(a,b,c,d)
 prov2 <- prov[sample(1:nrow(prov),5,replace=T),]
 prov2
     a b c d
 3   3 3 1 3
 6   1 1 1 1
 3.1 3 3 1 3
 5   1 2 1 1
 8   1 1 1 1

 I tried the above code, but as you can see the sample function also includes
 duplicates.
 Thank you for any tip


 Why would you use replace=T if you didn't want duplicates???

 --

 David Winsemius, MD
 West Hartford, CT






[R] Problem with tune function in e1071 package

2012-06-06 Thread Guido Leoni
Dear list
I'm classifying some data with the e1071 package, but when I try to tune my
parameters I get this error:

Error in do.call(method, c(list(train.x[train.ind[[sample]], ], y =
train.y[train.ind[[sample]]]),  :
  'what' must be a character string or a function


Below are my commands:

training <- read.csv(file=tabtraining,header=T,row.names=c(1),sep="\t",na.strings=NA)

datatraining=subset(training,select=-Response)
classtraining <- Stato
classtraining <- subset(training,select=Reponse)
test <- read.csv(file=tabtest,header=T,row.names=c(1),sep="\t",na.strings=NA)
datatest=subset(test,select=-Stato)
classtest <- subset(test,select=Stato)

model <- svm(datatraining,classtraining,type="C-classification")
tune(model,train.x=datatraining,train.y=classtraining,validation.x=datatest,validation.y=classtest,
ranges = list(gamma = 2^(-1:1), cost = 2^(2:4)), control =
tune.control(sampling = "fix"))

any tips are welcome ;-)
Thank you very much
Guido
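
A likely cause, offered here as an assumption since the thread contains no reply: the first argument of tune() is documented as the function to be tuned (or a character string naming it), whereas the code above passes a fitted model object, which do.call() cannot call. The sketch below reproduces the same kind of error in base R with an unrelated fitted object, then shows the tuning call from the e1071 documentation on the built-in iris data; the e1071 part runs only if the package is installed.

```r
# Base-R illustration: do.call() needs a function (or its name), so handing
# it a fitted model object fails in the same way as in the post.
fit <- lm(dist ~ speed, data = cars)   # any fitted object will do here
err_msg <- tryCatch(do.call(fit, list()),
                    error = function(e) conditionMessage(e))
print(err_msg)

# With e1071 installed, the tuning call takes the function itself (svm),
# following the example in the e1071 documentation.
if (requireNamespace("e1071", quietly = TRUE)) {
  library(e1071)
  tuned <- tune(svm, Species ~ ., data = iris,
                ranges = list(gamma = 2^(-1:1), cost = 2^(2:4)),
                tunecontrol = tune.control(sampling = "fix"))
  print(tuned$best.parameters)
}
```

Note that the documented argument name is tunecontrol; the control = argument in the post would be passed on to svm() rather than to tune().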



Re: [R] Help with plot of PCA analysis (new user to R)

2012-03-08 Thread Guido Leoni
Hi
If you wish to obtain a 3D plot, I think you can use the bpca package.
To test whether it works, try code similar to this (after installing the bpca
package):

bp <- bpca(pca, lambda.end = 3)
plot(bp, var.factor = 3, rgl.use = TRUE, obj.name = FALSE)

Regards
Guido
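
Regarding the 4500 points mentioned in the quoted question below: prcomp() treats rows as observations, so one score per row is returned. A minimal sketch, assuming the 24 individuals are the columns of the table, is to transpose before running the PCA. The data here are simulated and the names data and scores are illustrative; pairs() is used only because it needs no extra package.

```r
set.seed(1)

# Simulated stand-in for the poster's table:
# 4500 rows (measurements) by 24 columns (individuals)
data <- as.data.frame(matrix(rnorm(4500 * 24, mean = 10),
                             nrow = 4500, ncol = 24))

# Transpose so that each individual becomes one observation (row)
pca <- prcomp(t(data))

# One row of scores per individual; keep the first three components
scores <- pca$x[, 1:3]
print(dim(scores))

# Quick look at PC1-PC3 without a 3D device
pairs(scores)
```

The three columns of scores can then be handed to whichever 3D plotting function is preferred.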


2012/3/8 darkpollo darkpo...@gmail.com

 I was thinking to create the plot manually?
 What do you think?

 Thanks

 On Mar 7, 5:37 pm, darkpollo darkpo...@gmail.com wrote:
  Hi,

  I am new to R and I am not sure if I am doing something wrong.

  I have a table with 4500x24 (rows x cols) elements. The rows are data
  related to each one of the individuals (A, B, C...) located in the
  columns.
  Example:
            A         B         C         D         E         F
  1  5.651296  5.480589  4.253070  3.515593  6.045253  5.916222  4.181060
  2  9.402882 10.007563  9.838700  9.541653  9.968853 10.058527  9.988849
  3  9.619392 10.358489  9.253168 10.295971  9.478020  9.574001  9.700798
  4 12.727904 12.624954 12.945821 12.948913 12.703855 12.817992 12.909623
  5  9.713688 10.057340  9.380006  9.722916  9.590145  9.237900  9.224427
  6 11.329047 11.403621 12.555482 11.830408 11.479372 13.035209 14.550555

  What I want to do is run a PCA and plot the 3 principal components
  in a 3D graphic for each one of the individuals.

  This is what I am doing:
  data <- read.table("data.txt", header=TRUE)
  pca <- prcomp(data)
  summary(pca)
  This gives me 24 columns with PC1 to PC24.

  Now I want to plot only PC1, PC2 and PC3:

  rp.plot3d(pca[,1],pca[,2],pca[,3])

  This gives the error "incorrect number of dimensions".

  If I do this instead:
  pca.sam <- pca$x
  rp.plot3d(pca.sam[,1],pca.sam[,2],pca.sam[,3])

  it works, but it shows 4500 points and I only want the 24 points
  related to my columns.

  Any idea how to do this?

  Thanks
 


[R] Question about hierarchical clustering

2012-02-29 Thread Guido Leoni
Dear List, I'm performing hierarchical clustering analysis with the Ward method.
My best clusters are chosen according to the silhouette score...
Now I'd like to select the most representative element in each cluster.
Do you think that searching for medoids could be a good idea?
Here is the code that I use, applied to the USArrests dataset:

mydist = dist(USArrests, method = "euclidean")
clusters = cutree(hclust(mydist, method = "ward"), k = 5) # get 5 clusters
mydist = as.matrix(mydist) # get a full matrix

# function to find the medoid of cluster i
clust.medoid = function(i, distmat, clusters) {
  ind = (clusters == i)
  # the medoid minimizes the summed distance to the other cluster members;
  # drop = FALSE keeps a matrix even for a one-member cluster
  names(which.min(rowSums(distmat[ind, ind, drop = FALSE])))
  # c(min(rowMeans( distmat[ind, ind] )))
}
#
sapply(unique(clusters), clust.medoid, mydist, clusters)

Best
Guido



[R] Question about multiple histogram

2011-12-12 Thread Guido Leoni
Dear list
I have a matrix such as the following:

1 0.5
1 0.7
1 0.5
2 1
3 0
4 0.2
I'd like to plot the histogram of the first column (this is very easy) and
then, for each bin of the resulting histogram, plot another histogram on
the z-axis of the frequencies of the second column for the values belonging
to that bin.

In other words, I have a matrix with 2 columns and I'd like to find a way to
represent on a 3-axis plot the frequencies of the first column and, for each
resulting bar, the frequencies of the second column.
Could someone suggest which package or kind of representation would
be best to achieve my goal?
Thank you very much
Guido
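
One possible starting point, sketched in base R under assumptions not stated in the post: both columns are binned with cut() and cross-tabulated, which gives the counts a 3D histogram would display. The bin edges are arbitrary choices for this toy data, and a grouped bar plot stands in for a true 3D rendering.

```r
# The six example rows from the post (first value written as 0.5)
m <- cbind(c(1, 1, 1, 2, 3, 4),
           c(0.5, 0.7, 0.5, 1, 0, 0.2))

# Bin each column; the break points are illustrative assumptions
x_bin <- cut(m[, 1], breaks = seq(0.5, 4.5, by = 1))
y_bin <- cut(m[, 2], breaks = seq(0, 1, by = 0.25), include.lowest = TRUE)

# Joint counts: rows are bins of column 1, columns are bins of column 2
counts <- table(x_bin, y_bin)
print(counts)

# Grouped bars: one group per bin of column 1, one bar per bin of column 2
barplot(t(counts), beside = TRUE, legend.text = TRUE,
        xlab = "Bins of column 1", ylab = "Frequency")
```

The counts table can also be passed to a 3D plotting function from an add-on package if a true three-axis view is required.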



[R] Question about list function

2010-11-23 Thread Guido Leoni
Dear List, I'm a new R user.
I'm using the list function in order to make a variable like this:
clusters <- list(a = var1, b = var2)
My problem is that the total number of
variables that I need to include in my list is up to 200. I have the text
string with the complete list of my variables, but it is too long to cut and
paste into my bash shell.
So is there a way to import the list from a text file?
Thank you very much for any kind of help
Guido
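
A minimal sketch of one way to do this in base R, assuming the text file holds one variable name per line and that the variables already exist in the session. The file is created here with tempfile() purely so the example is self-contained; var1, var2, list_file and var_names are illustrative names.

```r
# Two existing variables standing in for the ~200 real ones
var1 <- c(1, 2, 3)
var2 <- c("x", "y")

# Hypothetical text file with one variable name per line
list_file <- tempfile(fileext = ".txt")
writeLines(c("var1", "var2"), list_file)

# Read the names and collect the matching objects into a named list
var_names <- readLines(list_file)
clusters  <- mget(var_names, envir = environment())
str(clusters)
```

The list elements are named after the variables; names(clusters) can be reassigned afterwards if different labels (such as a, b, ...) are wanted.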


-- 
Guido Leoni
National Research Institute on Food and Nutrition
(I.N.R.A.N.)
via Ardeatina 546
00178 Rome
Italy

tel + 39 06 51 49 41 (operator)
+ 39 06 51 49 498 (direct)
