Thanks a lot, David, for this extended answer.

The aim is to say: if the simulated and empirical values correlate test by test, then the sums of both should correlate as well.
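A toy illustration of why this need not hold in general may help. The numbers below are made up (they are not the thread's data) and the data frame name `toy` is hypothetical. Each test has a perfect within-test correlation, but the two tests have different slopes and opposite trends over Space, so the summed series end up moving in opposite directions.

```r
# Hypothetical toy data: two tests over four spaces.
# Within each test, empirical and simulated values are perfectly correlated,
# but the slopes differ (10 vs 0.01) and the trends over Space are opposite.
toy <- data.frame(
  test  = rep(1:2, each = 4),
  Space = rep(1:4, times = 2),
  Behavior_empirical = c(10 * (1:4), 4:1),
  Behavior_simulated = c(1:4, 100 * (4:1))
)

# Correlation within each test
per_test <- sapply(split(toy, toy$test), function(d)
  cor(d$Behavior_empirical, d$Behavior_simulated))

# Sum over tests within each Space, then correlate the two sums
agg <- aggregate(cbind(Behavior_empirical, Behavior_simulated) ~ Space,
                 data = toy, FUN = sum)
cor_of_sums <- cor(agg$Behavior_empirical, agg$Behavior_simulated)

print(per_test)     # both within-test correlations
print(cor_of_sums)  # correlation of the summed series
```

Here both per-test correlations are 1, yet the correlation of the sums is -1 (up to floating-point rounding), because the test with the large simulated values dominates the simulated sum while the other test dominates the empirical sum.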
I want to be sure that I understood correctly. What you have done is:
1) build the model (the fit) of the empirical values on the simulated values, and predict values from this model;
2) compare the values predicted by the fitted model with the sum of the empirical values.
Is that right?

Thanks a lot

On Monday, December 3, 2018, David L Carlson <dcarl...@tamu.edu> wrote:

> This is really a statistics question rather than an R question, but you
> did provide reproducible data. You have some moderate correlations for some
> of the tests, but they are all different relationships. You used a
> combination of base R and dplyr code, but I'll just stick with base R:
>
> > Mesures.split <- split(Mesures, Mesures$test)
> > Corrs <- sapply(Mesures.split, function(x) cor(x[, 3], x[, 4]))
> > options(digits=3)
> > Corrs
>      1      2      3      4      5      6      7      8      9     10
>  0.551  0.437  0.905 -0.106  0.841  0.556  0.809  0.772  0.709  0.512
>
> > sapply(Mesures.split, function(x) coef(lm(x[, 3]~x[, 4])))
>                  1      2       3        4      5      6      7
> (Intercept) 0.6875 0.6530 -0.2597  2.24313 0.3498 1.4436 0.4103
> x[, 4]      0.0309 0.0034  0.0353 -0.00668 0.0171 0.0168 0.0137
>                   8      9      10
> (Intercept) -0.7379 0.2929 0.48115
> x[, 4]       0.0255 0.0129 0.00891
>
> This gives you the intercept and slope for the regression lines for each
> test. Notice that they vary considerably. The slope value for predicting
> behavior from simulated varies from -0.007 to .031. When you average over
> space you effectively eliminate the correlations at the test level:
>
> > Mesures_aggregated <- aggregate(Mesures[, 3:4], by=list(Mesures$Space), sum)
> > cor(Mesures_aggregated[, 2:3])[1, 2]
> [1] 0.0771
>
> If you sum predicted values for empirical behavior using the 10 regression
> equations and compare that to the summed empirical value, things work out
> better.
>
> > pred <- rowSums(sapply(Mesures.split, function(x) predict(lm(x[, 3]~x[, 4]))))
> > cor(Mesures_aggregated[, 2], pred)
> [1] 0.776
>
> Without knowing where the simulated values come from, especially if they
> are completely independent of the empirical values, I can't say if this
> approach is wise.
>
> ---------------------------------------
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Fatma Ell
> Sent: Sunday, December 2, 2018 4:50 AM
> To: r-help@r-project.org
> Subject: [R] Does the correlations of component makes the correlation of
> one phenomena ?
>
> Hi,
>
> I have the following dataset, Mesures. It contains test, which is a given
> context, and Space, which is a portion of that context. For each test we
> have twelve Space values, an empirical measure of a behavior
> (Behavior_empirical) and a measure of simulated behavior
> (Behavior_simulated).
>
> Mesures=structure(list(test = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
> 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L,
> 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
> 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L,
> 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Space = c(1L, 2L, 3L,
> 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
> 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
> 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
> 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
> 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L,
> 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L,
> 7L, 8L,
> 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
> 11L, 12L), Behavior_empirical = c(3.02040816326531, 7.95918367346939,
> 10.6162790697674, 4.64150943396226, 1.86538461538462, 1.125,
> 1.01020408163265, 1.2093023255814, 0.292452830188679, 0, 0, 0, 0,
> 1.3265306122449, 0, 3.09433962264151, 0, 1.6875, 2.02040816326531,
> 1.2093023255814, 1.75471698113208, 1.79347826086957,
> 0.243589743589744, 0, 0.377551020408163, 1.98979591836735,
> 6.75581395348837, 6.18867924528302, 7.46153846153846, 0.75, 0, 0,
> 0.292452830188679, 0, 0, 0, 0, 1.3265306122449, 1.93023255813953,
> 10.8301886792453, 3.73076923076923, 0, 2.69387755102041,
> 0.604651162790698, 1.75471698113208, 0, 0, 0, 1.51020408163265,
> 2.6530612244898, 3.86046511627907, 1.54716981132075, 1.86538461538462,
> 1.875, 2.35714285714286, 1.2093023255814, 0.292452830188679, 0, 0,
> 0.823529411764706, 6.79591836734694, 15.2551020408163,
> 5.7906976744186, 1.54716981132075, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0.773584905660377, 0, 0, 0.673469387755102, 1.81395348837209,
> 1.75471698113208, 2.51086956521739, 3.10576923076923,
> 3.70588235294118, 3.77551020408163, 9.28571428571428,
> 3.86046511627907, 1.54716981132075, 0, 0, 0, 0, 1.4622641509434, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0.673469387755102, 0, 0.292452830188679,
> 4.30434782608696, 1.09615384615385, 5.76470588235294, 0, 0,
> 1.93023255813953, 4.64150943396226, 3.73076923076923, 2.625,
> 0.673469387755102, 0.604651162790698, 0, 0, 0, 0), Behavior_simulated
> = c(18, 61, 129, 198, 128, 57, 44, 80, 36, 8, 0, 0, 0, 0, 0, 49, 50,
> 194, 211, 353, 352, 214, 120, 15, 10, 74, 145, 224, 158, 99, 26, 19,
> 7, 2, 0, 0, 180, 89, 47, 36, 34, 56, 51, 65, 44, 4, 0, 0, 116, 133,
> 131, 103, 74, 132, 75, 44, 0, 0, 0, 0, 532, 165, 18, 5, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 1, 0, 0, 6, 47, 164, 193, 185, 91, 239, 219, 168,
> 83, 1, 14, 45, 136, 129, 89, 5, 0, 0, 0, 0, 0, 0, 0, 0, 6, 17, 92,
> 280, 273, 0, 6, 25, 108, 129, 285, 171, 181, 39, 2, 0, 0)), .Names =
> c("test",
> "Space", "Behavior_empirical", "Behavior_simulated"),
> row.names = c(NA, 120L), class = "data.frame")
>
> For each test we study the correlation between Behavior_empirical and
> Behavior_simulated:
>
> Correlation <- character()
> for(i in 1:10){
>   Mes=Mesures[(Mesures$test==i),]
>   co=data.frame(test=i,value=cor(Mes$Behavior_empirical,
>                                  Mes$Behavior_simulated))
>   Correlation <- rbind(Correlation, as.data.frame(co))
>   i=i+1}
>
> which gives us, for each test, many good correlation values:
>
>    test      value
> 1     1  0.5508683
> 2     2  0.4369091
> 3     3  0.9049806
> 4     4 -0.1062714
> 5     5  0.8410165
> 6     6  0.5560825
> 7     7  0.8088034
> 8     8  0.7721232
> 9     9  0.7088624
> 10   10  0.5116938
>
> Now we want to conclude that, if we have good values of
> Behavior_simulated for each test, we can build the final distribution,
> which is the sum of Behavior_simulated, and then compare it with the sum
> of Behavior_empirical.
>
> library(dplyr)  # needed for %>%, group_by() and summarize()
> Mesures_aggregated<- Mesures %>% group_by(Space) %>%
>   summarize(Sum_Behavior_empirical=sum(Behavior_empirical),
>             Sum_Behavior_simulated=sum(Behavior_simulated))
>
> I would expect the final correlation to be good, but that is not the
> case:
>
> > cor(Mesures_aggregated$Sum_Behavior_empirical,
> +     Mesures_aggregated$Sum_Behavior_simulated)
> [1] 0.07710804
>
> Could the overall correlation be a result of the correlations of the
> components of one phenomenon? And how can I evaluate the contribution of
> each component test in building the 'Sum'?
>
>
> Thanks a lot for your help.
>
>
> Lenny
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
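On the question of how much each test contributes to the relationship between the two sums, one possible way to look at it (a sketch, not something proposed in the thread) is that the covariance of two sums is the sum of all pairwise covariances between their components. The example below uses randomly generated data in a hypothetical wide layout (one row per Space, one column per test); the matrix names `E` and `S`, the seed, and the data-generating numbers are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n_space <- 12
n_test  <- 3

# Hypothetical wide layout: rows are Spaces, columns are tests
S <- matrix(rpois(n_space * n_test, lambda = 50), nrow = n_space)          # simulated
E <- 0.02 * S + matrix(rnorm(n_space * n_test, sd = 0.5), nrow = n_space)  # empirical

# Cross-covariance matrix: entry [t, u] is cov(E[, t], S[, u])
C <- cov(E, S)

# The covariance of the two sums equals the sum of all entries of C
cov_sums <- cov(rowSums(E), rowSums(S))
stopifnot(isTRUE(all.equal(sum(C), cov_sums)))

# Same-test terms (diagonal) versus cross-test terms (off-diagonal)
same_test  <- sum(diag(C))
cross_test <- sum(C) - same_test

# One possible per-test summary: cov(E[, t], total simulated) for each test t
per_test_contribution <- rowSums(C)

# Correlation of the sums, rebuilt from the covariance
cor_sums <- cov_sums / (sd(rowSums(E)) * sd(rowSums(S)))

print(round(C, 3))
print(c(same_test = same_test, cross_test = cross_test))
print(per_test_contribution)
print(cor_sums)
```

The split into `same_test` and `cross_test` shows why good per-test correlations alone do not settle the matter: the off-diagonal terms (empirical values of one test against simulated values of another) enter the covariance of the sums with the same weight as the diagonal ones. Applying this to the long-format Mesures would first require reshaping it to one column per test.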