> On Jul 13, 2016, at 8:01 AM, David Winsemius <dwinsem...@comcast.net> wrote: > > >> On Jul 13, 2016, at 6:48 AM, stn021 <stn...@gmail.com> wrote: >> >> Hello, >> >> so here a numerical example in R-code. Code is appended below. >> >> The output should be >> 1) the numerical values of the abilities of the persons >> 2) the multiplyer >> >> >> Please note that >> >> 1) I have used non-linear optimization to solve this problem and got >> the expected result, though not with R but other software. >> >> 2) I have applied lm() to this problem, even before I posted the >> question. I am well aware of the syntax of formulas. I my last posting >> I wrote the formula "freehand" so I made the previously mentioned >> errors. Sorry about that. >> >> >> >> Unfortunately the formulas with I() as well as multiplying variables >> before running R does not work here. I() does not apply to factors (R >> tells me) and multiplying in advance also works only for continuous >> variables, not for factors, because there is no known numerical value >> to multiply. >> >> The latter is actually what my question is about, along with the >> question on how to get R to treat two columns as two instances of the >> same factor. >> >> >> Just to be sure I used R to check if the data really counts as a >> factor according to R-terminology. It really is a factor, see code >> below. >> >> >> >> This is the code for generating the example-data: >> >> # --------------------------------------------------------------- # >> pnames = c( "alice" , "bob" , "charlie" , "don" , "eve" , "freddy" >> , "grace" , "henry" ) >> pcount = length( pnames ) >> >> # abilities = runif( pcount ) >> abilities = (1:pcount) / 10 >> >> persons = data.frame( name = pnames , ability = abilities ) >> persons >> >> # random subset of possible combinations and extra df >> combinations = combn( nrow( persons ) , 2 ) ; >> combinations = cbind( combinations,combinations,combinations,combinations ) >> combinations = combinations[ , runif(ncol(combinations))<0.5 ] >> ccount = ncol( combinations ) >> >> observed_data = data.frame( >> idx1 = combinations[1,] >> , idx2 = combinations[2,] >> , p1 = ( persons$name[ combinations[1,] ] ) >> , p2 = ( persons$name[ combinations[2,] ] ) >> ) >> >> abilities_data = data.frame( >> a1 = persons$ability[ combinations[1,] ] >> , a2 = persons$ability[ combinations[2,] ] >> ) >> >> # y = result of cooperation of each pair >> multiplyer = runif(1) + 1 >> offset = 1 >> cat( "multiplyer = " , multiplyer , "\n" ) >> cat( "offset = " , offset , "\n" ) >> >> y0 = multiplyer * ( offset - ( abilities_data$a1 - abilities_data$a2 ) ^ 2 ) >> noise = .05 * rnorm( ccount ) >> >> # check variables are really factors : >> str( observed_data$p1 ) >> dput( observed_data$p1 ) >> >> observed_data = data.frame( y = round( y0+noise,3 ) , observed_data ) >> observed_data >> >> # --------------------------------------------------------------- # > > Is this what is intended? > >> observed_data$p1ab <- persons$ability[ match(observed_data$p1, persons$name) >> ] >> observed_data$p2ab <- persons$ability[ match(observed_data$p2, persons$name) >> ] >> head(observed_data) > y idx1 idx2 p1 p2 p1ab p2ab > 1 1.149 1 6 alice freddy 0.1 0.6 > 2 1.006 1 7 alice grace 0.1 0.7 > 3 1.529 2 3 bob charlie 0.2 0.3 > 4 1.404 2 5 bob eve 0.2 0.5 > 5 1.205 2 6 bob freddy 0.2 0.6 > 6 1.187 2 7 bob grace 0.2 0.7 > > >> lm( y ~ I( (p1ab -p2ab)^2 ), data=observed_data) > > Call: > lm(formula = y ~ I((p1ab - p2ab)^2), data = observed_data) > > Coefficients: > (Intercept) I((p1ab - p2ab)^2) > 1.506 -1.435 > >> separate_term <- lm( y ~ I( (p1ab -p2ab)^2 ), data=observed_data) >> summary(separate_term) > > Call: > lm(formula = y ~ I((p1ab - p2ab)^2), data = observed_data) > > Residuals: > Min 1Q Median 3Q Max > -0.116249 -0.030996 0.002633 0.032765 0.136282 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 1.50589 0.01067 141.08 <2e-16 *** > I((p1ab - p2ab)^2) -1.43527 0.05863 -24.48 <2e-16 *** > --- > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > Residual standard error: 0.05304 on 44 degrees of freedom > Multiple R-squared: 0.9316, Adjusted R-squared: 0.93 > F-statistic: 599.2 on 1 and 44 DF, p-value: < 2.2e-16 > > You could also have compared 2 models differing only with rest to the > includion of an interaction term that was the squared difference in abilities: > >> full <- lm( y ~ p1ab + p2ab + I( (p1ab -p2ab)^2 ), data=observed_data) >> reduced <- lm( y ~ p1ab + p2ab , data=observed_data) >> anova(full,reduced) > Analysis of Variance Table > > Model 1: y ~ p1ab + p2ab + I((p1ab - p2ab)^2) > Model 2: y ~ p1ab + p2ab > Res.Df RSS Df Sum of Sq F Pr(>F) > 1 42 0.11823 > 2 43 0.17315 -1 -0.05492 19.509 6.892e-05 ***
I also tested whether "nesting" the I( ) calls would succeed in giving you any results from what I thought was a rather odd construction >>>> y = f * ( o - ( p1 - p2 )^2 ) > separate_term <- lm( y ~ I( (p1ab -p2ab)^2 ), data=observed_data) > separate_term Call: lm(formula = y ~ I((p1ab - p2ab)^2), data = observed_data) Coefficients: (Intercept) I((p1ab - p2ab)^2) 1.506 -1.435 > separate_term2 <- lm( y ~ I( 1- I( (p1ab -p2ab)^2 )), data=observed_data) > separate_term2 Call: lm(formula = y ~ I(1 - I((p1ab - p2ab)^2)), data = observed_data) Coefficients: (Intercept) I(1 - I((p1ab - p2ab)^2)) 0.07062 1.43527 The sign of the "interaction term" just gets reversed. I don't see that any offset term got calculated for the "1" term. Offsets can be included but I think this particular form where you are asking for two different coefficients, they might not be separable in the linear model framework. -- david. > > > -- > David > >> >> >> 2016-07-11 19:16 GMT+02:00 Jeff Newmiller <jdnew...@dcn.davis.ca.us>: >>> Your clarification is promising. A reproducible example is always >>> preferred, though never a guarantee. I expect to be somewhat preoccupied >>> this week so responses may be rather delayed, but the less setup we have to >>> the more likely that someone on the list will tackle it. >>> >>> Re an answer: If you can make the example simple enough that you can tell >>> us what the right numerical result will be, we will have a better chance of >>> understanding what you are after. E.g. if you start with a solution and >>> use it to create sample input data with then you don't need to actually >>> solve it to illustrate what you are after. [1] >>> >>> Note that I am not aware of any package dedicated to this type of problem, >>> so unless someone else responds otherwise then you will likely have to use >>> bootstrapping or your own statistical analysis (Bayesian?) of the result. >>> >>> [1] >>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example >>> -- >>> Sent from my phone. Please excuse my brevity. >>> >>> On July 11, 2016 7:28:41 AM PDT, stn021 <stn...@gmail.com> wrote: >>>> Hello, >>>> >>>> thank you for the replies. Sorry about the html-email, I forgot. >>>> Should be OK with this email. >>>> >>>> >>>> Don't be fooled be the apparent simplicity of the problem. I have >>>> tried to reduce it to only a single relatively simple question. >>>> >>>> The idea here is to model cooperation of two persons. The model is >>>> about one specific aspect of that cooperation, namely that two persons >>>> with similar abilities may be able to produce better results that two >>>> very different persons. >>>> >>>> That is only one part of the model with other parts modeling for >>>> example the fact that of course two persons with a higher degree of >>>> ability will produce better results per se. >>>> >>>> >>>> It is not classic regression with factors. That can be easily done by >>>> something like lm( y ~ (p1-p2)^2 ). >>>> >>>> This expands to lm( y ~ p1^2 - 2*p1*p2 + p2^2 ). This contains a >>>> multiplicagtions and for lm() this implies interactions between the >>>> factor-levels and produces one parameter for each combination of >>>> factor-levels that occurs in the data. That is not what the question >>>> is about. >>>> >>>> Also p1 and p2 are different levels of the same factor, while for lm() >>>> it would be two different factors with different levels. >>>> >>>> >>>> As for the sensical part: this has a real world application therefore >>>> it makes sense. >>>> >>>> Also it is not so difficult to solve with non-linear optimization. I >>>> was hoping to be able to use R for that purpose because then the >>>> results could easily be checked with statistical tests. >>>> >>>> So my question is not "how to solve" but "how to solve with R". >>>> >>>> >>>> As for the excess degrees of freedom, in real observations there would >>>> of course be added noise due to either random variations or factors >>>> not included in the model. So to generate a more reality-conforming >>>> example I could add some random normal-distributed noise to the >>>> dependent variable y. I previously left that part out because to me it >>>> did not seem relevant. >>>> >>>> >>>> Would you like me to make a complete example dataset with more records >>>> and noise ? >>>> >>>> >>>> The answer I look for would be the numerical values of the >>>> factor-levels and numerical values for the multiplier (f) and the >>>> offset (o), with p1 and p2 given as names (here: persons) and y given >>>> as some level of achievement they reach by cooperating. >>>> >>>> y = f * ( o - ( p1 - p2 )^2 ) >>>> >>>> Is that what you meant by "answer" ? >>>> >>>> >>>> THX >>>> stefan >>>> >>>> >>>> >>>> >>>> 2016-07-10 2:27 GMT+02:00 Jeff Newmiller <jdnew...@dcn.davis.ca.us>: >>>>> >>>>> I have seen less sensical questions. >>>>> >>>>> It would be nice if the example were a bit more complete (as in it >>>> should have excess degrees of freedom and an answer) and less like a >>>> homework problem (which are off topic here). It would of course also be >>>> helpful if the OP were to conform to the Posting Guide, particularly in >>>> respect to using plain text email. >>>>> >>>>> It looks like the kind of nonlinear optimization problem that >>>> evolutionary algorithms are often applied to. It doesn't look (to me) >>>> like a typical problem that factors get applied to in formulas though, >>>> because multiple instances of the same factor variable are present. >>>>> -- >>>>> Sent from my phone. Please excuse my brevity. >>>>> >>>>> On July 9, 2016 4:59:30 PM PDT, Rolf Turner <r.tur...@auckland.ac.nz> >>>> wrote: >>>>>> On 09/07/16 20:52, stn021 wrote: >>>>>>> Hello, >>>>>>> >>>>>>> I would like to analyse a model like this: >>>>>>> >>>>>>> y = 1 * ( 1 - ( x1 - x2 ) ^ 2 ) >>>>>>> >>>>>>> x1 and x2 are not continuous variables but factors, so the >>>>>> observation >>>>>>> contain the level. >>>>>>> Its numerical value is unknown and is to be estimated with the >>>> model. >>>>>>> >>>>>>> >>>>>>> The observations look like this: >>>>>>> >>>>>>> y x1 x2 >>>>>>> 0.96 Alice Bob >>>>>>> 0.84 Alice Charlie >>>>>>> 0.96 Bob Charlie >>>>>>> 0.64 Dave Alice >>>>>>> etc. >>>>>>> >>>>>>> Each person has a numerical value. Here for example Alice = 0.2 >>>> and >>>>>> Bob = >>>>>>> 0.4 >>>>>>> >>>>>>> Then y = 0.96 = 1* ( 1- ( 0.2-0.4 ) ^ 2 ) , see first observation. >>>>>>> >>>>>>> How can this be done in R ? >>>>>> >>>>>> >>>>>> This question makes about as little sense as it is possible to >>>> imagine. >>>>>> >>>>>> cheers, >>>>>> >>>>>> Rolf Turner >>>>> >>> >> >> ______________________________________________ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius > Alameda, CA, USA > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.