Re: [R] [FORGED] Regression with factors ?

2016-07-16 Thread stn021
2016-07-13 20:09 GMT+02:00 Jeff Newmiller :
> The formula interface as used in lm and nls searches for separate
> coefficients for each variable. It will take someone more clever than I to
> figure out how to get the formula interface to think of two variables as
> instances of one factor.
>
> However, R can do nonlinear optimization just fine:
...

Hello Jeff, hello all,


thank you. This appears to be what I was looking for.


Is there a straightforward way to test for significance ?


THX,
stn

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [FORGED] Regression with factors ?

2016-07-13 Thread Jeff Newmiller
The formula interface as used in lm and nls searches for separate 
coefficients for each variable. It will take someone more clever than I 
to figure out how to get the formula interface to think of two variables 
as instances of one factor.


However, R can do nonlinear optimization just fine:

##
# as if read in using read.csv( fname, as.is=TRUE )
dta <- data.frame( y = observed_data$y
 , p1 = as.character( observed_data$p1 )
 , p2 = as.character( observed_data$p2 )
 , stringsAsFactors = FALSE
 )

lvls <- with( dta, unique( c( p1, p2 ) ) )
dta$p1f <- factor( dta$p1, levels = lvls )
dta$p2f <- factor( dta$p2, levels = lvls )

idxvmult <- length( lvls ) + 1L
idxvoffs <- length( lvls ) + 2L

# all values in a numeric vector
# x = c( valice, vbob, ..., vmult, voffs )
calcY <- function( x ) {
  vmult <- x[ idxvmult ]
  voffs <- x[ idxvoffs ]
  vp1 <- x[ dta$p1f ]
  vp2 <- x[ dta$p2f ]
  vmult * ( voffs - ( vp1 - vp2 )^2 )
}

optfcn <- function( x ) {
  sum( ( dta$y - calcY( x ) ) ^ 2 )
}

oresult <- optim( par = rep( 1, idxvoffs ), optfcn)

result <- list( multiplier = oresult$par[ idxvmult ]
  , offset = oresult$par[ idxvoffs ]
  , values = data.frame( lvls = lvls
   , values = oresult$par[ seq.int( length( lvls ) ) ]
   )
  )
result

#-

I highly recommend reading the help page for optim and the CRAN Task View 
on optimization [1]


[1] https://cran.r-project.org/web/views/Optimization.html
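
For readers who want to try the optim() approach above end to end, here is a minimal self-contained sketch. It is not the code from the thread: the simulated data, the names (persons, true_mult, true_offs, rss, start) and the choice of method = "BFGS" are assumptions made for illustration. Two caveats seem worth stating. First, the starting abilities are deliberately unequal, because with all abilities equal the gradient with respect to them is zero and a gradient-based method cannot move them. Second, multiplier, offset and abilities are not separately identifiable in y = mult * (offs - (a1 - a2)^2): replacing a by k*a, mult by mult/k^2 and offs by offs*k^2 leaves every prediction unchanged. The sketch therefore reports only the product mult * offs and the scaled distances sqrt(|mult|) * |a1 - a2|.

```r
# Hypothetical simulated data; all names and values are illustrative only.
set.seed(42)
persons <- c("alice", "bob", "charlie", "don", "eve")
ability <- c(0.1, 0.2, 0.3, 0.4, 0.5)       # true values, unknown to the fit
pairs   <- t(combn(length(persons), 2))      # all 10 pairs, one per row
pairs   <- rbind(pairs, pairs, pairs)        # three replicates of each pair
dta <- data.frame(
  p1 = persons[pairs[, 1]],
  p2 = persons[pairs[, 2]],
  stringsAsFactors = FALSE
)
true_mult <- 1.5
true_offs <- 1
dta$y <- true_mult * (true_offs - (ability[pairs[, 1]] - ability[pairs[, 2]])^2) +
  rnorm(nrow(dta), sd = 0.05)

# One shared set of levels, so both columns index the same parameters.
lvls <- unique(c(dta$p1, dta$p2))
i1 <- match(dta$p1, lvls)
i2 <- match(dta$p2, lvls)
n  <- length(lvls)

# Parameter vector x = c(abilities, multiplier, offset).
rss <- function(x) {
  pred <- x[n + 1] * (x[n + 2] - (x[i1] - x[i2])^2)
  sum((dta$y - pred)^2)
}

# Unequal starting abilities (see the caveat in the text above).
start <- c(seq(0, 1, length.out = n), 1, 1)
fit <- optim(start, rss, method = "BFGS", control = list(maxit = 1000))

# Summaries that do not depend on the arbitrary scale of the abilities.
intercept_hat <- fit$par[n + 1] * fit$par[n + 2]
a_hat <- fit$par[seq_len(n)]
scaled_diff <- sqrt(abs(fit$par[n + 1])) * abs(outer(a_hat, a_hat, "-"))
dimnames(scaled_diff) <- list(lvls, lvls)

intercept_hat
round(scaled_diff, 2)
```

Problems of this kind can have local minima, so trying several starting vectors and keeping the fit with the smallest fit$value is probably prudent.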

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)


Re: [R] [FORGED] Regression with factors ?

2016-07-13 Thread stn021
> Is this what is intended?
>
>> observed_data$p1ab <- persons$ability[ match(observed_data$p1, persons$name) ]
>> observed_data$p2ab <- persons$ability[ match(observed_data$p2, persons$name) ]


Hello David,

thank you for your answer.


The code in my previous post was intended as an answer to the question
in an earlier post about example-data, quote:

> > Would you like me to make a complete example dataset with more records and 
> > noise ?
> Yes. And preferably do it with R code.

I should have re-stated this connection in the post.


The code generates a data frame 'observed_data', which is the data the
experimenter would get during the experiment.

This data frame is output in the last line. All other output is only meant
to document the generation process.

So the only thing visible to the experimenter before analysis is
exactly that data frame 'observed_data' (usually in the form of some
written documentation which is later entered into statistical
software). Everything before that last line simulates the unknown
parameters that the experiment is supposed to reveal.

The unknown parameters are specifically
- the data frame 'persons'
- and the variable 'multiplyer'

Both are supposed to be revealed by the analysis. p1ab and p2ab would
therefore depend on the unknown parameters and could not be added to
'observed_data' before the analysis.

Sorry again for omitting the back-reference.


I would like to know:

- how to get R to use p1 and p2 as levels of the same factor
(=persons) instead of levels of two different factors.

- how to get R to multiply the numerical levels of factors during the
search for the solution. Factors cannot be multiplied before running
lm() or some other package because before the analysis their numerical
values are not known.


THX, Stefan


Re: [R] [FORGED] Regression with factors ?

2016-07-13 Thread David Winsemius

> On Jul 13, 2016, at 6:48 AM, stn021  wrote:
> 
> Hello,
> 
> so here is a numerical example in R code. Code is appended below.
> 
> The output should be
> 1) the numerical values of the abilities of the persons
> 2) the multiplyer
> 
> 
> Please note that
> 
> 1) I have used non-linear optimization to solve this problem and got
> the expected result, though not with R but other software.
> 
> 2) I have applied lm() to this problem, even before I posted the
> question. I am well aware of the syntax of formulas. In my last posting
> I wrote the formula "freehand" so I made the previously mentioned
> errors. Sorry about that.
> 
> 
> 
> Unfortunately the formulas with I() as well as multiplying variables
> before running R do not work here. I() does not apply to factors (R
> tells me) and multiplying in advance also works only for continuous
> variables, not for factors, because there is no known numerical value
> to multiply.
> 
> The latter is actually what my question is about, along with the
> question on how to get R to treat two columns as two instances of the
> same factor.
> 
> 
> Just to be sure I used R to check if the data really counts as a
> factor according to R-terminology. It really is a factor, see code
> below.
> 
> 
> 
> This is the code for generating the example-data:
> 
> # --- #
> pnames= c( "alice" , "bob" , "charlie" , "don" , "eve" , "freddy"
> , "grace" , "henry" )
> pcount= length( pnames )
> 
> # abilities = runif( pcount )
> abilities = (1:pcount) / 10
> 
> persons = data.frame( name = pnames , ability = abilities )
> persons
> 
> # random subset of possible combinations and extra df
> combinations = combn( nrow( persons ) , 2 ) ;
> combinations = cbind( combinations,combinations,combinations,combinations )
> combinations = combinations[ , runif(ncol(combinations))<0.5 ]
> ccount = ncol( combinations )
> 
> observed_data = data.frame(
>  idx1 = combinations[1,]
> , idx2 = combinations[2,]
> , p1 = ( persons$name[combinations[1,] ] )
> , p2 = ( persons$name[combinations[2,] ] )
> )
> 
> abilities_data = data.frame(
>  a1 = persons$ability[ combinations[1,] ]
> , a2 = persons$ability[ combinations[2,] ]
> )
> 
> # y = result of cooperation of each pair
> multiplyer = runif(1) + 1
> offset = 1
> cat( "multiplyer = " , multiplyer , "\n" )
> cat( "offset = " , offset , "\n" )
> 
> y0 = multiplyer * ( offset - ( abilities_data$a1 - abilities_data$a2 ) ^ 2 )
> noise = .05 * rnorm( ccount )
> 
> # check variables are really factors :
> str(  observed_data$p1 )
> dput( observed_data$p1 )
> 
> observed_data = data.frame( y = round( y0+noise,3 ) , observed_data )
> observed_data
> 
> # --- #

Is this what is intended?

> observed_data$p1ab <- persons$ability[ match(observed_data$p1, persons$name) ]
> observed_data$p2ab <- persons$ability[ match(observed_data$p2, persons$name) ]
> head(observed_data)
      y idx1 idx2    p1      p2 p1ab p2ab
1 1.149    1    6 alice  freddy  0.1  0.6
2 1.006    1    7 alice   grace  0.1  0.7
3 1.529    2    3   bob charlie  0.2  0.3
4 1.404    2    5   bob     eve  0.2  0.5
5 1.205    2    6   bob  freddy  0.2  0.6
6 1.187    2    7   bob   grace  0.2  0.7


> lm( y ~ I( (p1ab -p2ab)^2 ), data=observed_data)

Call:
lm(formula = y ~ I((p1ab - p2ab)^2), data = observed_data)

Coefficients:
       (Intercept)  I((p1ab - p2ab)^2)  
             1.506              -1.435  

>  separate_term <- lm( y ~ I( (p1ab -p2ab)^2 ), data=observed_data)
> summary(separate_term)

Call:
lm(formula = y ~ I((p1ab - p2ab)^2), data = observed_data)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.116249 -0.030996  0.002633  0.032765  0.136282 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)         1.50589    0.01067  141.08   <2e-16 ***
I((p1ab - p2ab)^2) -1.43527    0.05863  -24.48   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05304 on 44 degrees of freedom
Multiple R-squared:  0.9316,  Adjusted R-squared:   0.93 
F-statistic: 599.2 on 1 and 44 DF,  p-value: < 2.2e-16
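
As a rough illustration of how the two coefficients above relate to the model y = multiplier * (offset - (p1ab - p2ab)^2): expanding gives y = multiplier*offset - multiplier*(p1ab - p2ab)^2, so the slope estimates -multiplier and the intercept estimates multiplier*offset. For the output above that would suggest a multiplier of about 1.435 and an offset of about 1.506/1.435, roughly 1.05. The sketch below repeats this on simulated data. It assumes, as the lm() call does, that the abilities are already known; the data and the names true_mult, true_offs, est_mult and est_offs are hypothetical.

```r
# Hypothetical simulated data with KNOWN abilities (illustration only).
set.seed(1)
ability <- (1:8) / 10
pairs   <- t(combn(8, 2))                   # all 28 pairs, one per row
true_mult <- 1.5
true_offs <- 1
d <- data.frame(p1ab = ability[pairs[, 1]], p2ab = ability[pairs[, 2]])
d$y <- true_mult * (true_offs - (d$p1ab - d$p2ab)^2) +
  rnorm(nrow(d), sd = 0.05)

# Ordinary linear regression on the squared ability difference.
fit <- lm(y ~ I((p1ab - p2ab)^2), data = d)
b <- unname(coef(fit))

est_mult <- -b[2]             # slope is -multiplier
est_offs <- b[1] / est_mult   # intercept is multiplier * offset

c(multiplier = est_mult, offset = est_offs)
```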

You could also have compared two models differing only with respect to the 
inclusion of an interaction term that was the squared difference in abilities:

> full <- lm( y ~ p1ab + p2ab + I( (p1ab -p2ab)^2 ), data=observed_data)
> reduced <- lm( y ~ p1ab + p2ab , data=observed_data)
> anova(full,reduced)
Analysis of Variance Table

Model 1: y ~ p1ab + p2ab + I((p1ab - p2ab)^2)
Model 2: y ~ p1ab + p2ab
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
1     42 0.11823                                  
2     43 0.17315 -1  -0.05492 19.509 6.892e-05 ***


-- 
David


Re: [R] [FORGED] Regression with factors ?

2016-07-13 Thread stn021
Hello,

so here is a numerical example in R code. Code is appended below.

The output should be
1) the numerical values of the abilities of the persons
2) the multiplyer


Please note that

1) I have used non-linear optimization to solve this problem and got
the expected result, though not with R but other software.

2) I have applied lm() to this problem, even before I posted the
question. I am well aware of the syntax of formulas. In my last posting
I wrote the formula "freehand" so I made the previously mentioned
errors. Sorry about that.



Unfortunately the formulas with I() as well as multiplying variables
before running R do not work here. I() does not apply to factors (R
tells me) and multiplying in advance also works only for continuous
variables, not for factors, because there is no known numerical value
to multiply.

The latter is actually what my question is about, along with the
question on how to get R to treat two columns as two instances of the
same factor.


Just to be sure I used R to check if the data really counts as a
factor according to R-terminology. It really is a factor, see code
below.



This is the code for generating the example-data:

# --- #
pnames= c( "alice" , "bob" , "charlie" , "don" , "eve" , "freddy"
, "grace" , "henry" )
pcount= length( pnames )

# abilities = runif( pcount )
abilities = (1:pcount) / 10

persons = data.frame( name = pnames , ability = abilities )
persons

# random subset of possible combinations and extra df
combinations = combn( nrow( persons ) , 2 ) ;
combinations = cbind( combinations,combinations,combinations,combinations )
combinations = combinations[ , runif(ncol(combinations))<0.5 ]
ccount = ncol( combinations )

observed_data = data.frame(
  idx1 = combinations[1,]
, idx2 = combinations[2,]
, p1 = ( persons$name[combinations[1,] ] )
, p2 = ( persons$name[combinations[2,] ] )
)

abilities_data = data.frame(
  a1 = persons$ability[ combinations[1,] ]
, a2 = persons$ability[ combinations[2,] ]
)

# y = result of cooperation of each pair
multiplyer = runif(1) + 1
offset = 1
cat( "multiplyer = " , multiplyer , "\n" )
cat( "offset = " , offset , "\n" )

y0 = multiplyer * ( offset - ( abilities_data$a1 - abilities_data$a2 ) ^ 2 )
noise = .05 * rnorm( ccount )

# check variables are really factors :
str(  observed_data$p1 )
dput( observed_data$p1 )

observed_data = data.frame( y = round( y0+noise,3 ) , observed_data )
observed_data

# --- #



Re: [R] [FORGED] Regression with factors ?

2016-07-11 Thread Jeff Newmiller
Your clarification is promising.  A reproducible example is always preferred, 
though never a guarantee. I expect to be somewhat preoccupied this week so 
responses may be rather delayed, but the less setup we have to do, the more 
likely it is that someone on the list will tackle it.

Re an answer: If you can make the example simple enough that you can tell us 
what the right numerical result will be, we will have a better chance of 
understanding what you are after.  E.g. if you start with a solution and use it 
to create sample input data, then you don't need to actually solve it to 
illustrate what you are after. [1]

Note that I am not aware of any package dedicated to this type of problem, so 
unless someone else responds otherwise then you will likely have to use 
bootstrapping or your own statistical analysis (Bayesian?) of the result. 

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
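
The bootstrapping suggestion above could look roughly like the following sketch. It is only an illustration under stated assumptions: the data are simulated, the names (n_persons, fit_intercept, n_boot, boot_est) are hypothetical, and the least-squares fit uses optim() with method = "BFGS" from a fixed unequal start. The quantity bootstrapped is the product multiplier * offset, i.e. the value the model predicts for two equally able persons, because that product does not depend on the arbitrary scale of the abilities.

```r
# Hypothetical simulated data (illustrative names and values only).
set.seed(1)
n_persons <- 5
ability   <- (1:n_persons) / 10
pairs     <- t(combn(n_persons, 2))
pairs     <- rbind(pairs, pairs, pairs)      # three replicates of each pair
dta <- data.frame(i1 = pairs[, 1], i2 = pairs[, 2])
dta$y <- 1.5 * (1 - (ability[dta$i1] - ability[dta$i2])^2) +
  rnorm(nrow(dta), sd = 0.05)

# Least-squares fit of y = mult * (offs - (a[i1] - a[i2])^2) to one data set.
# Returns mult * offs.
fit_intercept <- function(d) {
  rss <- function(x) {
    pred <- x[n_persons + 1] * (x[n_persons + 2] - (x[d$i1] - x[d$i2])^2)
    sum((d$y - pred)^2)
  }
  start <- c(seq(0, 1, length.out = n_persons), 1, 1)
  p <- optim(start, rss, method = "BFGS", control = list(maxit = 1000))$par
  p[n_persons + 1] * p[n_persons + 2]
}

# Case-resampling bootstrap: refit on rows drawn with replacement.
n_boot <- 100
boot_est <- replicate(n_boot, {
  rows <- sample(nrow(dta), replace = TRUE)
  fit_intercept(dta[rows, ])
})

point_est <- fit_intercept(dta)
ci <- quantile(boot_est, c(0.025, 0.975))    # percentile interval
point_est
ci
```

A percentile interval like this is a crude first look rather than a formal significance test; whether it is adequate depends on the data and on how reliably the optimizer converges on each resample.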
-- 
Sent from my phone. Please excuse my brevity.

On July 11, 2016 7:28:41 AM PDT, stn021  wrote:
>Hello,
>
>thank you for the replies. Sorry about the html-email, I forgot.
>Should be OK with this email.
>
>
>Don't be fooled by the apparent simplicity of the problem. I have
>tried to reduce it to only a single relatively simple question.
>
>The idea here is to model cooperation of two persons. The model is
>about one specific aspect of that cooperation, namely that two persons
>with similar abilities may be able to produce better results than two
>very different persons.
>
>That is only one part of the model with other parts modeling for
>example the fact that of course two persons with a higher degree of
>ability will produce better results per se.
>
>
>It is not classic regression with factors. That can be easily done by
>something like lm( y ~ (p1-p2)^2 ).
>
>This expands to lm( y ~ p1^2 - 2*p1*p2 + p2^2 ). This contains
>multiplications and for lm() this implies interactions between the
>factor-levels and produces one parameter for each combination of
>factor-levels that occurs in the data. That is not what the question
>is about.
>
>Also p1 and p2 are different levels of the same factor, while for lm()
>it would be two different factors with different levels.
>
>
>As for the sensical part: this has a real world application therefore
>it makes sense.
>
>Also it is not so difficult to solve with non-linear optimization. I
>was hoping to be able to use R for that purpose because then the
>results could easily be checked with statistical tests.
>
>So my question is not "how to solve" but "how to solve with R".
>
>
>As for the excess degrees of freedom, in real observations there would
>of course be added noise due to either random variations or factors
>not included in the model. So to generate a more reality-conforming
>example I could add some random normal-distributed noise to the
>dependent variable y. I previously left that part out because to me it
>did not seem relevant.
>
>
>Would you like me to make a complete example dataset with more records
>and noise ?
>
>
>The answer I look for would be the numerical values of the
>factor-levels and numerical values for the multiplier (f) and the
>offset (o), with p1 and p2 given as names (here: persons) and y given
>as some level of achievement they reach by cooperating.
>
>y = f * ( o - ( p1 - p2 )^2 )
>
>Is that what you meant by "answer" ?
>
>
>THX
>stefan
>
>
>
>
>2016-07-10 2:27 GMT+02:00 Jeff Newmiller :
>>
>> I have seen less sensical questions.
>>
>> It would be nice if the example were a bit more complete (as in it
>> should have excess degrees of freedom and an answer) and less like a
>> homework problem (which are off topic here). It would of course also be
>> helpful if the OP were to conform to the Posting Guide, particularly in
>> respect to using plain text email.
>>
>> It looks like the kind of nonlinear optimization problem that
>> evolutionary algorithms are often applied to. It doesn't look (to me)
>> like a typical problem that factors get applied to in formulas though,
>> because multiple instances of the same factor variable are present.
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On July 9, 2016 4:59:30 PM PDT, Rolf Turner wrote:
>> >On 09/07/16 20:52, stn021 wrote:
>> >> Hello,
>> >>
>> >> I would like to analyse a model like this:
>> >>
>> >> y = 1 *  ( 1 - ( x1 - x2 )  ^ 2   )
>> >>
>> >> x1 and x2 are not continuous variables but factors, so the
>> >> observations contain the level.
>> >> Its numerical value is unknown and is to be estimated with the
>> >> model.
>> >>
>> >>
>> >> The observations look like this:
>> >>
>> >> y     x1     x2
>> >> 0.96  Alice  Bob
>> >> 0.84  Alice  Charlie
>> >> 0.96  Bob    Charlie
>> >> 0.64  Dave   Alice
>> >> etc.
>> >>
>> >> Each person has a numerical value. Here for example Alice = 0.2 and
>> >> Bob = 0.4
>> >>
>> >> Then y = 0.96 = 1* ( 1- ( 0.2-0.4 ) ^ 2 ) , see first observation.

Re: [R] [FORGED] Regression with factors ?

2016-07-11 Thread David Winsemius

> On Jul 11, 2016, at 7:28 AM, stn021  wrote:
> 
> Hello,
> 
> thank you for the replies. Sorry about the html-email, I forgot.
> Should be OK with this email.
> 
> 
> Don't be fooled by the apparent simplicity of the problem. I have
> tried to reduce it to only a single relatively simple question.

It would be useful to know whether this is a design effort and the data is not 
yet recorded or this is an analysis effort for data that is "in the can".
> 
> The idea here is to model cooperation of two persons. The model is
> about one specific aspect of that cooperation, namely that two persons
> with similar abilities may be able to produce better results than two
> very different persons.
> 
> That is only one part of the model with other parts modeling for
> example the fact that of course two persons with a higher degree of
> ability will produce better results per se.
> 
> 
> It is not classic regression with factors. That can be easily done by
> something like lm( y ~ (p1-p2)^2 ).

No. The caret "^" is an interaction operator in the formula context (not a 
power operator) and the minus sign causes variable removal.

Read:

?formula

If you want to create a calculated value that is the squared difference of two 
variables, then you need to do it either with `I` or in the dataframe before 
submission to the regression function.


> 
> This expands to lm( y ~ p1^2 - 2*p1*p2 + p2^2 ).

Used in a formula, p1^2 is exactly equal to p1.
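
The points made above about "^", "-" and I() can be checked without fitting anything, by asking R which terms a formula expands to. This is a small sketch; the helper name labels_of is made up for illustration, and p1, p2 and y are just symbols here, so no data are needed.

```r
# term.labels lists the model terms a formula actually produces.
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ p1^2)             # "p1": crossing p1 with itself adds nothing
labels_of(y ~ (p1 - p2)^2)      # "p1": p2 is removed, then p1 is crossed with itself
labels_of(y ~ (p1 + p2)^2)      # "p1" "p2" "p1:p2": main effects plus interaction
labels_of(y ~ I((p1 - p2)^2))   # "I((p1 - p2)^2)": arithmetic protected by I()
```

Only the last form asks for the arithmetic squared difference, and that in turn requires p1 and p2 to be numeric columns.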


> This contains
> multiplications and for lm() this implies interactions between the
> factor-levels and produces one parameter for each combination of
> factor-levels that occurs in the data. That is not what the question
> is about.
> 
> Also p1 and p2 are different levels of the same factor, while for lm()
> it would be two different factors with different levels.

Given your apparent lack of knowledge about R's formula syntax, we are also now 
unclear if you are using the word "factor" in the colloquial sense or as a 
technical term for discrete (factor) variables in R. What kind of values can p1 
and p2 take?


> As for the sensical part: this has a real world application therefore
> it makes sense.
> 
> Also it is not so difficult to solve with non-linear optimization. I
> was hoping to be able to use R for that purpose because then the
> results could easily be checked with statistical tests.
> 
> So my question is not "how to solve" but "how to solve with R".
> 
> 
> As for the excess degrees of freedom, in real observations there would
> of course be added noise due to either random variations or factors
> not included in the model. So to generate a more reality-conforming
> example I could add some random normal-distributed noise to the
> dependent variable y. I previously left that part out because to me it
> did not seem relevant.

Knowing the nature of the outcome variable is generally important in 
statistical design.
> 
> 
> Would you like me to make a complete example dataset with more records
> and noise ?

Yes. And preferably do it with R code.

> 
> The answer I look for would be the numerical values of the
> factor-levels and numerical values for the multiplier (f) and the
> offset (o), with p1 and p2 given as names (here: persons) and y given
> as some level of achievement they reach by cooperating.
> 
> y = f * ( o - ( p1 - p2 )^2 )
> 
> Is that what you meant by "answer" ?

Not really. We would expect to see some data, at least dummy data, in a form 
that could be used for testing and demonstration.  The nature of "f" is 
particularly unclear (in large part because the science or "reality" is not 
described.)  Is it a function?  The "o" is probably going to be returned as an 
"(Intercept)". You started out with `lm` which would have little to do with 
non-linear optimization. You then said it "would not be so difficult" to do 
non-linear optimization of "something" which was not really specified with any 
substance. Without data and code it still reads as a salad of fragments of 
terminology lacking reference to a well-described scientific substrate. 

An "answer" would be:

Describe an experiment or a well designed set of observations with a specific 
outcome. Describe the hypotheses. Present code or data with a desired analysis 
plan. Ask for problems in R coding.



An off-topic question would be:

Help me design my psychology class project.


-- 
David.



Re: [R] [FORGED] Regression with factors ?

2016-07-11 Thread stn021
Hello,

thank you for the replies. Sorry about the html-email, I forgot.
Should be OK with this email.


Don't be fooled by the apparent simplicity of the problem. I have
tried to reduce it to only a single relatively simple question.

The idea here is to model cooperation of two persons. The model is
about one specific aspect of that cooperation, namely that two persons
with similar abilities may be able to produce better results than two
very different persons.

That is only one part of the model with other parts modeling for
example the fact that of course two persons with a higher degree of
ability will produce better results per se.


It is not classic regression with factors. That can be easily done by
something like lm( y ~ (p1-p2)^2 ).

This expands to lm( y ~ p1^2 - 2*p1*p2 + p2^2 ). This contains a
multiplication, and for lm() this implies interactions between the
factor-levels and produces one parameter for each combination of
factor-levels that occurs in the data. That is not what the question
is about.

Also p1 and p2 are different levels of the same factor, while for lm()
it would be two different factors with different levels.
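
A side note on the formula syntax discussed above, as a minimal sketch: inside an R model formula, "-" and "^" are term operators rather than arithmetic, unless the expression is wrapped in I(). The snippet below only inspects the terms R derives from each formula and the dummy columns it builds for two factors that share level names. The small data frame d is made up for illustration, using the names from the original post.

```r
# In a model formula, "-" removes a term and "^" expands crossings.
# Neither is arithmetic unless the expression is protected by I().
f_plain <- y ~ (p1 - p2)^2
labels_plain <- attr(terms(f_plain), "term.labels")

f_arith <- y ~ I((p1 - p2)^2)
labels_arith <- attr(terms(f_arith), "term.labels")

print(labels_plain)  # terms kept by the formula algebra
print(labels_arith)  # a single arithmetic term, valid only for numeric p1, p2

# Illustrative data (assumed): two factors that share level names are still
# coded as two unrelated sets of dummy columns.
d <- data.frame(
  p1 = factor(c("Alice", "Alice", "Bob", "Dave")),
  p2 = factor(c("Bob", "Charlie", "Charlie", "Alice"))
)
cols <- colnames(model.matrix(~ p1 + p2, data = d))
print(cols)
```

The I() form would compute the squared difference, but only for numeric variables, which is why the person values cannot be estimated this way when p1 and p2 are factors.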


As for the sensical part: this has a real world application therefore
it makes sense.

Also it is not so difficult to solve with non-linear optimization. I
was hoping to be able to use R for that purpose because then the
results could easily be checked with statistical tests.

So my question is not "how to solve" but "how to solve with R".


As for the excess degrees of freedom, in real observations there would
of course be added noise due to either random variations or factors
not included in the model. So to generate a more reality-conforming
example I could add some random normal-distributed noise to the
dependent variable y. I previously left that part out because to me it
did not seem relevant.


Would you like me to make a complete example dataset with more records
and noise ?
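
Such a dataset could be generated along the following lines. This is only a sketch under stated assumptions: the person values for Charlie, Dave and Eve, the noise level, the number of repetitions and the seed are all invented for illustration, and only Alice = 0.2 and Bob = 0.4 come from the original post.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical "true" person values; only Alice and Bob appear in the thread.
true_vals <- c(Alice = 0.2, Bob = 0.4, Charlie = 0.6, Dave = 0.8, Eve = 0.5)
f_true   <- 1     # multiplier f (assumed)
o_true   <- 1     # offset o (assumed)
noise_sd <- 0.02  # standard deviation of the added noise (assumed)

# Every unordered pair of persons, each observed n_rep times.
prs   <- t(combn(names(true_vals), 2))  # 10 x 2 character matrix
n_rep <- 3
sim <- data.frame(
  p1 = rep(prs[, 1], times = n_rep),
  p2 = rep(prs[, 2], times = n_rep),
  stringsAsFactors = FALSE
)

# y = f * ( o - ( p1 - p2 )^2 ) plus normally distributed noise
signal <- f_true * (o_true - (true_vals[sim$p1] - true_vals[sim$p2])^2)
sim$y  <- unname(signal) + rnorm(nrow(sim), mean = 0, sd = noise_sd)

# Nominal parameter count: one value per person plus f and o.
n_params  <- length(true_vals) + 2L
excess_df <- nrow(sim) - n_params
```

With 30 records and a nominal count of 7 parameters, the example has more observations than unknowns, which is what "excess degrees of freedom" asks for.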


The answer I look for would be the numerical values of the
factor-levels and numerical values for the multiplier (f) and the
offset (o), with p1 and p2 given as names (here: persons) and y given
as some level of achievement they reach by cooperating.

y = f * ( o - ( p1 - p2 )^2 )

Is that what you meant by "answer" ?
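
To make the model concrete, here is a small sketch that evaluates it on the four observations from the original post. Alice = 0.2 and Bob = 0.4 are given there; Charlie = 0.6 and Dave = 0.8 are assumed values chosen to be consistent with the data, and model_y is a hypothetical helper name. The sketch also checks two properties that follow directly from the formula: adding a constant to every person value leaves y unchanged, and so does rescaling the values by c while dividing f by c^2 and multiplying o by c^2. The "numerical values of the factor-levels" are therefore only determined up to such transformations unless some values are fixed.

```r
# The four observations quoted in the thread.
obs <- data.frame(
  y  = c(0.96, 0.84, 0.96, 0.64),
  p1 = c("Alice", "Alice", "Bob", "Dave"),
  p2 = c("Bob", "Charlie", "Charlie", "Alice"),
  stringsAsFactors = FALSE
)

# Model from the post: y = f * ( o - ( p1 - p2 )^2 )
# (model_y is a hypothetical helper, not part of the thread)
model_y <- function(vals, f, o, d = obs) {
  unname(f * (o - (vals[d$p1] - vals[d$p2])^2))
}

# Alice and Bob are from the post; Charlie and Dave are assumed.
vals <- c(Alice = 0.2, Bob = 0.4, Charlie = 0.6, Dave = 0.8)

# The assumed values reproduce the observations with f = 1, o = 1.
fits_exactly <- isTRUE(all.equal(model_y(vals, f = 1, o = 1), obs$y))

# Shifting every person by the same constant gives the same predictions.
same_after_shift <- isTRUE(all.equal(model_y(vals + 10, f = 1, o = 1), obs$y))

# Doubling the values, with f / 4 and o * 4, also gives the same predictions.
same_after_rescale <- isTRUE(all.equal(model_y(vals * 2, f = 1 / 4, o = 4), obs$y))
```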


THX
stefan




2016-07-10 2:27 GMT+02:00 Jeff Newmiller :
>
> I have seen less sensical questions.
>
> It would be nice if the example were a bit more complete (as in it should 
> have excess degrees of freedom and an answer) and less like a homework 
> problem (which are off topic here). It would of course also be helpful if the 
> OP were to conform to the Posting Guide, particularly in respect to using 
> plain text email.
>
> It looks like the kind of nonlinear optimization problem that evolutionary 
> algorithms are often applied to. It doesn't look (to me) like a typical 
> problem that factors get applied to in formulas though, because multiple 
> instances of the same factor variable are present.
> --
> Sent from my phone. Please excuse my brevity.
>
> On July 9, 2016 4:59:30 PM PDT, Rolf Turner  wrote:
> >On 09/07/16 20:52, stn021 wrote:
> >> Hello,
> >>
> >> I would like to analyse a model like this:
> >>
> >> y = 1 *  ( 1 - ( x1 - x2 )  ^ 2   )
> >>
> >> x1 and x2 are not continuous variables but factors, so the observation
> >> contain the level.
> >> Its numerical value is unknown and is to be estimated with the model.
> >>
> >>
> >> The observations look like this:
> >>
> >> y     x1     x2
> >> 0.96  Alice  Bob
> >> 0.84  Alice  Charlie
> >> 0.96  Bob    Charlie
> >> 0.64  Dave   Alice
> >> etc.
> >>
> >> Each person has a numerical value. Here for example Alice = 0.2 and
> >> Bob = 0.4
> >>
> >> Then y = 0.96 = 1* ( 1- ( 0.2-0.4 ) ^ 2 ) , see first observation.
> >>
> >> How can this be done in R ?
> >
> >
> >This question makes about as little sense as it is possible to imagine.
> >
> >cheers,
> >
> >Rolf Turner
>



Re: [R] [FORGED] Regression with factors ?

2016-07-09 Thread Jeff Newmiller
I have seen less sensical questions.

It would be nice if the example were a bit more complete (as in it should have 
excess degrees of freedom and an answer) and less like a homework problem 
(which are off topic here). It would of course also be helpful if the OP were 
to conform to the Posting Guide, particularly in respect to using plain text 
email. 

It looks like the kind of nonlinear optimization problem that evolutionary 
algorithms are often applied to. It doesn't look (to me) like a typical problem 
that factors get applied to in formulas though, because multiple instances of 
the same factor variable are present.
-- 
Sent from my phone. Please excuse my brevity.

On July 9, 2016 4:59:30 PM PDT, Rolf Turner  wrote:
>On 09/07/16 20:52, stn021 wrote:
>> Hello,
>>
>> I would like to analyse a model like this:
>>
>> y = 1 *  ( 1 - ( x1 - x2 )  ^ 2   )
>>
>> x1 and x2 are not continuous variables but factors, so the observation
>> contain the level.
>> Its numerical value is unknown and is to be estimated with the model.
>>
>>
>> The observations look like this:
>>
>> y     x1     x2
>> 0.96  Alice  Bob
>> 0.84  Alice  Charlie
>> 0.96  Bob    Charlie
>> 0.64  Dave   Alice
>> etc.
>>
>> Each person has a numerical value. Here for example Alice = 0.2 and
>> Bob = 0.4
>>
>> Then y = 0.96 = 1* ( 1- ( 0.2-0.4 ) ^ 2 ) , see first observation.
>>
>> How can this be done in R ?
>
>
>This question makes about as little sense as it is possible to imagine.
>
>cheers,
>
>Rolf Turner



Re: [R] [FORGED] Regression with factors ?

2016-07-09 Thread Rolf Turner

On 09/07/16 20:52, stn021 wrote:

> Hello,
>
> I would like to analyse a model like this:
>
> y = 1 *  ( 1 - ( x1 - x2 )  ^ 2   )
>
> x1 and x2 are not continuous variables but factors, so the observation
> contain the level.
> Its numerical value is unknown and is to be estimated with the model.
>
>
> The observations look like this:
>
> y     x1     x2
> 0.96  Alice  Bob
> 0.84  Alice  Charlie
> 0.96  Bob    Charlie
> 0.64  Dave   Alice
> etc.
>
> Each person has a numerical value. Here for example Alice = 0.2 and
> Bob = 0.4
>
> Then y = 0.96 = 1* ( 1- ( 0.2-0.4 ) ^ 2 ) , see first observation.
>
> How can this be done in R ?



This question makes about as little sense as it is possible to imagine.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
