[R] display double dot over character in plotmath?

2017-05-12 Thread Ranjan Maitra
Hi,

Is it possible to display double dot (umlaut) over a character such as would be 
possible using \ddot x in LaTeX? I can do this using tikzDevice but I wanted 
something simpler to point to. 

Here is an example of what I would like to do, but it is not quite there:

require(ggplot2)
data<-as.data.frame(c("a","b","c","a","b","c"))
colnames(data)<-"Y"
data$X<-c(1:6)
data$Z<-c(1,2,3,1,2,3)

ggplot(data, aes(x=X)) + geom_line(aes(y = Z), size=0.43) + 
xlab(expression(atop(top,bold(Age~"à"))))

I would like to put in a double dot over the "a" in the x-axis instead of "`".

Many thanks for any suggestions and best wishes,
Ranjan
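
[Editorial sketch, not part of the original post.] As far as I know, plotmath documents dot() but no double-dot accent. R strings do accept Unicode escapes, so a precomposed character such as a-umlaut (U+00E4) can go straight into the label. For letters without a precomposed form, the combining diaeresis U+0308 can follow the base letter. Whether that renders cleanly depends on the graphics device and font, so treat it as an assumption to check locally. The variable names below are illustrative only.

```r
# "\u00e4" is the precomposed a-with-diaeresis;
# "x\u0308" is a plain x followed by a combining diaeresis.
a_umlaut <- "\u00e4"
x_ddot   <- "x\u0308"

# Same label structure as in the question, with the umlaut
# in place of the grave-accented character.
lab <- expression(atop(top, bold(Age ~ "\u00e4")))

# Draw only in an interactive session so the sketch has no
# side effects when sourced from a script.
if (interactive()) {
  plot(1:6, c(1, 2, 3, 1, 2, 3), type = "l", xlab = lab, ylab = "Z")
}
```

The same `lab` object could be passed to `xlab()` in the ggplot2 call above.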

-- 
Important Notice: This mailbox is ignored: e-mails are set to be deleted on 
receipt. Please respond to the mailing list if appropriate. For those needing 
to send personal or professional e-mail, please use appropriate addresses.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] installing caret package

2017-05-12 Thread Duncan Murdoch

On 12/05/2017 4:21 PM, Elahe chalabi via R-help wrote:

Hi all,

I'm using RStudio 64-bit, version 3.2.5, and I ran into a problem installing the 
caret package. The error is:

Loading required package: lattice
Loading required package: ggplot2
Error : object ‘sigma’ is not exported by 'namespace:stats'
Error: package or namespace load failed for ‘caret’


The sigma function was introduced in R 3.3.0.  You need to update R to 
be at least as new as that.


You might also let the maintainer of caret know about this; they should 
have "Depends:  R (>= 3.3.0)" or something similar to prevent this kind 
of error.  (But I don't see "sigma" being used in the current source, so 
this may have been addressed already, or the error message may be 
tricking me into looking in the wrong place.)
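
[Editorial sketch, not part of the original reply.] A quick way to confirm the diagnosis is to compare the running R version with 3.3.0 and check whether the stats namespace exports sigma. The variable names are illustrative.

```r
# sigma() was added to the stats package in R 3.3.0,
# so the two checks below should agree with each other.
r_version <- getRversion()
has_sigma <- "sigma" %in% getNamespaceExports("stats")

if (r_version < "3.3.0" || !has_sigma) {
  message("R ", format(r_version),
          " does not export stats::sigma(); update R before loading caret.")
} else {
  message("R ", format(r_version), " exports stats::sigma().")
}
```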


Duncan Murdoch


Re: [R] installing caret package

2017-05-12 Thread David Winsemius

> On May 12, 2017, at 1:21 PM, Elahe chalabi via R-help  
> wrote:
> 
> Hi all,
> 
> I'm using RStudio 64-bit, version 3.2.5

I'm guessing that means you are using R 3.2.5, at least if RStudio keeps the 
version numbering of its different OS builds in sync. I'm on RStudio 1.0.136 
and I don't think I'm that out of date. The stats package is part of the core 
load, so not having that function suggests that it was added in a later R 
release and that you are using code from a later package version. You could 
try dropping back to a caret version from two years ago and see if you get 
better results ... or you could exit RStudio and update R.

> and I faced a problem installing caret package,the error is :
> 
> Loading required package: lattice
> Loading required package: ggplot2
> Error : object ‘sigma’ is not exported by 'namespace:stats'
> Error: package or namespace load failed for ‘caret’
> 
> how should I solve the problem with this sigma?!
> 
> Thanks for any help.
> Elahe
> 

David Winsemius
Alameda, CA, USA


[R] installing caret package

2017-05-12 Thread Elahe chalabi via R-help
Hi all,

I'm using RStudio 64-bit, version 3.2.5, and I ran into a problem installing the 
caret package. The error is:

Loading required package: lattice
Loading required package: ggplot2
Error : object ‘sigma’ is not exported by 'namespace:stats'
Error: package or namespace load failed for ‘caret’

how should I solve the problem with this sigma?!

Thanks for any help.
Elahe


Re: [R] Error: invalid type (closure) for the variable 'time' - object specific trend

2017-05-12 Thread peter dalgaard

> On 12 May 2017, at 16:40 , Tobias Christoph  wrote:
> 
> Hey guys,
> 
> thanks a lot for your tips. The regression is finally running. As you 
> said, I had to integrate the column "year" in the function "time" in R.
> 
> So I used the following formula: *plm(log(revenue) ~ log(supply) + 
> factor(town)*time(year), data=R_Test_log_Neu)*

Um, that might not do what I think you think it does. time() gives you the 
"vector of times at which a time series was sampled". If you feed it any 
regular vector, it just gives the numbers 1:n, witness

> time(rnorm(20))
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
attr(,"tsp")
[1]  1 20  1

I suspect you just want "year" in the formula.
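
[Editorial sketch, not part of the original reply.] The point can be checked on a stand-in for the poster's index column. The vector below is hypothetical, built from the description in the thread (years 1 to 12 repeated for each of 11 towns).

```r
# Hypothetical panel index: years 1..12 repeated for each of 11 towns.
year <- rep(1:12, times = 11)

# time() ignores the values of a plain vector and returns
# the positions 1..length(year) instead.
trend_from_time <- as.numeric(time(year))

head(trend_from_time, 15)   # keeps counting past 12
head(year, 15)              # restarts at 1 for the second town

# The two are not the same variable.
same_as_year <- identical(trend_from_time, as.numeric(year))
same_as_year
```

So `time(year)` runs from 1 to 132 across the whole panel, whereas `year` restarts for every town.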

> 
> So I have now successfully added a linear trend to my regression model? 
> Another question that concerns me is how to add a quadratic trend 
> instead of a linear trend. Can I just square the column "year"?

In principle, yes, but as others have pointed out, centering the variable may 
be a good idea, for numerical stability.
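
[Editorial sketch, not part of the original reply.] The effect of centering can be seen by comparing the correlation between the linear and squared terms before and after subtracting the mean. The `year` vector is a hypothetical stand-in for the poster's column.

```r
# Hypothetical index column: years 1..12 for each of 11 towns.
year <- rep(1:12, times = 11)

# Raw year and its square are almost collinear.
cor_raw <- cor(year, year^2)

# After centering, the linear and quadratic terms are uncorrelated
# here, because the centered values are symmetric around zero.
year_c <- year - mean(year)
cor_centered <- cor(year_c, year_c^2)

c(raw = cor_raw, centered = cor_centered)
```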

-pd

> 
> Enjoy your weekend,
> 
> Toby
> 
> _My results see below:_
> 
> Balanced Panel: n=11, T=12, N=132
> 
> Residuals :
> Min.  1st Qu.   Median  3rd Qu. Max.
> -0.09610 -0.02370 -0.00152  0.01980  0.14000
> 
> Coefficients :
> Estimate Std. Error t-value  Pr(>|t|)
> log(supply)   -0.0080702  0.0133675 -0.6037  0.547411
> factor(town)2:time(year)  -0.0063245  0.0053744 -1.1768  0.242100
> factor(town)3:time(year)   0.0295522  0.0056776  5.2050 1.053e-06 ***
> factor(town)4:time(year)   0.0062191  0.0054152  1.1485  0.253549
> factor(town)5:time(year)   0.0159028  0.0054954  2.8939  0.004681 **
> factor(town)6:time(year)   0.0237112  0.0055395  4.2804 4.316e-05 ***
> factor(town)7:time(year)   0.0410007  0.0055734  7.3565 5.576e-11 ***
> factor(town)8:time(year)   0.0239085  0.0053751  4.4480 2.271e-05 ***
> factor(town)9:time(year)   0.0242342  0.0056855  4.2625 4.619e-05 ***
> factor(town)10:time(year)  0.0105890  0.0053302  1.9866  0.049733 *
> factor(town)11:time(year)  0.0095270  0.0056354  1.6906  0.094065 .
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Total Sum of Squares:0.46388
> Residual Sum of Squares: 0.2001
> R-Squared:  0.56863
> Adj. R-Squared: 0.4292
> F-statistic: 11.8637 on 11 and 99 DF, p-value: 7.3065e-14
> 
> 
> 
> On 11.05.2017 at 23:17, David L Carlson wrote:
>> What Rui said, but as important, you have four columns in your data called 
>> "town", "year", "revenue", and "supply". You do not have a column called 
>> "time".
>> 
>> -
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77840-4352
>> 
>> -Original Message-
>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
>> Sent: Thursday, May 11, 2017 3:36 PM
>> To: Tobias Christoph ; Duncan Murdoch 
>> ; r-help@r-project.org
>> Subject: Re: [R] Error: invalid type (closure) for the variable 'time' - 
>> object specific trend
>> 
>> Hello,
>> 
>> A closure is, like you say, a function.
>> At an R prompt try:
>> 
>>> typeof(time)
>> [1] "closure"
>> 
>> So like Duncan suggested rename 'time', for instance capitalize it
>> 'Time'. That should do it.
>> 
>> Hope this helps,
>> 
>> Rui Barradas
>> 
>> 
>> 
>> On 11-05-2017 at 21:20, Tobias Christoph wrote:
>>> Hey Duncan,
>>> 
>>> thank you very much for your quick reply.
>>> 
>>> _My data used:_
>>> 
>>> 1st column(town):1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,...,11
>>> 
>>> 2nd column(year):1,2,3,4,5,6,7,8,9,10,11,12,1,2,3...,12
>>> 
>>> 3rd column (revenue):
>>> 
>>> 4th column (supply):
>>> 
>>> I have now renamed my columns and did the regression again. Now there is
>>> a problem with R-squared, as it is equal to 1 now with no given std.
>>> error and t-value. This is probably due to the fact that I try to
>>> estimate more parameters than I have data points.
>>> 
>>> To add a linear trend I found the following formula: *lm(y ~ x1 +
>>> factor(ccode)*time, data=df)*
>>> 
>>> I tried to adapt it and use it for my regression: *plm(log(revenue)
>>> ~ log(supply) + factor(town)*time, data=R_Test_log_Neu)*
>>> 
>>> When I do this regression I get the original error: "invalid type
>>> (closure) for the variable 'time' - object specific trend"
>>> 
>>> The notation "time" probably does not refer to my column, but to the
>>> command "time" in R.
>>> 
>>> Can you follow my thoughts?
>>> 
>>> Tobi
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 11.05.2017 at 17:23, Duncan Murdoch wrote:
 Duncan Murdoch

Re: [R] Error: invalid type (closure) for the variable 'time' - object specific trend

2017-05-12 Thread David Winsemius

> On May 12, 2017, at 7:40 AM, Tobias Christoph  
> wrote:
> 
> Hey guys,
> 
> thanks a lot for your tips. The regression is finally running. As you 
> said, I had to integrate the column "year" in the function "time" in R.
> 
> So I used the following formula: *plm(log(revenue) ~ log(supply) + 
> factor(town)*time(year), data=R_Test_log_Neu)*
> 
> So I have now successfully added a linear trend to my regression model? 
> Another question that concerns me is how to add a quadratic trend 
> instead of a linear trend. Can I just square the column "year"?

It's difficult to respond to these questions. It appears you have either 
created a function named `time` or loaded a package that contains such a named 
function. Several of the original responders thought it might be a misspelling 
of an existing column name.

One might guess from the output that `time` represents a linear value from a 
factor-variable across the values of the "year" column. You should probably NOT 
"just square column 'year'". That will probably construct non-orthogonal 
dependencies between "time" and "time"^2. The usual method in ordinary linear 
regression  is to use the "poly" function. In your case however the puzzle 
about what that `time` function looks like prevents much further comment.
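
[Editorial sketch, not part of the original reply.] For reference, this is what poly() produces for a hypothetical `year` column like the one described in the thread. Whether plm handles poly() terms the same way lm does has not been verified here.

```r
# Hypothetical index column: years 1..12 for each of 11 towns.
year <- rep(1:12, times = 11)

# poly() returns orthogonal polynomial terms: column 1 is the
# linear trend, column 2 the quadratic trend.
trend <- poly(year, degree = 2)

# The columns are orthonormal, so their cross-product is the
# 2 x 2 identity matrix (up to rounding).
gram <- crossprod(trend)
round(gram, 10)
```

In an ordinary lm() formula the term would be written as `poly(year, 2)`.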

To support informed discussion on this matter you MUST provide:

--- code that includes all the needed library() calls to load packages or to 
build a time function.
--- str(R_Test_log_Neu)


-- 
David

Re: [R] Matched Items in rows + issue with writing a table

2017-05-12 Thread Jeff Newmiller
Jim is generous enough that he might do this, but such assistance is not 
sustainable. Fortunately, you can type a ? in front of the name of a function 
and read about what goes in and what comes out. You can also type expressions 
like x[2] or which(matches) right before you execute the line of code, and you 
can use debug(findMatches) to cause R to let you step through the function one 
line at a time. These are all skills you should start developing soon, because 
in the long run they will teach you more than you can learn by asking questions 
here. 
-- 
Sent from my phone. Please excuse my brevity.

On May 12, 2017 11:08:50 AM PDT, abo dalash  wrote:
>Dear Jim..,
>
>
>Many thanks for your answer.
>
>
>As I'm a new R user, could you please provide a short explanation
>
>about what each line of the following does ?
>
>
>findMatches<-function(x,sep=",") {
> matchval<-NA
> x1bits<-unlist(strsplit(x[1],sep))
> x2bits<-unlist(strsplit(x[2],sep))
> matches<-x1bits %in% x2bits
> if(any(matches)) matchval<-x1bits[which(matches)]
> return(matchval)
>}
>x444$matched_items<-apply(x444,1,findMatches)
>
>
>I would like to understand so I can apply the same for any further
>analysis
>
>I may need in the future.
>
>
>Regards
>
>
>
>



Re: [R] apply and cor()

2017-05-12 Thread Micha Silver



On 05/12/2017 06:53 PM, David L Carlson wrote:

Actually, not using apply() would be faster and simpler

cor(t(compare_data), t(test_data))

Worked just as I wanted, thanks!
I first reshaped the data frames, then I didn't even need the t()


David C

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
Sent: Friday, May 12, 2017 10:48 AM
To: Ismail SEZEN ; Micha Silver 
Cc: R-help@r-project.org
Subject: Re: [R] apply and cor()

Actually, r is a vector, not an index value. You need

apply(compare_data, 1, function(r) cor(r, t(test_data)))
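
[Editorial sketch, not part of the original reply.] Both suggestions in this thread can be compared on simulated stand-ins for the poster's data (5 rows instead of 269; the values are random, not the original data). It also hints at why the first attempt failed: inside apply(), `r` already holds the row's values, so `compare_data[r, ]` uses those values as row indices.

```r
set.seed(1)

# Simulated stand-ins: 5 x 31 and 1 x 31 numeric data frames.
compare_data <- as.data.frame(matrix(rnorm(5 * 31), nrow = 5))
test_data    <- as.data.frame(matrix(rnorm(31), nrow = 1))

# apply() hands each row to the function as a numeric vector `r`,
# so `r` itself is correlated with the transposed test row.
by_apply <- apply(compare_data, 1, function(r) cor(r, t(test_data)))

# Without apply(): transpose both tables so that rows become columns,
# then let cor() correlate the columns directly.
direct <- cor(t(compare_data), t(test_data))

# Both approaches give the same five correlations.
all.equal(as.numeric(by_apply), as.numeric(direct))
```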

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ismail SEZEN
Sent: Friday, May 12, 2017 10:11 AM
To: Micha Silver 
Cc: R-help@r-project.org
Subject: Re: [R] apply and cor()



On 12 May 2017, at 17:57, Micha Silver  wrote:

I have two data.frames, one with a single row of 31 columns, and the second 
with 269 rows and the same 31 columns.

dim(compare_data)

[1] 269  31

dim(test_data)

[1]  1 31

I want to apply cor() between the one row of 'test_data', and each row of the 
'compare_data' .
I tried 'apply' but I get this error:

apply(compare_data, 1, function(r) {cor(compare_data[r,], test_data)})

Error in cor(compare_data[r, ], test_data) : incompatible dimensions


apply(compare_data, 1, function(r) {cor(compare_data[r,], 
as.numeric(test_data))})

See ?cor. Explanation of y is "NULL (default) or a vector, matrix or data frame 
with compatible dimensions to x”.



In order to try to understand I did:

dims <- apply(compare_data, 1, function(r) {dim(compare_data[r,])})
head(dims)

 20 73 103 118 130 142 151 154 191 205 217 222 227 232 240 275 282 301 320 
359 360 551 589 653 789 801 808 812
[1,]  9  8   8   9   5   9   8  11   6  15  12  13  10   7   9  14 8  11   9  
11  11  12   9  14   5   8   9  10
[2,] 31 31  31  31  31  31  31  31  31  31  31  31  31  31  31  31 31  31  31  
31  31  31  31  31  31  31  31  31
 840 856 857 867 885 970 983 985 1103 1107 1197 1207 1237 1262 1279 1282 
1332 1357 1358 1392 1411 1435 1458 1473
[1,]  14  11  12   8  10   2   7   9   108   10   13   11 79   12   11  
 11   16   10   10   12   10   10
[2,]  31  31  31  31  31  31  31  31   31   31   31   31   31   31 31   31   31 
  31   31   31   31   31   31   31

and indeed I am getting different row dimensions. I expected "1 31" for each. 
What are the values 9,8,8,9,5... in the [1,] dimension?

If I test the compare_data data.frame one row at a time:

dim(compare_data['20',])

[1]  1 31

dim(compare_data['1473',])

[1]  1 31

It looks as I expected.
What am I missing??

Thanks

--
Micha Silver
cell: +972-523-665918



--
Micha Silver
cell: +972-523-665918


Re: [R] Matched Items in rows + issue with writing a table

2017-05-12 Thread abo dalash
Dear Jim..,


Many thanks for your answer.


As I'm a new R user, could you please provide a short explanation

about what each line of the following does ?


findMatches<-function(x,sep=",") {
 matchval<-NA
 x1bits<-unlist(strsplit(x[1],sep))
 x2bits<-unlist(strsplit(x[2],sep))
 matches<-x1bits %in% x2bits
 if(any(matches)) matchval<-x1bits[which(matches)]
 return(matchval)
}
x444$matched_items<-apply(x444,1,findMatches)


I would like to understand so I can apply the same for any further analysis

I may need in the future.


Regards




From: Jim Lemon 
Sent: 12 May 2017 04:14 AM
To: abo dalash
Cc: r-help@R-project.org
Subject: Re: [R] Matched Items in rows + issue with writing a table

Hi abo,
I think you want to split your strings and do your matching like this:

x444<-read.table(text="w r
 cyp3,cyp7 cyp2,cyp1,cyp3
 cyp2 cyp2
 c1,c3,c6 c6,c8,c5",
 header=TRUE,stringsAsFactors=FALSE)
findMatches<-function(x,sep=",") {
 matchval<-NA
 x1bits<-unlist(strsplit(x[1],sep))
 x2bits<-unlist(strsplit(x[2],sep))
 matches<-x1bits %in% x2bits
 if(any(matches)) matchval<-x1bits[which(matches)]
 return(matchval)
}
x444$matched_items<-apply(x444,1,findMatches)

Note that this will only work with character values, _not_ factors.
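
[Editorial sketch, not part of the original reply.] The same function with a comment on each line, run on the example table from the thread. The table is built with data.frame() here rather than read.table(), and the result is stored in a separate variable; both are choices made for this sketch.

```r
# Example table from the thread, with character (not factor) columns.
x444 <- data.frame(
  w = c("cyp3,cyp7", "cyp2", "c1,c3,c6"),
  r = c("cyp2,cyp1,cyp3", "cyp2", "c6,c8,c5"),
  stringsAsFactors = FALSE
)

findMatches <- function(x, sep = ",") {
  # Default result, returned when the two cells share no item.
  matchval <- NA
  # Split the first cell of the row (column w) into its items.
  x1bits <- unlist(strsplit(x[1], sep))
  # Split the second cell of the row (column r) into its items.
  x2bits <- unlist(strsplit(x[2], sep))
  # TRUE for every item of column w that also occurs in column r.
  matches <- x1bits %in% x2bits
  # If anything matched, keep the matching items themselves.
  if (any(matches)) matchval <- x1bits[which(matches)]
  return(matchval)
}

# apply(..., 1, f) calls f once per row, passing that row
# as a character vector of length 2.
matched_items <- apply(x444, 1, findMatches)
matched_items
```

If some row had more than one shared item, apply() would return a list instead of a character vector.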

Jim

On Fri, May 12, 2017 at 9:16 AM, abo dalash  wrote:
> Hi All ..,
>
>
> I have a table called "x444" and I would like to create a new column contains 
> the matched items in each row between column w & r . I used match()function 
> as below but this does not return the results I want because of 2 issues. The 
> 1st one is that this gives the row number of shared items while I want to see 
> the item itself (e.g. in the table below, I want to see cyp2 instead of the 
> row number 2). The 2nd issue is that I need to know matched items considering 
> every item in the row instead of the entire row. For example, the item cyp3 
> is a matched item in the first row between columns w & r. The same applies 
> for c6 in row 3. These don't appear in the results below.
>
>
>
>>x444
>w r
> 1 cyp3,cyp7 cyp2, cyp1,cyp3
> 2 cyp2  cyp2
> 3   c1,c3,c6   c6,c8,c5
>
>
>> r = c(match(x444$w,X444$r))
>> r
> [1] NA  2 NA
>
>
>
> The desired output should be like this :-
>
> w r matched items
> 1 cyp3,cyp7 cyp2, cyp1,cyp3 cyp3
> 2 cyp2  cyp2  cyp2
> 3   c1,c3,c6   c6,c8,c5  c6
>
>
> The second issue is that when I write a table produced in R as follows :
>
> write.table(MyTable,file="MyTable.txt", sep = "\t", quote = F, row.names = F)
>
> and the read this txt. file in excel, some items from column B appears in 
> Column A and some empty rows also appear?.
>
> Could you please guide me about the mistakes I have done and suggest
> some solutions?
>
> Regards
>


Re: [R] Drawing World map divided into 6 economic regions

2017-05-12 Thread MacQueen, Don
Well, you'll have to find the boundaries in some electronic GIS format such as 
a shapefile (though there are other options).

I don't know where to find such a thing. Your chances of finding someone who 
does know are greater on the R-sig-geo mailing list, so I'd suggest asking 
there.

You may also need to transform your boundaries from whatever coordinate 
reference system they come in, to your desired projection.

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062


On 5/12/17, 10:13 AM, "R-help on behalf of Christofer Bogaso" 
 wrote:

Hi again,

I am trying to draw a World map which is divided into 6 Economic
regions as available in below link

http://www.worldbank.org/en/about/annual-report/regions

I am aware of various R ways to draw World map based on Countries like
one available in


http://stackoverflow.com/questions/24136868/plot-map-with-values-for-countries-as-color-in-r

However I do not want to put any individual Country boundaries,
instead the individual boundaries of 6 Economic regions in the World.

Can you please suggest how can I achieve such a World map. Any pointer
will be highly appreciated.

Thanks for your time.



Re: [R] Error: invalid type (closure) for the variable 'time' - object specific trend

2017-05-12 Thread Rui Barradas

Hello,

I have never used plm but the standard way of adding a quadratic term is

I(time(year)^2)
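
[Editorial sketch, not part of the original reply.] The role of I() can be seen in a plain lm() fit on simulated data. The variables `x` and `y` are made up for illustration and plm itself is not used. Inside a formula `^` is a formula operator, so `x^2` without I() collapses to `x`.

```r
set.seed(42)

# Simulated data with a genuine quadratic component.
x <- 1:20
y <- 2 + 0.5 * x + 0.1 * x^2 + rnorm(20)

# Without I(), x^2 is read as a formula operation and reduces to x.
fit_plain <- lm(y ~ x + x^2)

# With I(), the square is computed arithmetically and becomes its own term.
fit_quad <- lm(y ~ x + I(x^2))

names(coef(fit_plain))
names(coef(fit_quad))
```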

Hope this helps,

Rui Barradas

On 12-05-2017 at 15:40, Tobias Christoph wrote:

Hey guys,

thanks a lot for your tips. The regression is finally running. As you
said, I had to integrate the column "year" in the function "time" in R.

So I used the following formula: *plm(log(revenue) ~ log(supply) +
factor(town)*time(year), data=R_Test_log_Neu)*

So I have now successfully added a linear trend to my regression model?
Another question that concerns me is how to add a quadratic trend
instead of a linear trend. Can I just square the column "year"?

Enjoy your weekend,

Toby

_My results see below:_

Balanced Panel: n=11, T=12, N=132

Residuals :
 Min.  1st Qu.   Median  3rd Qu. Max.
-0.09610 -0.02370 -0.00152  0.01980  0.14000

Coefficients :
 Estimate Std. Error t-value  Pr(>|t|)
log(supply)   -0.0080702  0.0133675 -0.6037  0.547411
factor(town)2:time(year)  -0.0063245  0.0053744 -1.1768  0.242100
factor(town)3:time(year)   0.0295522  0.0056776  5.2050 1.053e-06 ***
factor(town)4:time(year)   0.0062191  0.0054152  1.1485  0.253549
factor(town)5:time(year)   0.0159028  0.0054954  2.8939  0.004681 **
factor(town)6:time(year)   0.0237112  0.0055395  4.2804 4.316e-05 ***
factor(town)7:time(year)   0.0410007  0.0055734  7.3565 5.576e-11 ***
factor(town)8:time(year)   0.0239085  0.0053751  4.4480 2.271e-05 ***
factor(town)9:time(year)   0.0242342  0.0056855  4.2625 4.619e-05 ***
factor(town)10:time(year)  0.0105890  0.0053302  1.9866  0.049733 *
factor(town)11:time(year)  0.0095270  0.0056354  1.6906  0.094065 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:0.46388
Residual Sum of Squares: 0.2001
R-Squared:  0.56863
Adj. R-Squared: 0.4292
F-statistic: 11.8637 on 11 and 99 DF, p-value: 7.3065e-14



Am 11.05.2017 um 23:17 schrieb David L Carlson:

What Rui said, but as important, you have four columns in your data called "town", "year", 
"revenue", and "supply". You do not have a column called "time".

-
David L Carlson
Department of Anthropology
Texas A University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
Sent: Thursday, May 11, 2017 3:36 PM
To: Tobias Christoph; Duncan 
Murdoch;r-help@r-project.org
Subject: Re: [R] Error: invalid type (closure) for the variable 'time' - object 
specific trend

Hello,

A closure is, like you say, a function.
At an R prompt try:

  > typeof(time)
[1] "closure"

So like Duncan suggested rename 'time', for instance capitalize it
'Time'. That should do it.

Hope this helps,

Rui Barradas



Em 11-05-2017 21:20, Tobias Christoph escreveu:

Hey Duncan,

thank you very much for your quick reply.

_My data used:_

1st column(town):1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,...,11

2nd column(year):1,2,3,4,5,6,7,8,9,10,11,12,1,2,3...,12

3rd column (revenue):

4th column (supply):

I have now renamed my columns and run the regression again. Now there is
a problem with R-squared: it equals 1, and no std. errors or t-values are
reported. This is probably because I am trying to estimate more parameters
than I have data points.

To add a linear trend I found the following formula: *lm(y ~ x1 +
factor(ccode)*time, data=df)*

I tried to adapt it for my regression: *plm(log(revenue)
~ log(supply) + factor(town)*time, data=R_Test_log_Neu)*

When I run this regression I get the original error: "invalid type
(closure) for the variable 'time' - object specific trend"

The name "time" apparently does not refer to my column, but to the
function "time" in R.

Can you follow my thoughts?

Tobi






On 11.05.2017 at 17:23, Duncan Murdoch wrote:

Duncan Murdoch








[R] Drawing World map divided into 6 economic regions

2017-05-12 Thread Christofer Bogaso
Hi again,

I am trying to draw a World map which is divided into 6 Economic
regions as available in below link

http://www.worldbank.org/en/about/annual-report/regions

I am aware of various R ways to draw World map based on Countries like
one available in

http://stackoverflow.com/questions/24136868/plot-map-with-values-for-countries-as-color-in-r

However I do not want to put any individual Country boundaries,
instead the individual boundaries of 6 Economic regions in the World.

Can you please suggest how can I achieve such a World map. Any pointer
will be highly appreciated.

Thanks for your time.



Re: [R] visualization of KNN results in text classification

2017-05-12 Thread Elahe chalabi via R-help


Thanks for your reply. What I have exactly is a data frame whose rows contain 
the words used in each speech and whose columns contain the frequencies of 
these words. I have an extra row showing the type of the speech, i.e. whether 
it came from the control group or the Alzheimer group. I then create a training 
set and a test set for KNN from this data frame, and with KNN I classify the 
speeches, assigning every speech (actually the text of the speech!) to its 
group, control or Alzheimer. 
Now my question is: how can I visualize my KNN classifier or its results? At 
the moment I only have an accuracy matrix from KNN!

Thanks for any help!
Elahe 

 
On Monday, May 8, 2017 3:55 PM, Ismail SEZEN  wrote:



As far as I know, kNN groups by Euclidean distance, so you need numerical data 
as input. You said your dataset has only “speeches” and “type of people”. Are 
both of these inputs, or is one the input and the other the output? Type of 
people should be a factor variable (I guess). I don’t know how you represent 
“speech” in your dataset: as character data, or as a numerical representation 
of a feature? If you send a minimal example of the problem, we can help you. 
Please read the posting guide.




[R-es] html.tabular

2017-05-12 Thread Jose Ramirez Costa
Good day! How are you all?

I am writing to ask whether anyone knows how to change the CSS style of a
table made with the TABLES package.
The object would be an html.tabular.
I have searched in various places and cannot find a practical example
or a text that makes clear which parameter I need to change and how.

Many thanks

Regards!

[[alternative HTML version deleted]]

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


Re: [R] apply and cor()

2017-05-12 Thread David L Carlson
Actually, not using apply() would be faster and simpler

cor(t(compare_data), t(test_data))
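
As a hedged illustration of why the two approaches agree, here is a minimal sketch on made-up data. The objects below only mimic the shapes described in the thread (several rows and one row, 31 columns each); the values are random and the object sizes are assumptions.

```r
# Synthetic stand-ins for the data frames in the thread (values are random).
set.seed(42)
compare_data <- as.data.frame(matrix(rnorm(5 * 31), nrow = 5))
test_data    <- as.data.frame(matrix(rnorm(31), nrow = 1))
names(test_data) <- names(compare_data)

# Row-by-row: apply() hands each row to the function as a numeric vector.
by_row <- apply(compare_data, 1, function(r) cor(r, unlist(test_data)))

# All at once: after transposing, each original row becomes a column,
# and cor() correlates every column of x with every column of y.
all_at_once <- cor(t(compare_data), t(test_data))
```

`all_at_once` is a one-column matrix with one correlation per row of `compare_data`; `by_row` holds the same numbers as a named vector.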

David C

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L Carlson
Sent: Friday, May 12, 2017 10:48 AM
To: Ismail SEZEN ; Micha Silver 
Cc: R-help@r-project.org
Subject: Re: [R] apply and cor()

Actually, r is a vector, not an index value. You need

apply(compare_data, 1, function(r) cor(r, t(test_data)))

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ismail SEZEN
Sent: Friday, May 12, 2017 10:11 AM
To: Micha Silver 
Cc: R-help@r-project.org
Subject: Re: [R] apply and cor()


> On 12 May 2017, at 17:57, Micha Silver  wrote:
> 
> I have two data.frames, one with a single row of 31 columns, and the second 
> with 269 rows and the same 31 columns.
> > dim(compare_data)
> [1] 269  31
> > dim(test_data)
> [1]  1 31
> 
> I want to apply cor() between the one row of 'test_data', and each row of the 
> 'compare_data' .
> I tried 'apply' but I get this error:
> > apply(compare_data, 1, function(r) {cor(compare_data[r,], test_data)})
> Error in cor(compare_data[r, ], test_data) : incompatible dimensions


apply(compare_data, 1, function(r) {cor(compare_data[r,], 
as.numeric(test_data))})

See ?cor. Explanation of y is "NULL (default) or a vector, matrix or data frame 
with compatible dimensions to x”.

> 
> 
> In order to try to understand I did:
> > dims <- apply(compare_data, 1, function(r) {dim(compare_data[r,])})
> > head(dims)
> 20 73 103 118 130 142 151 154 191 205 217 222 227 232 240 275 282 301 320 
> 359 360 551 589 653 789 801 808 812
> [1,]  9  8   8   9   5   9   8  11   6  15  12  13  10   7   9  14 8  11   9  
> 11  11  12   9  14   5   8   9  10
> [2,] 31 31  31  31  31  31  31  31  31  31  31  31  31  31  31  31 31  31  31 
>  31  31  31  31  31  31  31  31  31
> 840 856 857 867 885 970 983 985 1103 1107 1197 1207 1237 1262 1279 1282 
> 1332 1357 1358 1392 1411 1435 1458 1473
> [1,]  14  11  12   8  10   2   7   9   108   10   13   11 79   12   
> 11   11   16   10   10   12   10   10
> [2,]  31  31  31  31  31  31  31  31   31   31   31   31   31   31 31   31   
> 31   31   31   31   31   31   31   31
> 
> and indeed I am getting different row dimensions. I expected "1 31" for each. 
> What are the values 9,8,8,9,5... in the [1,] dimension?
> 
> If I test the compare_data data.frame one row at a time:
> > dim(compare_data['20',])
> [1]  1 31
> > dim(compare_data['1473',])
> [1]  1 31
> 
> It looks as I expected.
> What am I missing??
> 
> Thanks
> 
> -- 
> Micha Silver
> cell: +972-523-665918
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]


Re: [R] apply and cor()

2017-05-12 Thread David L Carlson
Actually, r is a vector, not an index value. You need

apply(compare_data, 1, function(r) cor(r, t(test_data)))
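
To make the point about `r` concrete, here is a small sketch with a made-up data frame `m`. It shows that `apply()` passes the row's values, and that using those values as a row index selects as many rows as there are non-zero entries. Whether the original data actually contain zeros is an assumption; it is only one plausible explanation for the varying counts 9, 8, 8, 9, 5, ... reported in the question.

```r
# Hypothetical 3 x 3 data frame with small integer-like values.
m <- data.frame(x = c(2, 0, 1),
                y = c(0, 0, 3),
                z = c(1, 2, 2))

# Each call receives the row's values as a numeric vector of length ncol(m).
len_seen <- apply(m, 1, function(r) length(r))

# Using those values as a row index: zeros are silently dropped, so the
# number of selected rows equals the number of non-zero entries in the row.
rows_selected <- apply(m, 1, function(r) nrow(m[r, ]))

# The same counts, computed directly.
non_zero <- rowSums(m != 0)
```

So `m[r, ]` is not "row r" at all; inside the function the row is already available as `r` itself.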

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ismail SEZEN
Sent: Friday, May 12, 2017 10:11 AM
To: Micha Silver 
Cc: R-help@r-project.org
Subject: Re: [R] apply and cor()


> On 12 May 2017, at 17:57, Micha Silver  wrote:
> 
> I have two data.frames, one with a single row of 31 columns, and the second 
> with 269 rows and the same 31 columns.
> > dim(compare_data)
> [1] 269  31
> > dim(test_data)
> [1]  1 31
> 
> I want to apply cor() between the one row of 'test_data', and each row of the 
> 'compare_data' .
> I tried 'apply' but I get this error:
> > apply(compare_data, 1, function(r) {cor(compare_data[r,], test_data)})
> Error in cor(compare_data[r, ], test_data) : incompatible dimensions


apply(compare_data, 1, function(r) {cor(compare_data[r,], 
as.numeric(test_data))})

See ?cor. Explanation of y is "NULL (default) or a vector, matrix or data frame 
with compatible dimensions to x”.

> 
> 
> In order to try to understand I did:
> > dims <- apply(compare_data, 1, function(r) {dim(compare_data[r,])})
> > head(dims)
> 20 73 103 118 130 142 151 154 191 205 217 222 227 232 240 275 282 301 320 
> 359 360 551 589 653 789 801 808 812
> [1,]  9  8   8   9   5   9   8  11   6  15  12  13  10   7   9  14 8  11   9  
> 11  11  12   9  14   5   8   9  10
> [2,] 31 31  31  31  31  31  31  31  31  31  31  31  31  31  31  31 31  31  31 
>  31  31  31  31  31  31  31  31  31
> 840 856 857 867 885 970 983 985 1103 1107 1197 1207 1237 1262 1279 1282 
> 1332 1357 1358 1392 1411 1435 1458 1473
> [1,]  14  11  12   8  10   2   7   9   108   10   13   11 79   12   
> 11   11   16   10   10   12   10   10
> [2,]  31  31  31  31  31  31  31  31   31   31   31   31   31   31 31   31   
> 31   31   31   31   31   31   31   31
> 
> and indeed I am getting different row dimensions. I expected "1 31" for each. 
> What are the values 9,8,8,9,5... in the [1,] dimension?
> 
> If I test the compare_data data.frame one row at a time:
> > dim(compare_data['20',])
> [1]  1 31
> > dim(compare_data['1473',])
> [1]  1 31
> 
> It looks as I expected.
> What am I missing??
> 
> Thanks
> 
> -- 
> Micha Silver
> cell: +972-523-665918
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]


Re: [R] apply and cor()

2017-05-12 Thread Ismail SEZEN

> On 12 May 2017, at 17:57, Micha Silver  wrote:
> 
> I have two data.frames, one with a single row of 31 columns, and the second 
> with 269 rows and the same 31 columns.
> > dim(compare_data)
> [1] 269  31
> > dim(test_data)
> [1]  1 31
> 
> I want to apply cor() between the one row of 'test_data', and each row of the 
> 'compare_data' .
> I tried 'apply' but I get this error:
> > apply(compare_data, 1, function(r) {cor(compare_data[r,], test_data)})
> Error in cor(compare_data[r, ], test_data) : incompatible dimensions


apply(compare_data, 1, function(r) {cor(compare_data[r,], 
as.numeric(test_data))})

See ?cor. Explanation of y is "NULL (default) or a vector, matrix or data frame 
with compatible dimensions to x”.

> 
> 
> In order to try to understand I did:
> > dims <- apply(compare_data, 1, function(r) {dim(compare_data[r,])})
> > head(dims)
> 20 73 103 118 130 142 151 154 191 205 217 222 227 232 240 275 282 301 320 
> 359 360 551 589 653 789 801 808 812
> [1,]  9  8   8   9   5   9   8  11   6  15  12  13  10   7   9  14 8  11   9  
> 11  11  12   9  14   5   8   9  10
> [2,] 31 31  31  31  31  31  31  31  31  31  31  31  31  31  31  31 31  31  31 
>  31  31  31  31  31  31  31  31  31
> 840 856 857 867 885 970 983 985 1103 1107 1197 1207 1237 1262 1279 1282 
> 1332 1357 1358 1392 1411 1435 1458 1473
> [1,]  14  11  12   8  10   2   7   9   108   10   13   11 79   12   
> 11   11   16   10   10   12   10   10
> [2,]  31  31  31  31  31  31  31  31   31   31   31   31   31   31 31   31   
> 31   31   31   31   31   31   31   31
> 
> and indeed I am getting different row dimensions. I expected "1 31" for each. 
> What are the values 9,8,8,9,5... in the [1,] dimension?
> 
> If I test the compare_data data.frame one row at a time:
> > dim(compare_data['20',])
> [1]  1 31
> > dim(compare_data['1473',])
> [1]  1 31
> 
> It looks as I expected.
> What am I missing??
> 
> Thanks
> 
> -- 
> Micha Silver
> cell: +972-523-665918
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]


[R] apply and cor()

2017-05-12 Thread Micha Silver
I have two data.frames, one with a single row of 31 columns, and the 
second with 269 rows and the same 31 columns.

> dim(compare_data)
[1] 269  31
> dim(test_data)
[1]  1 31

I want to apply cor() between the one row of 'test_data', and each row 
of the 'compare_data' .

I tried 'apply' but I get this error:
> apply(compare_data, 1, function(r) {cor(compare_data[r,], test_data)})
 Error in cor(compare_data[r, ], test_data) : incompatible dimensions


In order to try to understand I did:
> dims <- apply(compare_data, 1, function(r) {dim(compare_data[r,])})
> head(dims)
      20  73 103 118 130 142 151 154 191 205 217 222 227 232 240 275 282 301 320 359 360 551 589 653 789 801 808 812
[1,]   9   8   8   9   5   9   8  11   6  15  12  13  10   7   9  14   8  11   9  11  11  12   9  14   5   8   9  10
[2,]  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31  31
      840  856  857  867  885  970  983  985 1103 1107 1197 1207 1237 1262 1279 1282 1332 1357 1358 1392 1411 1435 1458 1473
[1,]   14   11   12    8   10    2    7    9   10    8   10   13   11    7    9   12   11   11   16   10   10   12   10   10
[2,]   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31   31


and indeed I am getting different row dimensions. I expected "1 31" for 
each. What are the values 9,8,8,9,5... in the [1,] dimension?


If I test the compare_data data.frame one row at a time:
> dim(compare_data['20',])
[1]  1 31
> dim(compare_data['1473',])
[1]  1 31

It looks as I expected.
What am I missing??

Thanks

--
Micha Silver
cell: +972-523-665918



Re: [R] Error: invalid type (closure) for the variable 'time' - object specific trend

2017-05-12 Thread Tobias Christoph
Hey guys,

thanks a lot for your tips. The regression is finally running. As you 
said, I had to pass the column "year" to the function "time" in R.

So I used the following formula: *plm(log(revenue) ~ log(supply) + 
factor(town)*time(year), data=R_Test_log_Neu)*

So have I now successfully added a linear trend to my regression model? 
Another question that concerns me is how to add a quadratic trend 
instead of a linear trend. Can I just square the column "year"?
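
A minimal sketch of the two points raised here, on made-up data. The data frame `panel` is hypothetical, and `lm()` with `factor(town)` intercepts is used as a stand-in for the `plm()` fixed-effects model so that only base R is needed. Two things are worth noting, under those assumptions: `time()` applied to a plain numeric column returns the observation index 1, 2, ..., n rather than the stored years, and inside a formula a squared term has to be wrapped in `I()`, because `^` otherwise has its formula meaning.

```r
# Hypothetical balanced panel: 3 towns observed over 12 years.
set.seed(1)
panel <- data.frame(town = rep(1:3, each = 12),
                    year = rep(1:12, times = 3))
panel$supply  <- runif(nrow(panel), 50, 100)
panel$revenue <- 100 * exp(0.02 * panel$year + 0.001 * panel$year^2 +
                           rnorm(nrow(panel), sd = 0.01))

# time() on a plain numeric vector gives the row index, not the year values.
idx <- as.numeric(time(panel$year))
is_row_index <- isTRUE(all.equal(idx, as.numeric(seq_len(nrow(panel)))))

# Town-specific linear trend, written with the year column directly.
fit_lin <- lm(log(revenue) ~ log(supply) + factor(town) + factor(town):year,
              data = panel)

# Town-specific quadratic trend: I() keeps the arithmetic meaning of ^.
fit_quad <- lm(log(revenue) ~ log(supply) + factor(town) + factor(town):year +
                 factor(town):I(year^2),
               data = panel)
```

When the rows are sorted by town and then by year, the row index differs from `year` only by a constant within each town, so with town-specific intercepts the linear slopes should come out the same either way; for the quadratic term that shortcut no longer holds, which is one reason to use `year` itself.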

Enjoy your weekend,

Toby

_My results see below:_

Balanced Panel: n=11, T=12, N=132

Residuals :
 Min.  1st Qu.   Median  3rd Qu. Max.
-0.09610 -0.02370 -0.00152  0.01980  0.14000

Coefficients :
 Estimate Std. Error t-value  Pr(>|t|)
log(supply)   -0.0080702  0.0133675 -0.6037  0.547411
factor(town)2:time(year)  -0.0063245  0.0053744 -1.1768  0.242100
factor(town)3:time(year)   0.0295522  0.0056776  5.2050 1.053e-06 ***
factor(town)4:time(year)   0.0062191  0.0054152  1.1485  0.253549
factor(town)5:time(year)   0.0159028  0.0054954  2.8939  0.004681 **
factor(town)6:time(year)   0.0237112  0.0055395  4.2804 4.316e-05 ***
factor(town)7:time(year)   0.0410007  0.0055734  7.3565 5.576e-11 ***
factor(town)8:time(year)   0.0239085  0.0053751  4.4480 2.271e-05 ***
factor(town)9:time(year)   0.0242342  0.0056855  4.2625 4.619e-05 ***
factor(town)10:time(year)  0.0105890  0.0053302  1.9866  0.049733 *
factor(town)11:time(year)  0.0095270  0.0056354  1.6906  0.094065 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:0.46388
Residual Sum of Squares: 0.2001
R-Squared:  0.56863
Adj. R-Squared: 0.4292
F-statistic: 11.8637 on 11 and 99 DF, p-value: 7.3065e-14



On 11.05.2017 at 23:17, David L Carlson wrote:
> What Rui said, but as important, you have four columns in your data called 
> "town", "year", "revenue", and "supply". You do not have a column called 
> "time".
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
> Sent: Thursday, May 11, 2017 3:36 PM
> To: Tobias Christoph ; Duncan Murdoch 
> ; r-help@r-project.org
> Subject: Re: [R] Error: invalid type (closure) for the variable 'time' - 
> object specific trend
>
> Hello,
>
> A closure is, like you say, a function.
> At an R prompt try:
>
>   > typeof(time)
> [1] "closure"
>
> So like Duncan suggested rename 'time', for instance capitalize it
> 'Time'. That should do it.
>
> Hope this helps,
>
> Rui Barradas
>
>
>
> On 11-05-2017 at 21:20, Tobias Christoph wrote:
>> Hey Duncan,
>>
>> thank you very much for your quick reply.
>>
>> _My data used:_
>>
>> 1st column(town):1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,...,11
>>
>> 2nd column(year):1,2,3,4,5,6,7,8,9,10,11,12,1,2,3...,12
>>
>> 3rd column (revenue):
>>
>> 4th column (supply):
>>
>> I have now renamed my columns and run the regression again. Now there is
>> a problem with R-squared: it equals 1, and no std. errors or t-values are
>> reported. This is probably because I am trying to estimate more parameters
>> than I have data points.
>>
>> To add a linear trend I found the following formula: *lm(y ~ x1 +
>> factor(ccode)*time, data=df)*
>>
>> I tried to adapt it for my regression: *plm(log(revenue)
>> ~ log(supply) + factor(town)*time, data=R_Test_log_Neu)*
>>
>> When I run this regression I get the original error: "invalid type
>> (closure) for the variable 'time' - object specific trend"
>>
>> The name "time" apparently does not refer to my column, but to the
>> function "time" in R.
>>
>> Can you follow my thoughts?
>>
>> Tobi
>>
>>
>>
>>
>>
>>
>> On 11.05.2017 at 17:23, Duncan Murdoch wrote:
>>> Duncan Murdoch
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


[[alternative HTML version deleted]]


Re: [R] How to plot a legend centered only on the x axis

2017-05-12 Thread Antonio Silva
Thanks a lot Duncan, Jim and Marc!

2017-05-11 22:59 GMT-03:00 Marc Schwartz :

>
> Bingo.
>
> The 'inset' argument is what I was missing. That allows this to be done
> with one step, rather than the two that I had.
>
> Thanks Duncan.
>
> Mar

[[alternative HTML version deleted]]



Re: [R] visualization of KNN results in text classification

2017-05-12 Thread Ismail SEZEN

> On 12 May 2017, at 15:30, Elahe chalabi  wrote:
> 
> 
> 
> Thanks for your reply. What I exactly have is a data frame with rows 
> containing words which have been used in each speech and columns containing 
> frequency of these words, I have an extra row showing the type of the speech 
> whether it was from a control group or Alzheimer group. Then I create a 
> training and test set for KNN from this data frame and by KNN I classify the 
> speeches which assigns every speech (actually text of the speech!) to the 
> correct type of group, if it's from control group or Alzheimer group. 
> Now my question is how can I visualize my KNN classifier or its results? 
> cause now I only have an accuracy matrix from KNN!
> 
> Thanks for any help!
> Elahe 


It would be very helpful if you created a minimal example so that we can 
understand your data and what you have done with it. Yes, you explained your 
data in words, but it’s still unclear. So I created a minimal example for you.

For simplicity, I have a data.frame with 3 columns. The first 2 are numeric and 
the last one is a factor. The Group column holds my real classes. The A and B 
columns are some kind of numeric representation of these classes. Let’s call 
them features, because they carry hidden information that represents a class. 
I use about 33% of the data for training and 67% for testing. 

This is the point you asked about. After classification, I have a 
test.guess.cluster (factor) variable that contains the clusters predicted by 
the knn method (you mentioned an accuracy matrix from KNN; I don’t know what 
that is). Now I want to see the clusters on a plot. That’s why I converted the 
“test.guess.cluster” variable to numeric, so I can use it to colour the 
points on the plot. I plotted the points in the test.df data.frame (A versus B) 
and coloured them by predicted class.

At the end, I evaluated the overall performance of the knn model. Is it good or 
bad? Please note that you have to choose your own _k_ value and the size of the 
training dataset by trial and error.


library(class)
library(gmodels)
set.seed(6)
df <- data.frame(A = c(rnorm(30, 0), rnorm(30, 3)),
                 B = c(rnorm(30, 0), rnorm(30, 3)),
                 Group = factor(c(rep("G1", 30), rep("G2", 30))))
# use about 33% of the data for training and 67% for testing
i <- sample(2, nrow(df), replace = TRUE, prob = c(0.67, 0.33))
train.df <- df[i == 2, -3] # training features (class column dropped)
train.cl <- df[i == 2, 3] # true classes of the training rows
test.df <- df[i == 1, -3] # test features
test.real.cluster <- df[i == 1, 3] # true classes of the test rows
# predicted clusters by knn
test.guess.cluster <- knn(train = train.df, test = test.df, cl = train.cl, k = 3)
# convert them to numeric to colorize points on the plot
test.guess.cluster.num <- as.numeric(test.guess.cluster)
plot(test.df, col = test.guess.cluster.num, pch = test.guess.cluster.num)

# examine the result of CrossTable
# The model identified 2 G1 classes as G2 and 1 G2 class as G1.
# Hence, 3 elements are misclassified. (you can distinguish them on the plot)
gm <- gmodels::CrossTable(test.guess.cluster, test.real.cluster, prop.chisq = FALSE)
sum(diag(gm$prop.tbl)) # overall success of the model (34 - 3)/34
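
As a possible variation on the example above, the same evaluation can be sketched with base R's `table()` instead of `gmodels::CrossTable`, and the misclassified test points can be marked on the plot. The data are the same simulated two-feature example; the object names `conf`, `accuracy` and `wrong` are introduced here for illustration only, and the exact counts depend on the R version's random number generator.

```r
library(class)

# Same simulated data as above: two numeric features and a two-level class.
set.seed(6)
df <- data.frame(A = c(rnorm(30, 0), rnorm(30, 3)),
                 B = c(rnorm(30, 0), rnorm(30, 3)),
                 Group = factor(rep(c("G1", "G2"), each = 30)))
i <- sample(2, nrow(df), replace = TRUE, prob = c(0.67, 0.33))
train.df <- df[i == 2, -3]
train.cl <- df[i == 2, 3]
test.df  <- df[i == 1, -3]
test.real.cluster <- df[i == 1, 3]

test.guess.cluster <- knn(train = train.df, test = test.df,
                          cl = train.cl, k = 3)

# Confusion matrix: rows are predictions, columns are the true classes.
conf <- table(predicted = test.guess.cluster, actual = test.real.cluster)
accuracy <- sum(diag(conf)) / sum(conf)

# Colour by predicted class; draw misclassified points as crosses.
wrong <- test.guess.cluster != test.real.cluster
plot(test.df, col = as.numeric(test.guess.cluster),
     pch = ifelse(wrong, 4, 1))
```

With real word-frequency data there are many more than two features, so a plot like this would need two chosen (or derived) dimensions; that choice is outside what the thread specifies.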




> 
> 
> On Monday, May 8, 2017 3:55 PM, Ismail SEZEN  wrote:
> 
> 
> 
> As far as I know, kNN groups by Eucledian distance. So, you need numerical 
> data as input. You said your dataset has only “speeches” and “type of 
> people”. Are these input? or one of them is input and the latter one is 
> output? Type of people should be a factor variable (I guess). I don’t know 
> how you represent “speech” in your dataset. As character or numerical 
> representation of a feature? If you send a minimal example of the problem, we 
> can help you. Please, read posting guide.
> 
> 
> 

Re: [R] Design matrix for species mixture

2017-05-12 Thread Bert Gunter
Please take this thread elsewhere (e.g. Stats.stackexchange.com), as it is
largely about statistics and is therefore off-topic here (this list is about
R programming).

Cheers,
Bert


On May 12, 2017 1:21 AM, "Margot Neyret"  wrote:

Hi Jim,

Sorry if my question was not clear. I will try to explain again…

I have one response variable Y, let’s say vegetation cover. Then I have my
explanatory variable, let’s call it Crop. In my field I can have either
Maize (m), Bean (b), Pumpkin (p) or mixtures : m+b, m+p, b+p, m+b+p. I also
have a second explanatory variable X (e.g. soil moisture content).

So for now my variable Crop has 7 levels [m, p, b, mp, mb, pb, mpb]. If I
want to compare Y between these crops, I write :
summary(lm(Y~Crop))
TukeyHSD(aov(Y~Crop))

And with X :
summary(lm(Y~Crop*X))
… etc.

Then I want to compare the effect of each individual crop. I can do as you
suggested Jim : 3 variables M, B, P with value 1 if the crop is
present, 0 otherwise, and add interactions.
summary(lm(Y~M+B+P+M*P+…))

But here come my questions. This model seems to have 2 drawbacks.
1. How can I do pairwise comparisons here as I would do with a Tukey test ?
How can I test the hypothesis, for instance, “Bean provides higher cover than
Maize whatever the mixture is” ?
2. When it comes to interactions with other variables it gets quite
complicated (also note that in real life I have 5 crops, not 3 and other
explanatory variables) :
lm(Y~M*X+ B*X + P*X + M*B + M*P + P*B)

So, isn’t there a way to make it more concise ?
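
On the narrow question of notation, R's formula syntax does allow a shorter spelling. The sketch below only compares the term sets of two formulas and fits nothing; `term_set` is a hypothetical helper written for this comparison, and the variable names M, B, P, X and Y are those used in the message.

```r
# The long spelling from the message and a more compact equivalent.
long_form  <- Y ~ M*X + B*X + P*X + M*B + M*P + P*B
short_form <- Y ~ (M + B + P) * X + (M + B + P)^2

# Hypothetical helper: expand a formula into its terms and normalise the
# order of variables inside each interaction label (so "X:B" equals "B:X").
term_set <- function(f) {
  labels <- attr(terms(f), "term.labels")
  parts  <- strsplit(labels, ":", fixed = TRUE)
  sort(vapply(parts, function(p) paste(sort(p), collapse = ":"),
              character(1)))
}

same_terms <- identical(term_set(long_form), term_set(short_form))
n_terms    <- length(term_set(short_form))
```

`(M + B + P)^2` expands to the main effects plus all two-way interactions, so adding further crops only lengthens the list inside the parentheses. This shortens the formula but does not by itself answer the pairwise-comparison question.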

I hope it makes sense.

Thanks,
Margot

> On Friday, May 12, 2017 at 2:53 AM, Jim Lemon  wrote:
> Hi Margot,
> I'm not sure I understand your model, but if I make up some data in
> which the response variable is vegetation cover and the three species
> are:
>
> A - eats one type of plant
> B - eats another type of plant
> C - preys on herbivorous insects
>
> df<-read.table(text="field,propveg,A,B,C
> 1,1,0,0,1
> 2,0.3,1,1,1
> 3,0.6,0,1,1
> 4,0.2,1,1,1
> 5,0.7,1,0,1
> 6,0.8,0,0,0
> 7,0.3,1,0,0
> 8,0.4,0,1,0
> 9,0.1,1,1,0
> 10,0.5,0,1,0
> 11,0.5,1,0,1
> 12,0.1,1,1,0
> 13,0.6,0,1,1
> 14,0,1,1,0",
> sep=",",header=TRUE)
> print(summary(lm(propveg~A+B+C+A:B+A:C+B:C,df)))
>
> Is that something like what you want?
>
> Jim
>
> On Fri, May 12, 2017 at 12:40 AM, Margot Neyret 
wrote:
> > Hello,
> >
> > I have fields with species mixtures (for instance, species a, b, c,
a+b, a+c, b+c), and I look at the effect of each species on a response Y.
More specifically, I would like to compare the effect of individual
species, either alone or in mixture.
> >
> > > Y = rnorm(18,0,1)
> > > mixture= rep(c('a','b', 'c', 'a+b', 'a+c', 'b+c'), each = 3)
> >
> > Thus I create variables A, B and C with :
> > - A = 1 when the mixture contains a (ie mixture = a or a+b or a+c); and
0 otherwise.
> > - Idem for variables C and B.
> >
> > > A = ifelse(mixture %in% c('a', 'a+b', 'a+c'), 1, 0)
> > > B = ifelse(mixture %in% c('b', 'a+b', 'b+c'), 1, 0)
> > > C = ifelse(mixture %in% c('c', 'a+c', 'b+c'), 1, 0)
> >
> > My plan was to build a design matrix from these 3 variables, that would
then allow me to compare the effects of each species.
> >
> > > mm = model.matrix(~A+B+C+0)
> > > summary(lm(Y~mm))
> > Coefficients:
> > Estimate Std. Error t value Pr(>|t|)
> > (Intercept) -0.8301 0.6221 -1.334 0.203
> > mmA 1.1636 0.4819 2.415 0.030 *
> > mmB 0.8452 0.4819 1.754 0.101
> > mmC -0.1005 0.4819 -0.208 0.838
> > ---
> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > Residual standard error: 0.8347 on 14 degrees of freedom
> > Multiple R-squared: 0.4181, Adjusted R-squared: 0.2934
> > F-statistic: 3.353 on 3 and 14 DF, p-value: 0.04964
> >
> > My questions :
> > 1. Does this approach make any sense ? I have a feeling I am doing
something strange but I cannot put my finger on it.
> > 2. My degrees of freedom are wrong, I should not have an intercept here, or at least
my intercept should be one of my species. Should I just remove one species
from the design matrix ?
> > 3. Is there any way to do post-hoc tests on my species now, as I would
have done with Tukey test or lsmeans ?
> >
> > My objective afterwards is to add other explanatory variables and
interactions in the model.
> >
> > Thanks in advance !
> >
> > M. N.
> >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]


Re: [R] Design matrix for species mixture

2017-05-12 Thread Jim Lemon
Hi Margot,
Very messy, like nature. One way is to do tests with dummy variables
that compare:

beans+anything vs anything without beans
maize+anything vs anything without maize
pumpkin+anything vs anything without pumpkin

Then if you find that the "beans" comparison has the strongest effect,
perhaps because beans are nitrogen fixers, you could do successive
comparisons of the bean +anything mixtures. It really depends upon
what your a priori hypothesis is. You'll need a bit of statistical
power to get reliable, or any, results.
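
The "with versus without" comparisons above could be sketched as follows. Everything here is made up for illustration: the response `Y` is random noise, the seven crop codes follow the levels listed in the quoted message, and a Welch t-test is used as one simple choice of test, not necessarily the one intended.

```r
# Hypothetical data: 7 crop mixtures, 4 fields each, random response.
set.seed(1)
crop <- rep(c("m", "b", "p", "mb", "mp", "pb", "mpb"), each = 4)
dat  <- data.frame(Crop = crop, Y = rnorm(length(crop)))

# Presence/absence dummies: 1 if the mixture code contains the species letter.
for (sp in c("m", "b", "p")) {
  dat[[toupper(sp)]] <- as.integer(grepl(sp, dat$Crop, fixed = TRUE))
}

# For each species: mean difference (with minus without) and the p-value.
with_without <- sapply(c("M", "B", "P"), function(v) {
  tt <- t.test(dat$Y[dat[[v]] == 1], dat$Y[dat[[v]] == 0])
  c(diff = unname(tt$estimate[1] - tt$estimate[2]), p = tt$p.value)
})
```

`with_without` has one column per species. Because the three comparisons reuse the same fields, some correction for multiple testing would likely be wanted before reading much into the p-values.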

Jim


On Fri, May 12, 2017 at 6:20 PM, Margot Neyret  wrote:
> Hi Jim,
>
> Sorry if my question was not clear. I will try to explain again…
>
> I have one response variable Y, let’s say vegetation cover. Then I have my
> explanatory variable, let’s call it Crop. In my field I can have either
> Maize (m), Bean (b), Pumpkin (p) or mixtures : m+b, m+p, b+p, m+b+p. I also
> have a second explanatory variable X (e.g. soil moisture content).
>
> So for now my variable Crop has 7 levels [m, p, b, mp, mb, pb, mpb]. If I
> want to compare Y between these crops, I write :
> summary(lm(Y~Crop))
> TukeyHSD(aov(Y~Crop))
>
> And with X :
> summary(lm(Y~Crop*X))
> … etc.
>
>
> Then I want to compare the effect of each individual crop. I can do as you
> suggested Jim : 3 variables M, B, P with value 1 if the crop is
> present, 0 otherwise, and add interactions.
> summary(lm(Y~M+B+P+M*P+…))
>
> But here come my questions. This model seems to have 2 drawbacks.
> 1. How can I do pairwise comparisons here as I would do with a Tukey test ?
> How can I test the hypothesis, for instance, “Bean provides higher cover than
> Maize whatever the mixture is” ?
> 2. When it comes to interactions with other variables it gets quite
> complicated (also note that in real life I have 5 crops, not 3 and other
> explanatory variables) :
> lm(Y~M*X+ B*X + P*X + M*B + M*P + P*B)
>
> So, isn’t there a way to make it more concise ?
>
> I hope it makes sense.
>
> Thanks,
> Margot
>
>
> On Friday, May 12, 2017 at 2:53 AM, Jim Lemon  wrote:
> Hi Margot,
> I'm not sure I understand your model, but if I make up some data in
> which the response variable is vegetation cover and the three species
> are:
>
> A - eats one type of plant
> B - eats another type of plant
> C - preys on herbivorous insects
>
> df<-read.table(text="field,propveg,A,B,C
> 1,1,0,0,1
> 2,0.3,1,1,1
> 3,0.6,0,1,1
> 4,0.2,1,1,1
> 5,0.7,1,0,1
> 6,0.8,0,0,0
> 7,0.3,1,0,0
> 8,0.4,0,1,0
> 9,0.1,1,1,0
> 10,0.5,0,1,0
> 11,0.5,1,0,1
> 12,0.1,1,1,0
> 13,0.6,0,1,1
> 14,0,1,1,0",
> sep=",",header=TRUE)
> print(summary(lm(propveg~A+B+C+A:B+A:C+B:C,df)))
>
> Is that something like what you want?
>
> Jim
>
> On Fri, May 12, 2017 at 12:40 AM, Margot Neyret wrote:
>
> Hello,
>
> I have fields with species mixtures (for instance, species a, b, c, a+b,
> a+c, b+c), and I look at the effect of each species on a response Y. More
> specifically, I would like to compare the effect of individual species,
> either alone or in mixture.
>
> Y = rnorm(18,0,1)
> mixture= rep(c('a','b', 'c', 'a+b', 'a+c', 'b+c'), each = 3)
>
>
> Thus I create variables A, B and C with :
> - A = 1 when the mixture contains a (i.e. mixture = a, a+b or a+c); and 0
> otherwise.
> - Likewise for variables B and C.
>
> A = ifelse(mixture %in% c('a', 'a+b', 'a+c'), 1, 0)
> B = ifelse(mixture %in% c('b', 'a+b', 'b+c'), 1, 0)
> C = ifelse(mixture %in% c('c', 'a+c', 'b+c'), 1, 0)
>
>
> My plan was to build a design matrix from these 3 variables, that would then
> allow me to compare the effects of each species.
>
> mm = model.matrix(~A+B+C+0)
> summary(lm(Y~mm))
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  -0.8301     0.6221  -1.334    0.203
> mmA           1.1636     0.4819   2.415    0.030 *
> mmB           0.8452     0.4819   1.754    0.101
> mmC          -0.1005     0.4819  -0.208    0.838
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.8347 on 14 degrees of freedom
> Multiple R-squared: 0.4181, Adjusted R-squared: 0.2934
> F-statistic: 3.353 on 3 and 14 DF, p-value: 0.04964
>
> My questions :
> 1. Does this approach make any sense ? I have a feeling I am doing something
> strange but I cannot put my finger on it.
> 2. My degrees of freedom are wrong: I should not have an intercept here, or
> at least my intercept should be one of my species. Should I just remove one
> species from the design matrix ?
> 3. Is there any way to do post-hoc tests on my species now, as I would have
> done with a Tukey test or lsmeans ?
>
> My objective afterwards is to add other explanatory variables and
> interactions in the model.
>
> Thanks in advance !
>
> M. N.
>
>

Re: [R] Design matrix for species mixture

2017-05-12 Thread Margot Neyret
Hi Jim,

Sorry if my question was not clear. I will try to explain again…

I have one response variable Y, let’s say vegetation cover. Then I have my 
explanatory variable, let’s call it Crop. In my field I can have either Maize 
(m), Bean (b), Pumpkin (p) or mixtures : m+b, m+p, b+p, m+b+p. I also have a 
second explanatory variable X (e.g. soil moisture content).

So for now my variable Crop has 7 levels [m, p, b, mp, mb, pb, mpb]. If I want 
to compare Y between these crops, I write :
summary(lm(Y~Crop))
TukeyHSD(aov(Y~Crop))

And with X :
summary(lm(Y~Crop*X))
… etc.

Then I want to compare the effect of each individual crop. I can do as you 
suggested Jim : 3 variables M, B, P with value 1 if the crop is present, 
0 otherwise, and add interactions.
summary(lm(Y~M+B+P+M*P+…))
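
For reference, the indicator variables can be derived from the Crop
factor itself rather than typed in by hand. This is a minimal sketch: the
data frame dat and the helper has() are hypothetical, and it assumes each
crop is coded by a single letter that is unique among the crops.

```r
# Hypothetical data frame holding the seven Crop levels described above.
dat <- data.frame(Crop = factor(c("m", "p", "b", "mp", "mb", "pb", "mpb")))

# has("m") is 1 where the level name contains the letter "m", 0 otherwise.
has <- function(letter) {
  as.integer(grepl(letter, as.character(dat$Crop), fixed = TRUE))
}

dat$M <- has("m")
dat$B <- has("b")
dat$P <- has("p")
print(dat)
```

The same helper would extend to five crops by adding one line per extra
letter code.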

But here come my questions. This model seems to have 2 drawbacks.
1. How can I do pairwise comparisons here as I would do with a Tukey test ? How 
can I test the hypothesis, for instance, “Bean provides higher cover than Maize 
whatever the mixture is” ?
2. When it comes to interactions with other variables it gets quite complicated 
(also note that in real life I have 5 crops, not 3 and other explanatory 
variables) :
lm(Y~M*X+ B*X + P*X + M*B + M*P + P*B)

So, isn’t there a way to make it more concise ?
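
One possibility, offered as a sketch rather than a recommendation, is R's
formula shorthand: (M + B + P)^2 expands to the main effects plus all
two-way interactions among the crops, and (M + B + P)*X adds X and its
interaction with each crop. The helper term_set() below is hypothetical
and only checks that the short formula contains the same terms as the
long one.

```r
long  <- Y ~ M*X + B*X + P*X + M*B + M*P + P*B
short <- Y ~ (M + B + P)^2 + (M + B + P)*X

# Term labels with the variables inside each interaction sorted, so that
# labels such as "X:B" and "B:X" compare equal.
term_set <- function(f) {
  labs <- attr(terms(f), "term.labels")
  sort(vapply(strsplit(labs, ":", fixed = TRUE),
              function(v) paste(sort(v), collapse = ":"),
              character(1)))
}

print(term_set(short))
print(identical(term_set(long), term_set(short)))
```

With five crops only the list inside the parentheses grows, for example
(M + B + P + S + T)^2, where S and T stand for whatever the two extra
crop indicators are called.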

I hope it makes sense.

Thanks,
Margot

> On Friday, May 12, 2017 at 2:53 AM, Jim Lemon wrote:
> Hi Margot,
> I'm not sure I understand your model, but if I make up some data in
> which the response variable is vegetation cover and the three species
> are:
>
> A - eats one type of plant
> B - eats another type of plant
> C - preys on herbivorous insects
>
> df<-read.table(text="field,propveg,A,B,C
> 1,1,0,0,1
> 2,0.3,1,1,1
> 3,0.6,0,1,1
> 4,0.2,1,1,1
> 5,0.7,1,0,1
> 6,0.8,0,0,0
> 7,0.3,1,0,0
> 8,0.4,0,1,0
> 9,0.1,1,1,0
> 10,0.5,0,1,0
> 11,0.5,1,0,1
> 12,0.1,1,1,0
> 13,0.6,0,1,1
> 14,0,1,1,0",
> sep=",",header=TRUE)
> print(summary(lm(propveg~A+B+C+A:B+A:C+B:C,df)))
>
> Is that something like what you want?
>
> Jim
>
> On Fri, May 12, 2017 at 12:40 AM, Margot Neyret wrote:
> > Hello,
> >
> > I have fields with species mixtures (for instance, species a, b, c, a+b, 
> > a+c, b+c), and I look at the effect of each species on a response Y. More 
> > specifically, I would like to compare the effect of individual species, 
> > either alone or in mixture.
> >
> > > Y = rnorm(18,0,1)
> > > mixture= rep(c('a','b', 'c', 'a+b', 'a+c', 'b+c'), each = 3)
> >
> > Thus I create variables A, B and C with :
> > - A = 1 when the mixture contains a (i.e. mixture = a, a+b or a+c); and 0 
> > otherwise.
> > - Likewise for variables B and C.
> >
> > > A = ifelse(mixture %in% c('a', 'a+b', 'a+c'), 1, 0)
> > > B = ifelse(mixture %in% c('b', 'a+b', 'b+c'), 1, 0)
> > > C = ifelse(mixture %in% c('c', 'a+c', 'b+c'), 1, 0)
> >
> > My plan was to build a design matrix from these 3 variables, that would 
> > then allow me to compare the effects of each species.
> >
> > > mm = model.matrix(~A+B+C+0)
> > > summary(lm(Y~mm))
> > Coefficients:
> >             Estimate Std. Error t value Pr(>|t|)
> > (Intercept)  -0.8301     0.6221  -1.334    0.203
> > mmA           1.1636     0.4819   2.415    0.030 *
> > mmB           0.8452     0.4819   1.754    0.101
> > mmC          -0.1005     0.4819  -0.208    0.838
> > ---
> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > Residual standard error: 0.8347 on 14 degrees of freedom
> > Multiple R-squared: 0.4181, Adjusted R-squared: 0.2934
> > F-statistic: 3.353 on 3 and 14 DF, p-value: 0.04964
> >
> > My questions :
> > 1. Does this approach make any sense ? I have a feeling I am doing 
> > something strange but I cannot put my finger on it.
> > 2. My degrees of freedom are wrong: I should not have an intercept here, 
> > or at least my intercept should be one of my species. Should I just remove 
> > one species from the design matrix ?
> > 3. Is there any way to do post-hoc tests on my species now, as I would have 
> > done with a Tukey test or lsmeans ?
> >
> > My objective afterwards is to add other explanatory variables and 
> > interactions in the model.
> >
> > Thanks in advance !
> >
> > M. N.
> >
> >