[R] gamlss to predict dependent variables in [0, 1] interval (fractional variable)
Dear R-users,

I want to model a proportion bounded by [0, 1] (the % of land fertilized). A high percentage of the data are 0s (60%), a smaller percentage are 1s (10%), and all the rest falls in between. I want to compare different models with each other to see how they perform; the model I am currently looking at is a zero-one inflated beta model, fitted with the R package gamlss. However, I am having some trouble with the quite technical documentation of the gamlss package and I cannot find an answer to my questions below.

1) Model

The model below should contain 3 submodels: one part that models the probability of having y=0 versus y>0 (nu.formula), one part that models the probability of having y=1 versus y<1 (tau.formula), and a final part that models all the values in between.

gam <- gamlss(proportion ~ x1 + x2, nu.formula = ~ x1 + x2, tau.formula = ~ x1 + x2, family = BEINF, data = Alldata)

This is okay, I think.

2) Prediction

I would now like to know the predicted probability that an observation has y = 0 or y = 1. I predicted the probability of y = 0 with the code below, but I get values that go far beyond the [0, 1] interval, so they cannot be probabilities.

Alldata$fit_proportion_0 <- predict(gam, what = "nu", type = 'response')
summary(Alldata$fit_proportion_0)

Could somebody explain to me how to obtain the correct probabilities, because the code above does not seem to work? I think the answer to my problem can be found in section 10.8.2, page 215 of the following link ( http://www.gamlss.org/wp-content/uploads/2013/01/book-2010-Athens1.pdf). I think it says that the predict function I use returns a different quantity, which I have to plug into a certain formula to find the real probabilities. But I am not sure how to make this work.

3) Interpretation

Also, to be sure, I would like to know how to interpret the coefficients of the three submodels and how to use the coefficients of each part separately. For the nu and tau models these should be interpreted as log-odds ratios, right? And the model in the middle is just a normal log-model, right?

4) Validity

Finally, I do not find a lot of information on how to correctly test the validity of this model. Do you test each of the three subparts separately? Or is there a test for the entire model at once?

Thank you very much for your help! I am aware that some of these questions are very basic.

Janka

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
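A note on question 2 that may help. As far as the gamlss.dist documentation for BEINF describes it, nu and tau are modelled on a log link and are not probabilities themselves, which would explain fitted values above 1. The probabilities are obtained as P(y = 0) = nu / (1 + nu + tau) and P(y = 1) = tau / (1 + nu + tau). The sketch below only shows that arithmetic: `beinf_probs` is a hypothetical helper, not part of gamlss, and the numbers are made up.

```r
# Hypothetical helper (not part of gamlss): convert fitted BEINF nu and tau
# values into the probabilities of an exact 0, an exact 1, and an interior value.
beinf_probs <- function(nu, tau) {
  stopifnot(all(nu >= 0), all(tau >= 0))
  denom <- 1 + nu + tau
  data.frame(p0 = nu / denom, p1 = tau / denom, p_between = 1 / denom)
}

# Made-up fitted values, only to show the arithmetic.
nu_hat  <- c(0.5, 3, 6)
tau_hat <- c(0.5, 1, 1)
probs <- beinf_probs(nu_hat, tau_hat)
print(probs)

# With the model from the post, the inputs would come from something like
# (assumption: `gam` and `Alldata` exist as defined in the post):
#   nu_hat  <- predict(gam, what = "nu",  type = "response")
#   tau_hat <- predict(gam, what = "tau", type = "response")
```

Each row of `probs` sums to 1, so the three columns can be read directly as the predicted shares of zeros, ones, and in-between values.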
[R] Weighted demean by group on only a selection of the dataset
Dear colleagues,

I am trying to find simple code to demean 1) only certain variables of a dataset, 2) by group, and 3) in a weighted fashion. Currently, I can only demean all the numeric variables in the dataset:

Data[, sapply(Data, is.numeric)] <- apply(Data[sapply(Data, is.numeric)], 2, function(x) scale(x, scale = FALSE))

Assume that my dataset looks like this:

Country <- c('BE','BE','DE','GR','IT','ES','DE','NL')
Landvalue <- c(21000, 23400, 26800, 15000, 18000, 23000, 19000, 23000)
Temperature_spring <- c('15','16','14','18','23','21','12','15')
Temperature_summer <- c('25','18','19','23','24','22','15','19')
Temperature_autumn <- c('14','12','12','10','20','20','11','13')
Temperature_winter <- c('9','4','12','14','15','13','17','12')
Weight <- c('5','20','3','2','15','21','13','8')
Data <- data.frame(Country, Landvalue, Temperature_spring, Temperature_summer, Temperature_autumn, Temperature_winter, Weight)

Now imagine I only want to demean the temperature variables, grouped by country and weighted by Weight. By "grouped by country" I mean that I want to subtract only the mean of Belgium from an observation in Belgium.

Does somebody know how to add these three features to the code line I already have? Or, if this does not work, what code should I use?

Thank you very much and have a nice Christmas!
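One possible way to do this in base R is sketched below. It is not a reply from the thread. It assumes the temperatures and weights are stored as numbers rather than the quoted strings in the post, and the helper name `weighted_group_mean` is made up for the illustration. The weighted mean per country is computed with `ave()` as sum(x * w) / sum(w) within each group, then subtracted from the temperature columns only.

```r
# Data from the post, with temperatures and weights stored as numbers
# (assumption: the quoted strings in the post were meant to be numeric).
Data <- data.frame(
  Country            = c("BE", "BE", "DE", "GR", "IT", "ES", "DE", "NL"),
  Landvalue          = c(21000, 23400, 26800, 15000, 18000, 23000, 19000, 23000),
  Temperature_spring = c(15, 16, 14, 18, 23, 21, 12, 15),
  Temperature_summer = c(25, 18, 19, 23, 24, 22, 15, 19),
  Temperature_autumn = c(14, 12, 12, 10, 20, 20, 11, 13),
  Temperature_winter = c(9, 4, 12, 14, 15, 13, 17, 12),
  Weight             = c(5, 20, 3, 2, 15, 21, 13, 8)
)

# Only the temperature columns are demeaned.
temp_cols <- grep("^Temperature_", names(Data), value = TRUE)

# Hypothetical helper: weighted mean of x within each group g,
# repeated for every row of that group.
weighted_group_mean <- function(x, w, g) {
  ave(x * w, g, FUN = sum) / ave(w, g, FUN = sum)
}

Demeaned <- Data
Demeaned[temp_cols] <- lapply(Data[temp_cols], function(x) {
  x - weighted_group_mean(x, Data$Weight, Data$Country)
})
print(Demeaned)
```

Countries with a single observation end up with demeaned values of 0, and Landvalue and Weight are left untouched.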
Re: [R] Lag variable by group
Hi Petr and other member who can use this post, Somebody gave me an answer in a private email which worked for me! The only thing I needed to do was to make first a data.table object of my data. Then the code works! library(data.table) data <- data.table(data, key = "id") data[, lag.t1:=c(NA, t1[-.N]), by=id] Thank you very much for your help Petr! I really appreciate it! Janka 2015-09-08 8:37 GMT+02:00 PIKAL Petr <petr.pi...@precheza.cz>: > Hi > > Thanks for providing data. I did not see any response and frankly speaking > I do not use data.table so I am not sure what do you mean by lagging t1. > > I would start with ordering data. > ooo<-order(data$id, data$year) > data <- data[ooo,] > > Then you can split data according to id. > > datas<-split(data[,c(1,3)], data$id) > > dput(head(datas)) > structure(list(`28954` = structure(list(year = c(2005, 2006, > 2007, 2008), t1 = c(-1.81807494163513, -1.81807494163513, > -1.81807494163513, > -1.81807494163513)), .Names = c("year", "t1"), row.names = c(58L, > 45L, 35L, 46L), class = "data.frame"), `28955` = structure(list( > year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513, > -1.81807494163513, -1.81807494163513, -1.81807494163513)), .Names = > c("year", > "t1"), row.names = c(59L, 70L, 69L, 72L), class = "data.frame"), > `28956` = structure(list(year = c(2005, 2006, 2007, 2008), > t1 = c(-1.81807494163513, -1.81807494163513, -1.81807494163513, > -1.81807494163513)), .Names = c("year", "t1"), row.names = c(53L, > 66L, 74L, 51L), class = "data.frame"), `28957` = structure(list( > year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513, > -1.81807494163513, -1.81807494163513, -1.81807494163513 > )), .Names = c("year", "t1"), row.names = c(71L, 64L, > 54L, 24L), class = "data.frame"), `28958` = structure(list( > year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513, > -1.81807494163513, -1.81807494163513, -1.81807494163513 > )), .Names = c("year", "t1"), row.names = c(34L, 27L, > 1L, 31L), class = 
"data.frame"), `28959` = structure(list( > year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513, > -1.81807494163513, -1.81807494163513, -1.81807494163513 > )), .Names = c("year", "t1"), row.names = c(17L, 18L, > 30L, 44L), class = "data.frame")), .Names = c("28954", "28955", > "28956", "28957", "28958", "28959")) > > But now I am lost what result you expect. Can you explain it on this > smaller data set? > > Cheers > Petr > > > -Original Message- > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Janka > > VANSCHOENWINKEL > > Sent: Monday, September 07, 2015 1:18 PM > > To: r-help@r-project.org > > Subject: [R] Lag variable by group > > > > Hi! > > > > I have the following dataset with the variables ID (this is a unique ID > > per farmer), year, and another variable t1. > > I now would like to have a fourth variable which is the lag value of t1 > > for each farm ID. > > > > I found a code on the internet that does exactly what I need, but it > > does not work for this dataset. Does anyone have suggestions about how > > I can make this work? > > > > Thanks a lot! 
Re: [R] Lag variable by group
Wow! Thanks for pointing that out! And thanks for testing it out as well!

It is always the first year available (unbalanced panel) that should get NA. So after sorting with the code lines you provided earlier, this should work:

library(data.table)
data <- data.table(newdata, key = "id")
ooo <- order(data$id, data$year)
data <- data[ooo, ]
data[, lag.t1 := c(NA, t1[-.N]), by = id]

Thank you very much for pointing that out!

2015-09-08 9:05 GMT+02:00 PIKAL Petr <petr.pi...@precheza.cz>:

> Hm. I tried your example but what puzzles me is that your data are not
> sorted by year and therefore sometimes the first year is changed to NA but
> sometimes any arbitrary year is changed to NA.
>
> > head(data)
>    year    id        t1    lag.t1
> 1: 2007 28954 -1.818075        NA
> 2: 2006 28954 -1.818075 -1.818075
> 3: 2008 28954 -1.818075 -1.818075
> 4: 2005 28954 -1.818075 -1.818075
> 5: 2005 28955 -1.818075        NA
> 6: 2007 28955 -1.818075 -1.818075
>
> Is it what you intended?
>
> Cheers
> Petr
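For readers who do not use data.table, the same idea (sort by id and year, then shift t1 down by one row within each id) can be sketched in base R. This is an illustration, not code from the thread; the small panel below is made up, and the column name `lag_t1` is arbitrary.

```r
# Small made-up unbalanced panel (ids, years and values are illustrative only).
panel <- data.frame(
  id   = c(2, 1, 1, 2, 1, 3),
  year = c(2007, 2006, 2005, 2006, 2007, 2008),
  t1   = c(0.7, 0.2, 0.1, 0.6, 0.3, 0.9)
)

# Sort first, so that "previous row" means "previous available year".
panel <- panel[order(panel$id, panel$year), ]

# Within each id, prepend NA and drop the last value.
panel$lag_t1 <- ave(panel$t1, panel$id, FUN = function(x) c(NA, head(x, -1)))
print(panel)
```

Note that this lags by position, not by calendar year: if a farm has a gap in its years, the lag is the value from the previous available year, which matches the behaviour described in the message above.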
[R] Lag variable by group
Hi! I have the following dataset with the variables ID (this is a unique ID per farmer), year, and another variable t1. I now would like to have a fourth variable which is the lag value of t1 for each farm ID. I found a code on the internet that does exactly what I need, but it does not work for this dataset. Does anyone have suggestions about how I can make this work? Thanks a lot! Janka data<-structure(list(year = c(2007, 2005, 2008, 2006, 2005, 2007, 2006, 2008, 2007, 2005, 2007, 2007, 2005, 2006, 2005, 2006, 2005, 2006, 2007, 2007, 2005, 2008, 2007, 2008, 2005, 2005, 2006, 2008, 2007, 2007, 2008, 2008, 2006, 2005, 2007, 2006, 2008, 2008, 2007, 2007, 2007, 2006, 2006, 2008, 2006, 2008, 2008, 2008, 2006, 2007, 2008, 2007, 2005, 2007, 2008, 2005, 2007, 2005, 2005, 2008, 2005, 2006, 2005, 2006, 2008, 2006, 2008, 2006, 2007, 2006, 2005, 2008, 2006, 2007, 2008, 2006, 2006, 2006, 2005, 2008, 2006, 2008, 2006, 2006, 2006, 2007, 2008, 2005, 2007, 2006, 2007, 2008, 2006, 2008, 2005, 2007, 2005, 2007, 2006, 2006), id = c(28958L, 28962L, 28962L, 28965L, 28960L, 28962L, 28964L, 28970L, 28961L, 28965L, 78458L, 28960L, 28961L, 28961L, 28969L, 28962L, 28959L, 28959L, 58845L, 28965L, 28963L, 78459L, 28967L, 28957L, 28964L, 28966L, 28958L, 28960L, 28969L, 28959L, 28958L, 28969L, 58845L, 28958L, 28954L, 28963L, 78458L, 28965L, 28966L, 28963L, 28970L, 28970L, 28960L, 28959L, 28954L, 28954L, 58845L, 28967L, 28966L, 78459L, 28956L, 28964L, 28956L, 28957L, 28961L, 28970L, 28968L, 28954L, 28955L, 28968L, 28968L, 28967L, 28967L, 28957L, 28966L, 28956L, 28964L, 28969L, 28955L, 28955L, 28957L, 28955L, 28968L, 28956L, 28963L, 29004L, 58848L, 29005L, 28974L, 29005L, 28974L, 29006L, 28981L, 29007L, 29002L, 28980L, 29001L, 29006L, 29005L, 28989L, 28989L, 58846L, 28980L, 28981L, 78467L, 28990L, 28973L, 29004L, 28972L, 29006L), t1 = c(-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, 
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487)), .Names = c("year", "id", "t1"), row.names = c(NA, 100L), class = "data.frame") library(data.table) data[, lag.t1:=c(NA, t1[-.N]), by=id] Thank you very much! 
Janka
Re: [R] cut variable within a loop
Thank you all very much. A combination of the solutions suggested solved my problem!

2015-08-16 22:31 GMT+02:00 David Winsemius dwinsem...@comcast.net:

On Aug 16, 2015, at 8:57 AM, Janka VANSCHOENWINKEL wrote:

Hi David,

Thanks for your comment. I'll explain what I want to do. I explained it already earlier but the explanation might have gone lost in some of the emails.

I now see that you did explain that you wanted the positional matching in cut2 as a break. The code runs without error on my machine, but delivers a lot of warnings about masking. You are repeatedly using attach on the same named objects. Using `attach` in programming is generally not a good idea. In interactive use it is safer to use `with`, although that is not generally considered safe in programming, either.

You need to do a better job of nailing down the source of the difficulty what ever it might be. While you say the cut2 function doesn't work, you don't actually give evidence of failure. It's fairly simple to show that your theory about why your code fails in some way as being due to cut2 failing to accept an i value inside an lapply call is just wrong:

o <- lapply(1:3, function(i) { cut2( 0:10, i) } )
o
[[1]]
 [1] 0       [ 1,10] [ 1,10] [ 1,10] [ 1,10] [ 1,10] [ 1,10]
 [8] [ 1,10] [ 1,10] [ 1,10] [ 1,10]
Levels: 0 [ 1,10]

[[2]]
 [1] [ 0, 2) [ 0, 2) [ 2,10] [ 2,10] [ 2,10] [ 2,10] [ 2,10]
 [8] [ 2,10] [ 2,10] [ 2,10] [ 2,10]
Levels: [ 0, 2) [ 2,10]

[[3]]
 [1] [ 0, 3) [ 0, 3) [ 0, 3) [ 3,10] [ 3,10] [ 3,10] [ 3,10]
 [8] [ 3,10] [ 3,10] [ 3,10] [ 3,10]
Levels: [ 0, 3) [ 3,10]

You also have two different definitions of weight2 for your irrigation model:

Alldata_Irrigation$weight2 <- Alldata_Irrigation$sys02*Alldata_Irrigation$se025
Alldata_Irrigation$weight2 <- Alldata_Irrigation$b48+Alldata_Irrigation$b50

--
David

The variable irrigation ranges from 0 to 100. (maybe not in the small sample I gave, but in reality I have over 6 observations and there the variable ranges from 0 to 100).

I want to make (and use) 100 different samples. The sample is based each time on the i that I put at the beginning of the loop. So:

i = 1: this means there are 2 subsets. One from 0-1, another from 1-100
i = 2: this means there are 2 subsets. One from 0-2, another from 2-100
i = 3: this means there are 2 subsets. One from 0-3, another from 3-100
i = 4: this means there are 2 subsets. One from 0-4, another from 4-100
...
i = 96: this means there are 2 subsets. One from 0-96, another from 96-100
i = 97: this means there are 2 subsets. One from 0-97, another from 97-100
i = 98: this means there are 2 subsets. One from 0-98, another from 98-100
i = 99: this means there are 2 subsets. One from 0-99, another from 99-100

It might be possible that i = 1 and i = 2 give the same results in the small dataset. But in the full dataset all numbers are represented.

The cut2 function is capable of cutting a sample based on a number supplied. Yet, when I tell it this number is i, then it doesn't work. If instead I write that the number is 10, then it does work and it gives me 2 subsets from 0-10 and from 10-100.

Hope this is more clear!

Janka

2015-08-14 20:10 GMT+02:00 David Winsemius dwinsem...@comcast.net:

When using a function in R you may need to supply an argument name. Are you expecting this to be the number of groups? I cannot decipher the intent here with such sparse commentary, but this call to `cut2` does not make sense to me. Perhaps you meant the number of groups, in which case you need cut2( Alldata$irrigation, g=i ), since the arguments to cut2 are not the same as the arguments to cut.

At the moment you are implicitly sending on the first pass a 1 and then on the second pass a 2 to the second argument of cut2, which is the `breaks` argument. So you would be getting two different factors, each with different cut-point levels. I looked at your data and in point of fact there would be no difference, since you have 29 zero values and no values between 0 and 1.

table(cut2(dat$irrigation, 1))

       0 [ 1,100]
      29       21

table(cut2(dat$irrigation, 2))

       0 [ 2,100]
      29       21

levels(Alldata$irri) <- c(0,1)
Alldata_Rainfed <- subset(Alldata, irri == 0)
Alldata_Irrigation <- subset(Alldata, irri == 1)
Alldata_Rainfed$w <- Alldata_Rainfed$b48+Alldata_Rainfed$b50
Alldata_Irrigation$w <- Alldata_Irrigation$b48+Alldata_Irrigation$b50
OLS_Rainfed <- lm(LnALVperHA~ps1+ps2+ps3+ps4+ts1+ts2+ts3+ts4+
ps1sq+ps2sq+ps3sq+ps4sq+ts1sq+ts2sq+ts3sq+ts4sq+
pdnsty+portsML+cities500k+rentedland+subsidies1+
elevmean+elevrange+
t_gravel+t_gravel+t_ph_h2o+t_silt+t_sand+
AT+BE+DK+ES+FI+FR+GR+IE+IT+LU+NL+PT+SE+WDE+EDE+UK,
weights=w
Re: [R] cut variable within a loop
, 2.3490463257, 8.5, 24.878392334, 4, 1.3997615814, 34.7799987792969, 6.6980926514), b50 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34.2400016784668, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), irrigation = c(0, 100, 0, 5.45454584062099, 7.9365074634552, 89.3392562866211, 0, 17.6470592617989, 0, 0, 65.5172407627106, 0, 61.904764175415, 34.4827562570572, 7.95454531908035, 75, 0, 0, 0, 0, 0, 0, 5.26393800973892, 0, 0, 0, 0, 0, 0, 0, 0, 0, 74.6153831481934, 84.6153914928436, 0, 5.09554147720337, 0, 0, 0, 21.0884347558022, 18.4549376368523, 6.1224490404129, 25.3731369972229, 2.12765969336033, 0, 84.3988716602325, 0, 0, 0, 100), awc_class = c(106.228088378906, 78.2306137084961, 80.9311141967773, 32.4921531677246, 54.8475151062012, 80.6665878295898, 116.331588745117, 54.8475151062012, 54.8475151062012, 54.8475151062012, 54.8475151062012, 54.8475151062012, 56.3101806640625, 32.4921531677246, 54.8475151062012, 32.4921531677246, 59.3034172058105, 101.193893432617, 96.5840377807617, 54.2786560058594, 87.1388244628906, 66.1907730102539, 57.205738067627, 55.4114303588867, 55.4114303588867, 80.9288787841797, 63.6008758544922, 150, 30.3404140472412, 30.3404140472412, 19.8318557739258, 104.236854553223, 79.2445755004883, 57.0045547485352, 54.8475151062012, 34.320426940918, 54.8475151062012, 34.320426940918, 34.320426940918, 32.4921531677246, 65.1337509155273, 34.320426940918, 54.8475151062012, 73.6748657226562, 54.8475151062012, 56.3101806640625, 54.8475151062012, 32.4921531677246, 127.726959228516, 27.9528160095215), sys02 = c(18.8571434020996, 303.529418945312, 30.2469139099121, 104.305557250977, 86.4935073852539, 51.25, 83.0927810668945, 453.118286132812, 42.5, 104.305557250977, 48.461540222168, 86.4935073852539, 55.1851844787598, 104.305557250977, 104.305557250977, 185.20996094, 17.9775276184082, 25.286254883, 64, 21.660308838, 30, 24.2372875213623, 47.0285720825195, 16.1904754638672, 33.75, 22.5423736572266, 
10.2857141494751, 39.230770111084, 6.06741571426392, 1, 28.3255805969238, 21.603814697, 69.2592620849609, 86.641235352, 48.5185203552246, 44.4186058044434, 48.6538467407227, 437.105255126953, 437.105255126953, 19.160308838, 48.461540222168, 437.105255126953, 48.6538467407227, 453.118286132812, 48.6538467407227, 14.2857141494751, 453.118286132812, 453.118286132812, 95.2380981445312, 63), se025 = c(163.529998779297, 2.7004768372, 157, 5.5, 6.3019073486, 36.024577637, 86, 5.0990463257, 6.4009536743, 6, 8.6980926514, 4, 6.3019073486, 5.8019073486, 8.8019073486, 2, 118.809997558594, 44.116103516, 16.707629395, 34, 73.415258789, 73.0800018310547, 134.880004882812, 31.024577637, 20.036866455, 94.7200012207031, 40, 5.5, 16.5, 15, 26.878392334, 59.4199981689453, 13, 5.1980926514, 6.8019073486, 15.698092651, 10.896185303, 5.3019073486, 4.5990463257, 29.396185303, 23.292370605, 4.9009536743, 13.396185303, 2.3490463257, 8.5, 24.878392334, 4.1980926514, 1.3997615814, 34.7799987792969, 6.6980926514)), .Names = c(LnALVperHA, ps1, ps2, ps3, ps4, ts1, ts2, ts3, ts4, ps1sq, ps2sq, ps3sq, ps4sq, ts1sq, ts2sq, ts3sq, ts4sq, pdnsty, portsML, cities500k, rentedland, subsidies1, elevmean, elevrange, t_gravel, t_ph_h2o, t_silt, t_sand, AT, BE, DE, DK, ES, FI, FR, GR, IE, IT, LU, NL, PT, SE, WDE, EDE, UK, CY, BG, CZ, EE, HU, LT, LV, PL, RO, SI, SK, b48, b50, irrigation, awc_class, sys02, se025), row.names = c(2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 ), class = data.frame) 2015-08-14 14:58 GMT+02:00 PIKAL Petr petr.pi...@precheza.cz: Hi Janka Sorry, but we are limited in connecting to web services so I am not able to restore your data and see your code. Result of dput(somedata) coppied to email is preferable for sharing data and code can be copied to email too. But do not use HTML as it usually scrambles text. 
Answer in line From: Janka Vanschoenwinkel [mailto:janka.vanschoenwin...@uhasselt.be] Sent: Friday, August 14, 2015 2:17 PM To: Thierry Onkelinx; PIKAL Petr Cc: r-help@r-project.org Subject: Re: [R] cut variable within a loop Hi Thierry and Petr, I really appreciate the comments you already gave. Thank you very much for that. Below you can find a link to the data and the code. Hopefully this helps in spotting the error. I still think the issue is that the cut2 function only accepts numbers, and not an i that refers to the number at the start of the loop. To answer Petr his question, yes, column 3 and 4 are NA
Re: [R] cut variable within a loop
, 6.3019073486, 5.8019073486, 8.8019073486, 2, 118.809997558594, 44.116103516, 16.707629395, 34, 73.415258789, 73.0800018310547, 134.880004882812, 31.024577637, 20.036866455, 94.7200012207031, 40, 5.5, 16.5, 15, 26.878392334, 59.4199981689453, 13, 5.1980926514, 6.8019073486, 15.698092651, 10.896185303, 5.3019073486, 4.5990463257, 29.396185303, 23.292370605, 4.9009536743, 13.396185303, 2.3490463257, 8.5, 24.878392334, 4.1980926514, 1.3997615814, 34.7799987792969, 6.6980926514)), .Names = c(LnALVperHA, ps1, ps2, ps3, ps4, ts1, ts2, ts3, ts4, ps1sq, ps2sq, ps3sq, ps4sq, ts1sq, ts2sq, ts3sq, ts4sq, pdnsty, portsML, cities500k, rentedland, subsidies1, elevmean, elevrange, t_gravel, t_ph_h2o, t_silt, t_sand, AT, BE, DE, DK, ES, FI, FR, GR, IE, IT, LU, NL, PT, SE, WDE, EDE, UK, CY, BG, CZ, EE, HU, LT, LV, PL, RO, SI, SK, b48, b50, irrigation, awc_class, sys02, se025), row.names = c(2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 ), class = data.frame) 2015-08-14 14:58 GMT+02:00 PIKAL Petr petr.pi...@precheza.cz: Hi Janka Sorry, but we are limited in connecting to web services so I am not able to restore your data and see your code. Result of dput(somedata) coppied to email is preferable for sharing data and code can be copied to email too. But do not use HTML as it usually scrambles text. Answer in line From: Janka Vanschoenwinkel [mailto:janka.vanschoenwin...@uhasselt.be] Sent: Friday, August 14, 2015 2:17 PM To: Thierry Onkelinx; PIKAL Petr Cc: r-help@r-project.org Subject: Re: [R] cut variable within a loop Hi Thierry and Petr, I really appreciate the comments you already gave. Thank you very much for that. Below you can find a link to the data and the code. Hopefully this helps in spotting the error. 
I still think the issue is that the cut2 function only accepts numbers, and not an i that refers to the number at the start of the loop.

To answer Petr his question, yes, column 3 and 4 are NA (these are the columns of the second interval). But I don't really understand your point so could you clarify this please?

If you use NA as a number of intervals you will get such errors

k <- c(2,4,NA,5)
ii <- vector(4, mode="list")
for (i in 1:4) {
ii[[i]] <- cut2(iris[,i], k[i])
}
Error in if (r[1] < cuts[1]) cuts <- c(r[1], cuts) :
  missing value where TRUE/FALSE needed

for (i in 1:4) {
ii[[i]] <- cut(iris[,i], k[i])
}
Error in cut.default(iris[, i], k[i]) : invalid number of intervals

If you remove NA from k definition error is gone.

k <- c(2,4,3,5)
ii <- vector(4, mode="list")
for (i in 1:4) {
ii[[i]] <- cut(iris[,i], k[i])
}

You can try it yourself. The error is not related to cycle; whenever number of intervals in cut call is NA you always get an error.

Cheers
Petr

https://drive.google.com/folderview?id=0By9u5m3kxn9yfkxxeVNMdnRQQXhoT05CRlJlZVBCWWF2NURMMTNmVFVFeXJXXzhlMWE4SUkusp=sharing

Thank you very much once again!

Janka

2015-08-11 15:10 GMT+02:00 Thierry Onkelinx thierry.onkel...@inbo.be:

You'll need to send a reproducible example of the code. We can't run the code that you send. Hence it is hard to help you. See e.g. http://adv-r.had.co.nz/Reproducibility.html

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey 2015-08-11 14:57 GMT+02:00 Janka Vanschoenwinkel janka.vanschoenwin...@uhasselt.be: Hi Thierry! Thanks for your answer. I tried this, but I get this error: Error in cut.default(x, k2) : invalid number of intervals Which is strange because I am not specifying intervals, but the number at where the sample has to be cut? Greetings from Belgium! :-) 2015-08-11 14:52 GMT+02:00 Thierry Onkelinx thierry.onkel...@inbo.be: Dear Janka, You loop goes for 0 to 100. It should probably go from 1:99 Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him
Re: [R] cut variable within a loop
Hey Michael,

Sorry for the late reply! Thanks for your comment, but this is not the case for the cut2 command. If I enter, for instance,

    Alldata$irri = cut2(irrigation, 3)

then I get 2 intervals, from 0-3 and from 3-100.

Janka

2015-08-11 17:25 GMT+02:00 Michael Dewey li...@dewey.myzen.co.uk:

Dear Janka

If you supply a single number to the breaks parameter of cut I think it is the number of intervals.

--
Michael
http://www.dewey.myzen.co.uk/home.html

--
Mevrouw Janka Vanschoenwinkel
Doctoraatsbursaal - PhD
Milieueconomie - Environmental economics
T +32(0)11 26 87 42 | GSM +32(0)476 28 21 40
www.uhasselt.be/eec
Universiteit Hasselt | Campus Diepenbeek
Agoralaan Gebouw D | B-3590 Diepenbeek
Kantoor F11
Postadres: Universiteit Hasselt | Martelarenlaan 42 | B-3500 Hasselt
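The two readings of a single number in this exchange (a count of intervals versus a cut point) can be checked in base R. The following is a minimal sketch using made-up values for `irrigation` and hypothetical labels `rainfed`/`irrigated`; it uses only base `cut()` and deliberately does not call `cut2()` from Hmisc.

```r
# Hypothetical irrigation percentages, for illustration only.
irrigation <- c(0, 1, 2.5, 3, 10, 40, 75, 100)

# A single number passed to base cut() is read as the NUMBER of intervals.
by_count <- cut(irrigation, 3)
nlevels(by_count)

# Base cut() rejects a single number below 2 with an error.
too_few <- tryCatch(cut(irrigation, 1), error = function(e) e)
inherits(too_few, "error")

# To split AT a value, give the break points explicitly.
# Intervals are (-Inf, 3] and (3, Inf).
at_three <- cut(irrigation, breaks = c(-Inf, 3, Inf),
                labels = c("rainfed", "irrigated"))
table(at_three)
```

Passing explicit break points removes any ambiguity about how a single number is interpreted, whichever function ends up doing the cutting.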
Re: [R] cut variable within a loop
Hi Thierry and Petr,

I really appreciate the comments you have already given. Thank you very much for that.

Below you can find a link to the data and the code. Hopefully this helps in spotting the error. I still think the issue is that the cut2 function only accepts numbers, and not an i that refers to the number at the start of the loop.

To answer Petr's question: yes, columns 3 and 4 are NA (these are the columns of the second interval). But I don't really understand your point, so could you clarify it, please?

https://drive.google.com/folderview?id=0By9u5m3kxn9yfkxxeVNMdnRQQXhoT05CRlJlZVBCWWF2NURMMTNmVFVFeXJXXzhlMWE4SUkusp=sharing

Thank you very much once again!

Janka

2015-08-11 15:10 GMT+02:00 Thierry Onkelinx thierry.onkel...@inbo.be:

You'll need to send a reproducible example of the code. We can't run the code that you sent. Hence it is hard to help you. See e.g. http://adv-r.had.co.nz/Reproducibility.html

ir. Thierry Onkelinx
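A reproducible example of the kind requested here usually means a small, self-generated data set plus the few lines that trigger the problem. The sketch below shows one way to build one; the column `outcome` and all values are invented stand-ins, not the poster's data.

```r
# Hypothetical stand-in for the real data: 20 farms with an irrigation
# percentage and one outcome column.
set.seed(42)
Alldata <- data.frame(
  irrigation = sample(0:100, 20, replace = TRUE),
  outcome    = round(rnorm(20, mean = 50, sd = 10), 2)
)

# dput() prints R code that recreates the object, so a few rows can be
# pasted into a plain-text e-mail.
dput(head(Alldata, 5))

# The cut point from the loop, written as an ordinary number.
i <- 3
table(cut(Alldata$irrigation, breaks = c(-Inf, i, Inf)))
```

With the data embedded in the message, readers can run the failing line directly instead of downloading files.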
[R] cut variable within a loop
Dear list members,

I have a loop in which I want to do several calculations for different samples and save the results for each sample. The samples differ in each iteration. I want to use the i in the loop to cut the samples. So for instance:

- In loop 1 (i=1), I have a sample from 0-1 and a sample from 1-100.
- In loop 2 (i=2), I have a sample from 0-2 and a sample from 2-100.
- In loop 99 (i=99), I have a sample from 0-99 and a sample from 99-100.

I built the following function, but there is *a problem with the cut2 function*, since it doesn't recognize the i. Outside the lapply loop it works, but not inside the loop.

Could somebody please help me with this problem? Thanks a lot!

    library(Hmisc)  # provides cut2()

    d = data.frame(MEt_Rainfed = rep(0, 100), MEp_Rainfed = rep(0, 100),
                   MEt_Irrigation = rep(0, 100), MEp_Irrigation = rep(0, 100))

    o <- lapply(0:100, function(i) {
      Alldata$irri = cut2(Alldata$irrigation, i)
      levels(Alldata$irri) <- c(0, 1)
      Alldata_Rainfed <- subset(Alldata, irri == 0)
      Alldata_Irrigation <- subset(Alldata, irri == 1)
      # calculations per sample, then store all the values per i and per
      # variable in a dataframe (the calculations are not shown in this example)
      d[i, ] = c(MEt_Rainfed, MEp_Rainfed, MEt_Irrigation, MEp_Irrigation)
    })
    out <- as.data.frame(do.call(rbind, o))
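The loop described in this message can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the `outcome` column and the group means stand in for the calculations that the post does not show, and base `cut()` with explicit break points replaces `cut2()`.

```r
# Hypothetical data: irrigation share (0-100) and a made-up outcome.
set.seed(1)
Alldata <- data.frame(
  irrigation = runif(500, min = 0, max = 100),
  outcome    = rnorm(500)
)

# One row of results per cut point i, looping over 1:99.
results <- lapply(1:99, function(i) {
  # Two groups: at or below i (coded "0") and above i (coded "1").
  irri <- cut(Alldata$irrigation, breaks = c(-Inf, i, Inf),
              labels = c("0", "1"))
  rainfed   <- Alldata[irri == "0", ]
  irrigated <- Alldata[irri == "1", ]
  # Placeholder calculations; the real ones are not shown in the post.
  data.frame(
    cutpoint       = i,
    n_rainfed      = nrow(rainfed),
    n_irrigated    = nrow(irrigated),
    mean_rainfed   = mean(rainfed$outcome),
    mean_irrigated = mean(irrigated$outcome)
  )
})

# Each call returns its own row; the rows are bound together afterwards.
out <- do.call(rbind, results)
head(out)
```

Returning a value from the function and binding the pieces afterwards avoids assigning into a data frame defined outside `lapply()`, which only changes a local copy.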
Re: [R] cut variable within a loop
Hi Thierry!

Thanks for your answer. I tried this, but I get this error:

    Error in cut.default(x, k2) : invalid number of intervals

This is strange, because I am not specifying the number of intervals but the value at which the sample has to be cut.

Greetings from Belgium! :-)

2015-08-11 14:52 GMT+02:00 Thierry Onkelinx thierry.onkel...@inbo.be:

Dear Janka,

Your loop goes from 0 to 100. It should probably go from 1:99.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
[R] simultaneous equation model with endogenous interaction terms
Dear list members,

I am building a model such as:

    Y1 = Y2*X1 + X2
    Y2 = Y1*X1 + X2

X2 is the exogenous variable
Z1 is the instrument of Y1
Z2 is the instrument of Y2

This is a simultaneous equation model. I know how to build a simultaneous equation model without interaction terms:

    library(systemfit)
    eq1 <- Y1 ~ Y2 + X2 + Z2
    eq2 <- Y2 ~ Y1 + X2 + Z1
    inst <- ~ X2 + Z1 + Z2
    system <- list(eq1 = eq1, eq2 = eq2)
    reg2SLS <- systemfit(system, "2SLS", inst = inst, data = mydata)
    summary(reg2SLS)

I also know how to do a normal 2SLS with interaction terms:

    library(AER)  # ivreg() is provided by AER, not by systemfit
    ivreg(Y1 ~ Y2*X1 | Z2*X1, data = Alldata)

However, I don't know how to deal with the interaction terms in the simultaneous equation model. I am experimenting with both R and Stata to see which formulation gives the same result in both packages, but until now without success.

Could somebody help me with this?

Thank you very much!

Janka
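One common way to handle an interaction between an endogenous and an exogenous variable is to treat the product (here Y2*X1) as a second endogenous term and to add the instrument times X1 to the instrument set. The sketch below does this by hand for the Y1 equation, in base R and on simulated data. Everything in it is an assumption made for illustration: the coefficient values, the choice that Z1 enters only the Y1 equation and Z2 only the Y2 equation, and the helper columns `Y2X1`, `Y2_hat` and `Y2X1_hat`. It is not the poster's model and not the systemfit implementation.

```r
set.seed(123)
n  <- 2000
X1 <- runif(n)              # exogenous, interacts with the endogenous terms
X2 <- rnorm(n)              # exogenous
Z1 <- rnorm(n)              # assumed to enter only the Y1 equation
Z2 <- rnorm(n)              # assumed to enter only the Y2 equation
u1 <- rnorm(n)
u2 <- 0.5 * u1 + rnorm(n)   # correlated errors

# Hypothetical structural system:
#   Y1 = (0.5 + 0.2*X1)*Y2 + 1.0*X2 + 1.0*Z1 + u1
#   Y2 = (0.3 + 0.1*X1)*Y1 + 0.5*X2 + 1.0*Z2 + u2
A  <- 0.5 + 0.2 * X1
B  <- 0.3 + 0.1 * X1
r1 <- 1.0 * X2 + 1.0 * Z1 + u1
r2 <- 0.5 * X2 + 1.0 * Z2 + u2
Y1 <- (r1 + A * r2) / (1 - A * B)   # solved reduced form for Y1
Y2 <- B * Y1 + r2
mydata <- data.frame(Y1, Y2, X1, X2, Z1, Z2)

# Stage 1: project each endogenous term on all exogenous variables and
# their interactions with X1.
mydata$Y2X1 <- mydata$Y2 * mydata$X1
s1_Y2   <- lm(Y2   ~ X1 * (X2 + Z1 + Z2), data = mydata)
s1_Y2X1 <- lm(Y2X1 ~ X1 * (X2 + Z1 + Z2), data = mydata)
mydata$Y2_hat   <- fitted(s1_Y2)
mydata$Y2X1_hat <- fitted(s1_Y2X1)

# Stage 2: replace the endogenous terms with their fitted values.
s2 <- lm(Y1 ~ Y2_hat + Y2X1_hat + X1 + X2 + Z1, data = mydata)
coef(s2)
```

The point estimates from the second stage are the 2SLS estimates for this equation, but the standard errors reported by `lm()` in that stage are not valid; a dedicated routine such as `systemfit()` or `ivreg()` is needed for inference. The same idea (interactions of the instruments with X1 in the instrument set) would carry over to the Y2 equation.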
[R] Testing for significant differences between groups in multiple linear regression
Dear R-colleagues,

I am looking for a way to test whether one regression has significantly different coefficients and overall results for 10 groups (the grouping variable is irr).

*What I have*

The regression is:

    Depend = temp + temp² + perc + perc² + conti  -> split up for multiple groups of irr

*Dataset = Alldata (real dataset has over 5 IDs)*

    ID  irr (= grouping variable)  temp  perc  conti  Depend    w
     1    1                          10    34     26       8   23
     2    1                          11    36     27       6   58
     3    1                          26    57     45       3   76
     4    2                          23    68     24       2    4
     5    2                           6    26      8       1  323
     6    2                           3    17     56       6   45
     7    3                          17    39     17       5   57

I can obtain the regression coefficients for the different groups with the following code (other code is possible as well):

    datairrigation <- split(Alldata, Alldata$irr)
    model.per.irrigation <- lapply(datairrigation, function(x) {
      lm(Depend ~ temp + I(temp^2) + perc + I(perc^2) + conti,
         weights = w, data = x)
    })

OR I can do it manually by splitting all the data into subsets (and then I also receive the R²…)

*What I don't have*

However, I don't know how to compare those regressions to test whether they differ significantly across all the groups. (Preferably, I would like to test the coefficients individually (temp(group 1) = temp(group 2)) and the regression as a whole between the groups.)

*Note*

I know that one way to test for significant differences between groups is to include dummy variables for the groups in the regression. Yet this is not an option for my model, because it only allows exogenous variables in the regression (and irrigation is an endogenous variable, because the farmer can decide himself whether he irrigates or not).

Thank you very much in advance! I really appreciate your help!

Janka
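For reference, the standard way to test equality of coefficients across separately fitted groups is a Chow-type F-test: compare one pooled regression against a model in which every coefficient varies by group. The sketch below uses simulated data with three groups and an invented outcome. Note that it is algebraically the same as the dummy-variable approach mentioned in the note, so it may not address the endogeneity concern raised there.

```r
set.seed(7)
n <- 300
Alldata <- data.frame(
  irr   = factor(sample(1:3, n, replace = TRUE)),
  temp  = runif(n, 0, 30),
  perc  = runif(n, 10, 70),
  conti = runif(n, 5, 60),
  w     = runif(n, 1, 100)
)
# Made-up outcome in which the temp slope differs by group.
Alldata$Depend <- 2 + 0.3 * Alldata$temp * as.integer(Alldata$irr) -
  0.01 * Alldata$temp^2 + 0.05 * Alldata$perc + rnorm(n)

# Restricted model: one set of coefficients for all groups.
pooled <- lm(Depend ~ temp + I(temp^2) + perc + I(perc^2) + conti,
             weights = w, data = Alldata)

# Unrestricted model: every coefficient (including the intercept) varies
# by irr; its point estimates match those of the separate per-group fits.
by_group <- lm(Depend ~ irr * (temp + I(temp^2) + perc + I(perc^2) + conti),
               weights = w, data = Alldata)

# Joint F-test of "all groups share the same coefficients".
anova(pooled, by_group)

# Individual differences appear as interaction terms, e.g. the difference
# in the temp slope between group 2 and group 1.
summary(by_group)$coefficients["irr2:temp", ]
```

The `anova()` line tests the regression as a whole, while the rows of the coefficient table whose names start with `irr2:` or `irr3:` test individual coefficients against the reference group.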