[R] gamlss to predict dependent variables in [0, 1] interval (fractional variable)

2016-12-27 Thread Janka VANSCHOENWINKEL
Dear R-users,

I want to model a proportional variable bounded by [0,1]  (the % of land
fertilized). A high percentage of the data contains 0s (60%), a smaller
percentage contains 1s (10%), and all the rest falls in between.


I want to compare different models with each other to see their
performance, however the model I am currently looking at is a zero-one
inflated beta model. I am using the R package gamlss for this.


However, I am having some trouble with the quite technical documentation
of the gamlss package, and I cannot find an answer to my questions
below:

1)  model

The model below should model 3 submodels: one part that models the
probability of having y=0 versus y>0 (nu.formula), one part that models the
probability of having y=1 versus y<1 (tau.formula) and a final part that
models all the values in between.


gam<-gamlss(proportion~x1+x2,nu.formula=~ x1+x2,tau.formula=~ x1+x2,
family= BEINF, data=Alldata)


This is okay I think.

2)  prediction

I would like to know now what is the predicted probability of an
observation to have y = 0 or y = 1. I predicted the probability of y = 0
with the code below, however I get values that go far beyond the [0-1]
interval. Therefore, they cannot be probabilities since these have to be in
the interval [0,1].


Alldata$fit_proportion_0<-predict(gam, what="nu", type='response')

summary(Alldata$fit_proportion_0)


Could somebody explain to me how to obtain the correct probabilities,
because the code above does not seem to work? I think the answer to my
problem can be found in section 10.8.2, page 215 of the following link (
http://www.gamlss.org/wp-content/uploads/2013/01/book-2010-Athens1.pdf). I
think it says that the predict function I use returns a different quantity,
which I have to plug into a certain formula to find the real probabilities.
But I am not sure how to make this work.
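
A possible reading of question 2, offered as a sketch and not as a confirmed answer from the list: in the BEINF family, nu and tau use a log link by default, so predict(..., type = "response") returns positive, odds-like quantities and not probabilities. Under the BEINF parameterization, P(y = 0) = nu / (1 + nu + tau) and P(y = 1) = tau / (1 + nu + tau). The helper name beinf_probs and the numeric values below are made up for illustration.

```r
# Hypothetical helper: convert fitted BEINF nu and tau (both > 0)
# into the probabilities of y = 0, y = 1 and 0 < y < 1.
beinf_probs <- function(nu, tau) {
  stopifnot(all(nu > 0), all(tau > 0))
  denom <- 1 + nu + tau
  data.frame(p0 = nu / denom, p1 = tau / denom, p_between = 1 / denom)
}

# Made-up fitted values; with a fitted model they would come from, e.g.,
#   nu  <- predict(gam, what = "nu",  type = "response")
#   tau <- predict(gam, what = "tau", type = "response")
nu  <- c(6.0, 1.5, 0.2)
tau <- c(1.0, 0.5, 0.1)

probs <- beinf_probs(nu, tau)
print(probs)
```

Under the same assumption, a nu coefficient describes the change in log(P(y = 0) / P(0 < y < 1)), and likewise for tau with P(y = 1); the mu part of BEINF uses a logit link by default, not a log link.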



3)  interpretation

Also, to be sure, I would like to know how to interpret the coefficients
of the three submodels and how to use each set of coefficients separately.
For the nu and tau models these should be interpreted as log-odds ratios,
right? And the model in the middle is just a normal log model, right?

4)  validity

Finally, I do not find a lot of information on how to correctly test the
validity of this model? Do you test that for all three subparts separately?
Or is there a test to model the entire model at once?



Thank you very much for your help! I am aware that some of these
questions are very basic.


Janka

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Weighted demean by group on only a selection of the dataset

2015-12-24 Thread Janka VANSCHOENWINKEL
Dear colleagues,

I am trying to find simple code to demean
1) only certain variables of a dataset,
2) by group
3) and in a weighted fashion.

Currently, I can only demean all the numeric variables in the dataset:

Data[,sapply(Data,  is.numeric)] <- apply(Data[sapply(Data,
is.numeric)], 2, function(x) scale(x, scale = FALSE))

Assume that my dataset looks like this:
Country<- c('BE','BE','DE','GR','IT','ES','DE','NL')
Landvalue<- c(21000, 23400, 26800, 15000,18000,23000,19000,23000)
Temperature_spring <- c('15','16','14','18','23','21','12','15')
Temperature_summer <- c('25','18','19','23','24','22','15','19')
Temperature_autumn <- c('14','12','12','10','20','20','11','13')
Temperature_winter <- c('9','4','12','14','15','13','17','12')
Weight<-c('5','20','3','2','15','21','13','8')
Data <- data.frame(Country, Landvalue,
Temperature_spring,Temperature_summer,
Temperature_autumn,Temperature_winter, Weight)


Now imagine I only want to demean the temperature-variables, grouped
by country and weighted by weight. With grouped by country I mean that
I want to subtract only the mean of Belgium from an observation in
Belgium.

Does somebody know how to add the three functions to the code line I
already have? Or if this does not work, what code should I use?
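
One way the three requirements could be combined, as a base-R sketch and not a reply from the list: split the row indices by country, subtract each group's weighted mean, and apply this only to the temperature columns. The helper name weighted_demean is hypothetical, and the sketch assumes the temperatures and weights are stored as numbers (in the example above they are quoted strings, which would need as.numeric() first).

```r
# Example data as above, but with numeric temperatures and weights (assumption).
Data <- data.frame(
  Country = c("BE", "BE", "DE", "GR", "IT", "ES", "DE", "NL"),
  Landvalue = c(21000, 23400, 26800, 15000, 18000, 23000, 19000, 23000),
  Temperature_spring = c(15, 16, 14, 18, 23, 21, 12, 15),
  Temperature_summer = c(25, 18, 19, 23, 24, 22, 15, 19),
  Temperature_autumn = c(14, 12, 12, 10, 20, 20, 11, 13),
  Temperature_winter = c(9, 4, 12, 14, 15, 13, 17, 12),
  Weight = c(5, 20, 3, 2, 15, 21, 13, 8)
)

# Hypothetical helper: subtract the weighted group mean from each value.
weighted_demean <- function(x, w, g) {
  out <- numeric(length(x))
  for (idx in split(seq_along(x), g)) {
    out[idx] <- x[idx] - weighted.mean(x[idx], w[idx])
  }
  out
}

# Select only the temperature columns and demean them by country.
temp_cols <- grep("^Temperature_", names(Data), value = TRUE)
Demeaned <- Data
Demeaned[temp_cols] <- lapply(Data[temp_cols], weighted_demean,
                              w = Data$Weight, g = Data$Country)
print(Demeaned)
```

Countries with a single observation (GR, IT, ES, NL here) end up with demeaned values of zero, which may or may not be what is wanted.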

Thank you very much and have a nice Christmas!



Re: [R] Lag variable by group

2015-09-08 Thread Janka VANSCHOENWINKEL
Hi Petr and other members who can use this post,

Somebody gave me an answer in a private email which worked for me!

The only thing I needed to do was to first make a data.table object of my
data. Then the code works!

library(data.table)
data <- data.table(data, key = "id")
data[, lag.t1:=c(NA, t1[-.N]), by=id]

Thank you very much for your help Petr!

I really appreciate it!

Janka



2015-09-08 8:37 GMT+02:00 PIKAL Petr <petr.pi...@precheza.cz>:

> Hi
>
> Thanks for providing data. I did not see any response and frankly speaking
> I do not use data.table so I am not sure what do you mean by lagging t1.
>
> I would start with ordering data.
> ooo<-order(data$id, data$year)
> data <- data[ooo,]
>
> Then you can split data according to id.
>
> datas<-split(data[,c(1,3)], data$id)
>
> dput(head(datas))
> structure(list(`28954` = structure(list(year = c(2005, 2006,
> 2007, 2008), t1 = c(-1.81807494163513, -1.81807494163513,
> -1.81807494163513,
> -1.81807494163513)), .Names = c("year", "t1"), row.names = c(58L,
> 45L, 35L, 46L), class = "data.frame"), `28955` = structure(list(
> year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
> -1.81807494163513, -1.81807494163513, -1.81807494163513)), .Names =
> c("year",
> "t1"), row.names = c(59L, 70L, 69L, 72L), class = "data.frame"),
> `28956` = structure(list(year = c(2005, 2006, 2007, 2008),
> t1 = c(-1.81807494163513, -1.81807494163513, -1.81807494163513,
> -1.81807494163513)), .Names = c("year", "t1"), row.names = c(53L,
> 66L, 74L, 51L), class = "data.frame"), `28957` = structure(list(
> year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
> -1.81807494163513, -1.81807494163513, -1.81807494163513
> )), .Names = c("year", "t1"), row.names = c(71L, 64L,
> 54L, 24L), class = "data.frame"), `28958` = structure(list(
> year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
> -1.81807494163513, -1.81807494163513, -1.81807494163513
> )), .Names = c("year", "t1"), row.names = c(34L, 27L,
> 1L, 31L), class = "data.frame"), `28959` = structure(list(
> year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
> -1.81807494163513, -1.81807494163513, -1.81807494163513
> )), .Names = c("year", "t1"), row.names = c(17L, 18L,
> 30L, 44L), class = "data.frame")), .Names = c("28954", "28955",
> "28956", "28957", "28958", "28959"))
>
> But now I am lost what result you expect. Can you explain it on this
> smaller data set?
>
> Cheers
> Petr
>

Re: [R] Lag variable by group

2015-09-08 Thread Janka VANSCHOENWINKEL
Wow! Thanks for pointing that out! And thanks for testing it out as well!

It is always the first year available (unbalanced panel) that should get NA.

So using the code line you provided earlier, this should work:

library(data.table)
data <- data.table(newdata, key = "id")
ooo<-order(data$id, data$year)
data <- data[ooo,]
data[, lag.t1:=c(NA, t1[-.N]), by=id]

Thank you very much for pointing that out!
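
For readers without data.table, the same idea (sort by id and year, then shift t1 by one position within each id) can be sketched in base R. The function name lag_by_group and the toy data are hypothetical and only illustrate the technique.

```r
# Hypothetical helper: lag `value` by one period within each `id`,
# after sorting by id and time so the first available year gets NA.
lag_by_group <- function(df, id, time, value) {
  df <- df[order(df[[id]], df[[time]]), ]
  df$lag_value <- ave(df[[value]], df[[id]],
                      FUN = function(v) c(NA, v[-length(v)]))
  df
}

# Small unsorted, unbalanced toy panel (made-up values).
toy <- data.frame(
  id   = c(1, 1, 2, 1, 2),
  year = c(2007, 2005, 2006, 2006, 2005),
  t1   = c(3, 1, 20, 2, 10)
)

out <- lag_by_group(toy, "id", "year", "t1")
print(out)
```

Sorting before lagging is what guarantees that NA lands on the first available year of each farm and not on an arbitrary row.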



2015-09-08 9:05 GMT+02:00 PIKAL Petr <petr.pi...@precheza.cz>:

> Hm. I tried your example but what puzzles me is that your data are not
> sorted by year and therefore sometimes the first year is changed to NA but
> sometimes any arbitrary year is changed to NA.
>
>
>
> > head(data)
>
>    year    id        t1    lag.t1
> 1: 2007 28954 -1.818075        NA
> 2: 2006 28954 -1.818075 -1.818075
> 3: 2008 28954 -1.818075 -1.818075
> 4: 2005 28954 -1.818075 -1.818075
> 5: 2005 28955 -1.818075        NA
> 6: 2007 28955 -1.818075 -1.818075
>
>
>
> Is it what you intended?
>
> Cheers
>
> Petr
>
>
>
>
>

[R] Lag variable by group

2015-09-07 Thread Janka VANSCHOENWINKEL
Hi!

I have the following dataset with the variables ID (this is a unique ID per
farmer), year, and another variable t1.
I now would like to have a fourth variable which is the lag value of t1 for
each farm ID.

I found a code on the internet that does exactly what I need, but it does
not work for this dataset. Does anyone have suggestions about how I can
make this work?

Thanks a lot!

Janka

data<-structure(list(year = c(2007, 2005, 2008, 2006, 2005, 2007, 2006,
2008, 2007, 2005, 2007, 2007, 2005, 2006, 2005, 2006, 2005, 2006,
2007, 2007, 2005, 2008, 2007, 2008, 2005, 2005, 2006, 2008, 2007,
2007, 2008, 2008, 2006, 2005, 2007, 2006, 2008, 2008, 2007, 2007,
2007, 2006, 2006, 2008, 2006, 2008, 2008, 2008, 2006, 2007, 2008,
2007, 2005, 2007, 2008, 2005, 2007, 2005, 2005, 2008, 2005, 2006,
2005, 2006, 2008, 2006, 2008, 2006, 2007, 2006, 2005, 2008, 2006,
2007, 2008, 2006, 2006, 2006, 2005, 2008, 2006, 2008, 2006, 2006,
2006, 2007, 2008, 2005, 2007, 2006, 2007, 2008, 2006, 2008, 2005,
2007, 2005, 2007, 2006, 2006), id = c(28958L, 28962L,
28962L, 28965L, 28960L, 28962L, 28964L, 28970L, 28961L, 28965L,
78458L, 28960L, 28961L, 28961L, 28969L, 28962L, 28959L, 28959L,
58845L, 28965L, 28963L, 78459L, 28967L, 28957L, 28964L, 28966L,
28958L, 28960L, 28969L, 28959L, 28958L, 28969L, 58845L, 28958L,
28954L, 28963L, 78458L, 28965L, 28966L, 28963L, 28970L, 28970L,
28960L, 28959L, 28954L, 28954L, 58845L, 28967L, 28966L, 78459L,
28956L, 28964L, 28956L, 28957L, 28961L, 28970L, 28968L, 28954L,
28955L, 28968L, 28968L, 28967L, 28967L, 28957L, 28966L, 28956L,
28964L, 28969L, 28955L, 28955L, 28957L, 28955L, 28968L, 28956L,
28963L, 29004L, 58848L, 29005L, 28974L, 29005L, 28974L, 29006L,
28981L, 29007L, 29002L, 28980L, 29001L, 29006L, 29005L, 28989L,
28989L, 58846L, 28980L, 28981L, 78467L, 28990L, 28973L, 29004L,
28972L, 29006L), t1 = c(-1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487)), .Names = c("year", "id",
"t1"), row.names = c(NA, 100L), class = "data.frame")

library(data.table)
data[, lag.t1:=c(NA, t1[-.N]), by=id]


Thank you very much!

Janka



Re: [R] cut variable within a loop

2015-08-23 Thread Janka VANSCHOENWINKEL
Thank you all very much. A combination of the solutions suggested solved my
problem!

2015-08-16 22:31 GMT+02:00 David Winsemius dwinsem...@comcast.net:


 On Aug 16, 2015, at 8:57 AM, Janka VANSCHOENWINKEL wrote:

  Hi David,
 
  Thanks for your comment. I'll explain what I want to do. I explained it
 already earlier but the explanation might have gone lost in some of the
 emails.

 I now see that you did explain that you wanted the positional matching in
 cut2 as a break. The code runs without error on my machine, but delivers
 a lot of warnings about masking. You are repeatedly using attach on the
 same named objects. Using `attach` in programming is generally not a good
 idea. In interactive use it is safer to use `with`, although that is not
 generally considered safe in programming, either.

 You need to do a better job of nailing down the source of the difficulty,
 whatever it might be. While you say the cut2 function doesn't work, you
 don't actually give evidence of failure.

 It's fairly simple to show that your theory about why your code fails in
 some way as being due to cut2 failing to accept an i value inside an
 lapply call is just wrong:

  o <- lapply(1:3, function(i) { cut2( 0:10, i) } )
  o
 [[1]]
  [1]  0  [ 1,10] [ 1,10] [ 1,10] [ 1,10] [ 1,10] [ 1,10]
  [8] [ 1,10] [ 1,10] [ 1,10] [ 1,10]
 Levels:  0 [ 1,10]

 [[2]]
  [1] [ 0, 2) [ 0, 2) [ 2,10] [ 2,10] [ 2,10] [ 2,10] [ 2,10]
  [8] [ 2,10] [ 2,10] [ 2,10] [ 2,10]
 Levels: [ 0, 2) [ 2,10]

 [[3]]
  [1] [ 0, 3) [ 0, 3) [ 0, 3) [ 3,10] [ 3,10] [ 3,10] [ 3,10]
  [8] [ 3,10] [ 3,10] [ 3,10] [ 3,10]
 Levels: [ 0, 3) [ 3,10]


 You also have two different definitions of weight2 for your irrigation
 model:


 Alldata_Irrigation$weight2 <- Alldata_Irrigation$sys02*Alldata_Irrigation$se025
 Alldata_Irrigation$weight2 <- Alldata_Irrigation$b48+Alldata_Irrigation$b50

 --
 David
 
  The variable irrigation ranges from 0 to 100. (maybe not in the small
 sample I gave, but in reality I have over 6 observations and there the
 variable ranges from 0 to 100). I want to make (and use) 100 different
 samples. The sample is based each time on the i that I put at the
 beginning of the loop.
 
  So:
 
  i = 1: this means there are 2 subsets. One from 0-1, another from 1-100
  i = 2: this means there are 2 subsets. One from 0-2, another from 2-100
  i = 3: this means there are 2 subsets. One from 0-3, another from 3-100
  i = 4: this means there are 2 subsets. One from 0-4, another from 4-100
  ...
  i = 96: this means there are 2 subsets. One from 0-96, another from
 96-100
  i = 97: this means there are 2 subsets. One from 0-97, another from
 97-100
  i = 98: this means there are 2 subsets. One from 0-98, another from
 98-100
  i = 99: this means there are 2 subsets. One from 0-99, another from
 99-100
 
  It might be possible that i = 1 and i = 2 give the same results in the
 small dataset. But in the full dataset all numbers are represented.
 
  The cut2 function is capable of cutting a sample based on a number
 supplied. Yet, when I tell it that this number is i, it doesn't work. If
 instead I write that the number is 10, then it does work and it gives me 2
 subsets from 0-10 and from 10-100.
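
The threshold split described above can also be sketched with base R's cut() and explicit break points, so that i is always treated as a cut point and never as a number of intervals. This is an illustration only: the vector irrigation holds made-up values and split_at is a hypothetical helper, not code from the thread.

```r
# Made-up irrigation percentages in [0, 100].
irrigation <- c(0, 100, 0, 5.45, 7.94, 89.34, 17.65, 65.52)

# Hypothetical helper: two groups, [0, i) and [i, 100].
split_at <- function(x, i) {
  cut(x, breaks = c(0, i, 100), right = FALSE, include.lowest = TRUE)
}

# One two-level factor per threshold i = 1, ..., 99.
groups <- lapply(1:99, function(i) split_at(irrigation, i))

# Group sizes for the threshold i = 10.
print(table(groups[[10]]))
```

Because the breaks are given explicitly as c(0, i, 100), the loop index works like any other number; thresholds of 0 or 100 would produce duplicated breaks and are therefore left out.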
 
  Hope this is more clear!
 
  Janka
 
 
  2015-08-14 20:10 GMT+02:00 David Winsemius dwinsem...@comcast.net:
 
  When using a function in R you may need to supply an argument name. Are
 you expecting this to be the number of groups? I cannot decipher the intent
 here with such sparse commentary, but this call to `cut2` does not make
 sense to me. Perhaps you meant the number of groups, in which case you
 need  cut2( Alldata$irrigation, g=i ), since the arguments to cut2 are not
 the same as the arguments to cut.
 
  At the moment you are implicitly sending on the first pass a 1 and then
 on the second pass a 2 to the second argument of cut2, which is the `breaks`
 argument. So you would be getting two different factors, each with different
 cut-point levels. I looked at your data and in point of fact there would be
 no difference since you have 29 zero values and no values between 0 and 1.
 
   table(cut2(dat$irrigation, 1))
 
         0 [  1,100]
        29        21
   table(cut2(dat$irrigation, 2))
 
         0 [  2,100]
        29        21
 
 
 
 
levels(Alldata$irri) <- c(0,1)
  
Alldata_Rainfed <- subset(Alldata, irri == 0)
Alldata_Irrigation <- subset(Alldata, irri == 1)
  
Alldata_Rainfed$w <- Alldata_Rainfed$b48+Alldata_Rainfed$b50
Alldata_Irrigation$w <- Alldata_Irrigation$b48+Alldata_Irrigation$b50
  
OLS_Rainfed <- lm(LnALVperHA~ps1+ps2+ps3+ps4+ts1+ts2+ts3+ts4+
ps1sq+ps2sq+ps3sq+ps4sq+ts1sq+ts2sq+ts3sq+ts4sq+
pdnsty+portsML+cities500k+rentedland+subsidies1+
elevmean+elevrange+
t_gravel+t_gravel+t_ph_h2o+t_silt+t_sand+
AT+BE+DK+ES+FI+FR+GR+IE+IT+LU+NL+PT+SE+WDE+EDE+UK,
  weights=w

Re: [R] cut variable within a loop

2015-08-16 Thread Janka VANSCHOENWINKEL
,
  2.3490463257, 8.5, 24.878392334, 4, 1.3997615814,
  34.7799987792969, 6.6980926514), b50 = c(0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 34.2400016784668, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0), irrigation = c(0, 100, 0, 5.45454584062099,
  7.9365074634552, 89.3392562866211, 0, 17.6470592617989, 0, 0,
  65.5172407627106, 0, 61.904764175415, 34.4827562570572, 7.95454531908035,
  75, 0, 0, 0, 0, 0, 0, 5.26393800973892, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 74.6153831481934, 84.6153914928436, 0, 5.09554147720337,
  0, 0, 0, 21.0884347558022, 18.4549376368523, 6.1224490404129,
  25.3731369972229, 2.12765969336033, 0, 84.3988716602325, 0, 0,
  0, 100), awc_class = c(106.228088378906, 78.2306137084961,
 80.9311141967773,
  32.4921531677246, 54.8475151062012, 80.6665878295898, 116.331588745117,
  54.8475151062012, 54.8475151062012, 54.8475151062012, 54.8475151062012,
  54.8475151062012, 56.3101806640625, 32.4921531677246, 54.8475151062012,
  32.4921531677246, 59.3034172058105, 101.193893432617, 96.5840377807617,
  54.2786560058594, 87.1388244628906, 66.1907730102539, 57.205738067627,
  55.4114303588867, 55.4114303588867, 80.9288787841797, 63.6008758544922,
  150, 30.3404140472412, 30.3404140472412, 19.8318557739258,
 104.236854553223,
  79.2445755004883, 57.0045547485352, 54.8475151062012, 34.320426940918,
  54.8475151062012, 34.320426940918, 34.320426940918, 32.4921531677246,
  65.1337509155273, 34.320426940918, 54.8475151062012, 73.6748657226562,
  54.8475151062012, 56.3101806640625, 54.8475151062012, 32.4921531677246,
  127.726959228516, 27.9528160095215), sys02 = c(18.8571434020996,
  303.529418945312, 30.2469139099121, 104.305557250977, 86.4935073852539,
  51.25, 83.0927810668945, 453.118286132812, 42.5, 104.305557250977,
  48.461540222168, 86.4935073852539, 55.1851844787598, 104.305557250977,
  104.305557250977, 185.20996094, 17.9775276184082, 25.286254883,
  64, 21.660308838, 30, 24.2372875213623, 47.0285720825195,
  16.1904754638672, 33.75, 22.5423736572266, 10.2857141494751,
  39.230770111084, 6.06741571426392, 1, 28.3255805969238, 21.603814697,
  69.2592620849609, 86.641235352, 48.5185203552246, 44.4186058044434,
  48.6538467407227, 437.105255126953, 437.105255126953, 19.160308838,
  48.461540222168, 437.105255126953, 48.6538467407227, 453.118286132812,
  48.6538467407227, 14.2857141494751, 453.118286132812, 453.118286132812,
  95.2380981445312, 63), se025 = c(163.529998779297, 2.7004768372,
  157, 5.5, 6.3019073486, 36.024577637, 86, 5.0990463257,
  6.4009536743, 6, 8.6980926514, 4, 6.3019073486,
 5.8019073486,
  8.8019073486, 2, 118.809997558594, 44.116103516,
 16.707629395,
  34, 73.415258789, 73.0800018310547, 134.880004882812,
 31.024577637,
  20.036866455, 94.7200012207031, 40, 5.5, 16.5, 15, 26.878392334,
  59.4199981689453, 13, 5.1980926514, 6.8019073486,
 15.698092651,
  10.896185303, 5.3019073486, 4.5990463257, 29.396185303,
  23.292370605, 4.9009536743, 13.396185303, 2.3490463257,
  8.5, 24.878392334, 4.1980926514, 1.3997615814,
 34.7799987792969,
  6.6980926514)), .Names = c(LnALVperHA, ps1, ps2, ps3,
  ps4, ts1, ts2, ts3, ts4, ps1sq, ps2sq, ps3sq,
  ps4sq, ts1sq, ts2sq, ts3sq, ts4sq, pdnsty, portsML,
  cities500k, rentedland, subsidies1, elevmean, elevrange,
  t_gravel, t_ph_h2o, t_silt, t_sand, AT, BE, DE,
  DK, ES, FI, FR, GR, IE, IT, LU, NL, PT, SE,
  WDE, EDE, UK, CY, BG, CZ, EE, HU, LT, LV,
  PL, RO, SI, SK, b48, b50, irrigation, awc_class,
  sys02, se025), row.names = c(2, 3, 4, 5, 6, 7,
  8, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
  21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
  32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
  43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53
  ), class = data.frame)
 
  2015-08-14 14:58 GMT+02:00 PIKAL Petr petr.pi...@precheza.cz:
 
  Hi Janka
 
 
 
  Sorry, but we are limited in connecting to web services so I am not
 able to restore your data and see your code. The result of dput(somedata)
 copied into the email is preferable for sharing data, and code can be copied
 into the email too. But do not use HTML as it usually scrambles text.
 
 
 
  Answer in line
 
 
 
  From: Janka Vanschoenwinkel [mailto:janka.vanschoenwin...@uhasselt.be]
  Sent: Friday, August 14, 2015 2:17 PM
  To: Thierry Onkelinx; PIKAL Petr
  Cc: r-help@r-project.org
  Subject: Re: [R] cut variable within a loop
 
 
 
  Hi Thierry and Petr,
 
 
 
  I really appreciate the comments you already gave. Thank you very much
 for that.
 
 
 
  Below you can find a link to the data and the code. Hopefully this
 helps in spotting the error.
 
 
 
  I still think the issue is that the cut2 function only accepts numbers,
 and not an i that refers to the number at the start of the loop. To
 answer Petr's question, yes, columns 3 and 4 are NA

Re: [R] cut variable within a loop

2015-08-14 Thread Janka Vanschoenwinkel
, 6.3019073486, 5.8019073486,
8.8019073486, 2, 118.809997558594, 44.116103516, 16.707629395,
34, 73.415258789, 73.0800018310547, 134.880004882812, 31.024577637,
20.036866455, 94.7200012207031, 40, 5.5, 16.5, 15, 26.878392334,
59.4199981689453, 13, 5.1980926514, 6.8019073486, 15.698092651,
10.896185303, 5.3019073486, 4.5990463257, 29.396185303,
23.292370605, 4.9009536743, 13.396185303, 2.3490463257,
8.5, 24.878392334, 4.1980926514, 1.3997615814, 34.7799987792969,
6.6980926514)), .Names = c(LnALVperHA, ps1, ps2, ps3,
ps4, ts1, ts2, ts3, ts4, ps1sq, ps2sq, ps3sq,
ps4sq, ts1sq, ts2sq, ts3sq, ts4sq, pdnsty, portsML,
cities500k, rentedland, subsidies1, elevmean, elevrange,
t_gravel, t_ph_h2o, t_silt, t_sand, AT, BE, DE,
DK, ES, FI, FR, GR, IE, IT, LU, NL, PT, SE,
WDE, EDE, UK, CY, BG, CZ, EE, HU, LT, LV,
PL, RO, SI, SK, b48, b50, irrigation, awc_class,
sys02, se025), row.names = c(2, 3, 4, 5, 6, 7,
8, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53
), class = data.frame)

2015-08-14 14:58 GMT+02:00 PIKAL Petr petr.pi...@precheza.cz:

 Hi Janka



 Sorry, but we are limited in connecting to web services so I am not able to
 restore your data and see your code. The result of dput(somedata) copied into
 the email is preferable for sharing data, and code can be copied into the
 email too. But do not use HTML as it usually scrambles text.



 Answer in line



 From: Janka Vanschoenwinkel [mailto:janka.vanschoenwin...@uhasselt.be]
 Sent: Friday, August 14, 2015 2:17 PM
 To: Thierry Onkelinx; PIKAL Petr
 Cc: r-help@r-project.org
 Subject: Re: [R] cut variable within a loop



 Hi Thierry and Petr,



 I really appreciate the comments you already gave. Thank you very much for 
 that.



 Below you can find a link to the data and the code. Hopefully this helps in 
 spotting the error.



 I still think the issue is that the cut2 function only accepts numbers, and
 not an i that refers to the number at the start of the loop. To answer
 Petr's question, yes, columns 3 and 4 are NA (these are the columns of the
 second interval). But I don't really understand your point, so could you
 clarify this please?



 If you use NA as the number of intervals you will get errors like these:

 k <- c(2,4,NA,5)
 ii <- vector(4, mode="list")
 for (i in 1:4) {
 ii[[i]] <- cut2(iris[,i], k[i])
 }
 Error in if (r[1] < cuts[1]) cuts <- c(r[1], cuts) :
   missing value where TRUE/FALSE needed

 for (i in 1:4) {
 ii[[i]] <- cut(iris[,i], k[i])
 }
 Error in cut.default(iris[, i], k[i]) : invalid number of intervals

 If you remove NA from the definition of k, the error is gone.

 k <- c(2,4,3,5)
 ii <- vector(4, mode="list")

 for (i in 1:4) {
 ii[[i]] <- cut(iris[,i], k[i])
 }



 You can try it yourself. The error is not related to cycle; whenever number 
 of intervals in cut call is NA you always get an error.



 Cheers

 Petr



 https://drive.google.com/folderview?id=0By9u5m3kxn9yfkxxeVNMdnRQQXhoT05CRlJlZVBCWWF2NURMMTNmVFVFeXJXXzhlMWE4SUkusp=sharing



 Thank you very much once again!



 Janka







 2015-08-11 15:10 GMT+02:00 Thierry Onkelinx thierry.onkel...@inbo.be:

 You'll need to send a reproducible example of the code. We can't run the code 
 that you send. Hence it is hard to help you. See e.g. 
 http://adv-r.had.co.nz/Reproducibility.html


 ir. Thierry Onkelinx
 Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
 Forest
 team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
 Kliniekstraat 25
 1070 Anderlecht
 Belgium

 To call in the statistician after the experiment is done may be no more than 
 asking him to perform a post-mortem examination: he may be able to say what 
 the experiment died of. ~ Sir Ronald Aylmer Fisher
 The plural of anecdote is not data. ~ Roger Brinner
 The combination of some data and an aching desire for an answer does not 
 ensure that a reasonable answer can be extracted from a given body of data. ~ 
 John Tukey



 2015-08-11 14:57 GMT+02:00 Janka Vanschoenwinkel 
 janka.vanschoenwin...@uhasselt.be:

 Hi Thierry!



 Thanks for your answer. I tried this, but I get this error:



 Error in cut.default(x, k2) : invalid number of intervals



 Which is strange because I am not specifying intervals, but the number at
 which the sample has to be cut?



 Greetings from Belgium! :-)



 2015-08-11 14:52 GMT+02:00 Thierry Onkelinx thierry.onkel...@inbo.be:

 Dear Janka,



 Your loop goes from 0 to 100. It should probably go from 1 to 99.



 Best regards,


 ir. Thierry Onkelinx
 Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
 Forest
 team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
 Kliniekstraat 25
 1070 Anderlecht
 Belgium

 To call in the statistician after the experiment is done may be no more than 
 asking him

Re: [R] cut variable within a loop

2015-08-14 Thread Janka Vanschoenwinkel
Hey Michael,

Sorry for the late reply!

Thanks for your comment, but for the cut2 command, this is not the case. If
I enter for instance

Alldata$irri=cut2(irrigation,3)

Then I get 2 intervals from 0-3 and from 3-100.

Janka

2015-08-11 17:25 GMT+02:00 Michael Dewey li...@dewey.myzen.co.uk:

 Dear Janka
 If you supply a single number to the breaks parameter of cut I think it is
 the number of intervals.
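
The two readings of a single number can be compared directly in base R. The sketch below uses only cut(); the irrigation values are made up. Passing an explicit vector of break points to cut() gives the behaviour Janka describes for cut2, a split at the value 3.

```r
# Made-up irrigation percentages
irrigation <- c(0, 1, 2.5, 3, 10, 50, 100)

# Single number: cut() makes that many equal-width intervals
by_count <- cut(irrigation, 3)

# Vector of break points: cut() splits at the value 3
by_point <- cut(irrigation, breaks = c(0, 3, 100), include.lowest = TRUE)

print(nlevels(by_count))
print(levels(by_point))
print(table(by_point))
```

With include.lowest = TRUE the value 0 falls into the first interval instead of becoming NA.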



 --
 Michael
 http://www.dewey.myzen.co.uk/home.html




-- 

Mevrouw Janka Vanschoenwinkel
*Doctoraatsbursaal - PhD *
Milieueconomie - Environmental economics

T +32(0)11 26 87 42 | GSM +32(0)476 28 21 40

www.uhasselt.be/eec

Universiteit Hasselt | Campus Diepenbeek
Agoralaan Gebouw D | B-3590 Diepenbeek
Kantoor F11

Postadres: Universiteit Hasselt | Martelarenlaan 42 | B-3500 Hasselt

Please consider the environment before printing this e-mail

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cut variable within a loop

2015-08-14 Thread Janka Vanschoenwinkel
Hi Thierry and Petr,

I really appreciate the comments you already gave. Thank you very much for
that.

Below you can find a link to the data and the code. Hopefully this helps in
spotting the error.

I still think the issue is that the cut2 function only accepts numbers, and
not an i that refers to the loop index. To answer Petr's question: yes,
columns 3 and 4 are NA (these are the columns of the second interval). But I
don't really understand your point, so could you clarify this, please?

https://drive.google.com/folderview?id=0By9u5m3kxn9yfkxxeVNMdnRQQXhoT05CRlJlZVBCWWF2NURMMTNmVFVFeXJXXzhlMWE4SUkusp=sharing

Thank you very much once again!

Janka



2015-08-11 15:10 GMT+02:00 Thierry Onkelinx thierry.onkel...@inbo.be:

 You'll need to send a reproducible example of the code. We can't run the
 code that you sent. Hence it is hard to help you. See e.g.
 http://adv-r.had.co.nz/Reproducibility.html


[R] cut variable within a loop

2015-08-11 Thread Janka Vanschoenwinkel
Dear list members,

I have a loop in which I want to do several calculations for different samples
and save the results for each sample. The samples differ in each iteration:
I want to use the loop index i as the point at which the data are cut.

So for instance:

   - In loop 1 (i=1), I have a sample from 0-1 and a sample from 1-100.
   - In loop 2 (i=2), I have a sample from 0-2 and a sample from 2-100.
   - In loop 99 (i=99), I have a sample from 0-99 and a sample from 99-100.

I built the following function, but there is *a problem with the cut2
function* since it doesn't recognize the i. Outside the lapply loop it
works, but not inside the loop.

Could somebody please help me with this problem? Thanks a lot!


library(Hmisc)  # provides cut2()

d <- data.frame(MEt_Rainfed = rep(0, 100), MEp_Rainfed = rep(0, 100),
                MEt_Irrigation = rep(0, 100), MEp_Irrigation = rep(0, 100))

o <- lapply(0:100, function(i) {

  # split the data at the value i
  Alldata$irri <- cut2(Alldata$irrigation, i)
  levels(Alldata$irri) <- c(0, 1)

  Alldata_Rainfed <- subset(Alldata, irri == 0)
  Alldata_Irrigation <- subset(Alldata, irri == 1)

  # calculations per sample, then store all the values per i and per
  # variable in a dataframe (the calculations are not shown in this example)

  d[i, ] <- c(MEt_Rainfed, MEp_Rainfed, MEt_Irrigation, MEp_Irrigation)
})

out <- as.data.frame(do.call(rbind, o))




Re: [R] cut variable within a loop

2015-08-11 Thread Janka Vanschoenwinkel
Hi Thierry!

Thanks for your answer. I tried this, but I get this error:

Error in cut.default(x, k2) : invalid number of intervals

Which is strange, because I am not specifying a number of intervals, but the
value at which the sample has to be cut.

Greetings from Belgium! :-)

2015-08-11 14:52 GMT+02:00 Thierry Onkelinx thierry.onkel...@inbo.be:

 Dear Janka,

 Your loop goes from 0 to 100. It should probably go from 1 to 99 (1:99).
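
A minimal sketch of the loop over 1:99 follows. It uses base R's cut() with explicit break points instead of cut2. The data frame, the outcome column y and the summaries in each row are made up, standing in for the calculations that are not shown. With explicit breaks c(0, i, 100), base cut() fails for i = 0 and i = 100 because the break points are then not unique, which is one reason to restrict the range.

```r
set.seed(123)
# Made-up data: irrigation in percent, y is a hypothetical outcome
Alldata <- data.frame(irrigation = runif(500, 0, 100), y = rnorm(500))

results <- lapply(1:99, function(i) {
  # Split at the value i: [0, i] is "rainfed", (i, 100] is "irrigated"
  irri <- cut(Alldata$irrigation, breaks = c(0, i, 100),
              labels = c("rainfed", "irrigated"), include.lowest = TRUE)
  parts <- split(Alldata, irri)

  # Return one row per threshold instead of writing into an outer data frame
  data.frame(threshold        = i,
             n_rainfed        = nrow(parts$rainfed),
             n_irrigated      = nrow(parts$irrigated),
             mean_y_rainfed   = mean(parts$rainfed$y),
             mean_y_irrigated = mean(parts$irrigated$y))
})

out <- do.call(rbind, results)
head(out)
```

Returning a row from the function and binding the rows afterwards avoids assigning into d inside lapply, where the assignment only changes a local copy.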

 Best regards,



[R] simultaneous equation model with endogenous interaction terms

2015-08-10 Thread Janka Vanschoenwinkel
Dear list members,

I am building a model such as:

Y1 = Y2*X1 + X2
Y2 = Y1*X1 + X2

X2 is the exogenous variable
Z1 is the instrument of Y1
Z2 is the instrument of Y2

This is a simultaneous equation model. I know how to build a simultaneous
equation model without interaction terms:

library(systemfit)
eq1 <- Y1 ~ Y2 + X2 + Z2
eq2 <- Y2 ~ Y1 + X2 + Z1
inst <- ~ X2 + Z1 + Z2
system <- list(eq1 = eq1, eq2 = eq2)
reg2SLS <- systemfit(system, "2SLS", inst = inst, data = mydata)
summary(reg2SLS)



I also know how to do a normal 2SLS with interaction terms:
library(AER)  # provides ivreg()
ivreg(Y1 ~ Y2*X1 | Z2*X1, data = Alldata)



However, I don't know how to deal with the interaction terms in the
simultaneous equation model.
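
One possible way to think about it, offered only as a sketch: 2SLS is estimated one equation at a time, so the endogenous interaction Y2*X1 in the first equation can be treated as a second endogenous regressor, with Z2*X1 as an extra instrument. The code below does the two stages by hand in base R on simulated data. All coefficients and the data-generating process are made up, and the standard errors printed by the second-stage lm() are not valid 2SLS standard errors.

```r
set.seed(1)
n  <- 5000
X1 <- rnorm(n); X2 <- rnorm(n); Z2 <- rnorm(n)
u  <- rnorm(n)                                  # structural error of equation 1

# Y2 is endogenous: it depends on u. Z2 is its instrument.
Y2 <- Z2 + 0.5 * X2 + 0.8 * u + rnorm(n)
# Made-up true coefficients: Y2 = 1, X1 = 0.5, Y2:X1 = 0.7, X2 = 0.3
Y1 <- 1.0 * Y2 + 0.5 * X1 + 0.7 * Y2 * X1 + 0.3 * X2 + u

sim <- data.frame(Y1 = Y1, Y2 = Y2, X1 = X1, X2 = X2, Z2 = Z2)
sim$Y2X1 <- sim$Y2 * sim$X1                     # endogenous interaction
sim$Z2X1 <- sim$Z2 * sim$X1                     # its instrument

# Stage 1: project both endogenous regressors on all instruments
stage1_Y2    <- lm(Y2   ~ Z2 + Z2X1 + X1 + X2, data = sim)
stage1_Y2X1  <- lm(Y2X1 ~ Z2 + Z2X1 + X1 + X2, data = sim)
sim$Y2_hat   <- fitted(stage1_Y2)
sim$Y2X1_hat <- fitted(stage1_Y2X1)

# Stage 2: replace the endogenous regressors by their fitted values
stage2 <- lm(Y1 ~ Y2_hat + X1 + Y2X1_hat + X2, data = sim)
b <- coef(stage2)
print(round(b, 3))
```

The second equation would be handled the same way, with Z1 and Z1*X1 as instruments. Whether a given systemfit or Stata specification reproduces these point estimates would have to be checked case by case.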

I am experimenting with both R and Stata to see which formulation gives the
same result in both programs, but so far without success.

Could somebody help me with this?

Thank you very much!

Janka



[R] Testing for significant differences between groups in multiple linear regression

2015-01-23 Thread Janka Vanschoenwinkel
Dear R-colleagues,

I am looking for a way to test whether one regression has significantly
different coefficients and overall results across 10 groups (the grouping
variable is irr).



*What I have*

The regression is:

Depend = temp + temp² + perc + perc² + conti -> split up for multiple groups
of irr


*Dataset = Alldata (real dataset has over 5 IDs)*

ID  irr (= grouping variable)  temp  perc  conti  Depend  w
1   1                          10    34    26     8       23
2   1                          11    36    27     6       58
3   1                          26    57    45     3       76
4   2                          23    68    24     2       4
5   2                          6     26    8      1       323
6   2                          3     17    56     6       45
7   3                          17    39    17     5       57


I can obtain the regression coefficients for the different groups with the
following code (other approaches are possible as well).


datairrigation <- split(Alldata, Alldata$irr)

model.per.irrigation <- lapply(datairrigation, function(x) {
  # I(temp^2) and I(perc^2) are the squared terms temp² and perc²
  lm(Depend ~ temp + I(temp^2) + perc + I(perc^2) + conti,
     weights = w, data = x)
})


OR I can do it manually by splitting all the data in subsets (and then I
also receive the R²…)



*What I don’t have*

However, now I don’t know how to compare those regressions to test whether
they differ significantly over all the groups.

(Preferably, I would like to test the coefficients individually (temp(group
1) = temp(group2)) and the regression as a whole between the groups.)
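
For the regression as a whole, one common option is a Chow-type F-test. It compares the residual sum of squares of a single pooled regression with the summed residual sums of squares of the separate per-group regressions, so it needs only the split fits and no group dummies in the model. The sketch below uses simulated data with 3 groups; all coefficients, the sample size and the weights are made up. It assumes the usual weighted least squares assumptions hold, and it does not address whether the grouping variable is endogenous.

```r
set.seed(42)
n_per <- 200
# Simulated data: the temp slope differs between the three groups
sim <- do.call(rbind, lapply(1:3, function(g) {
  temp  <- runif(n_per, 0, 30)
  perc  <- runif(n_per, 10, 70)
  conti <- runif(n_per, 0, 60)
  w     <- runif(n_per, 1, 100)
  Depend <- 2 + 0.5 * g * temp - 0.01 * temp^2 + 0.1 * perc -
            0.001 * perc^2 + 0.05 * conti + rnorm(n_per) / sqrt(w)
  data.frame(irr = g, temp, perc, conti, w, Depend)
}))

form <- Depend ~ temp + I(temp^2) + perc + I(perc^2) + conti

# One regression for all groups, and one regression per group
pooled   <- lm(form, data = sim, weights = w)
by_group <- lapply(split(sim, sim$irr),
                   function(x) lm(form, data = x, weights = w))

# deviance() returns the (weighted) residual sum of squares of an lm fit
rss_pooled <- deviance(pooled)
rss_split  <- sum(sapply(by_group, deviance))

k   <- length(coef(pooled))   # coefficients per regression
G   <- length(by_group)       # number of groups
df1 <- (G - 1) * k
df2 <- nrow(sim) - G * k

F_chow <- ((rss_pooled - rss_split) / df1) / (rss_split / df2)
p_chow <- pf(F_chow, df1, df2, lower.tail = FALSE)
print(c(F = F_chow, df1 = df1, df2 = df2, p = p_chow))
```

A small p-value indicates that at least one coefficient differs between groups. Testing a single coefficient, such as temp in group 1 against temp in group 2, needs a separate comparison.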



*Note*

I know that one way to test for significant differences between groups is
to include dummy variables for the groups in the regression. Yet this is not
an option for my model, because it only allows exogenous variables in the
regression (and irrigation is an endogenous variable, because the farmer can
decide himself whether he irrigates or not).



Thank you very much in advance! I really appreciate your help!


Janka

