Re: [R] Lag variable by group

2015-09-08 Thread Janka VANSCHOENWINKEL
Hi Petr and other members who may find this post useful,

Somebody gave me an answer in a private email which worked for me!

The only thing I needed to do was to first convert my data to a data.table
object. Then the code works!

library(data.table)
data <- data.table(data, key = "id")
data[, lag.t1:=c(NA, t1[-.N]), by=id]
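
As a minimal sketch of why the conversion matters: `:=` and `.N` are data.table features, so the same line fails on a plain data.frame. The toy data below is hypothetical (it is not the data from this thread) and is assumed to be already sorted by id and year.

```r
library(data.table)

# Hypothetical toy panel, already sorted by id and year (not the thread's data).
toy <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L),
  year = c(2005, 2006, 2007, 2005, 2006),
  t1   = c(0.1, 0.2, 0.3, 1.1, 1.2)
)

# Convert the data.frame to a data.table; `:=` and `.N` are only
# available inside a data.table's [ ].
toy_dt <- as.data.table(toy)

# Per id: NA for the first row, then t1 with its last element dropped.
toy_dt[, lag.t1 := c(NA, t1[-.N]), by = id]

print(toy_dt)
```

Because the lag is taken by row position within each id, the result is only meaningful if the rows are in year order within each id.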

Thank you very much for your help Petr!

I really appreciate it!

Janka



2015-09-08 8:37 GMT+02:00 PIKAL Petr <petr.pi...@precheza.cz>:

> Hi
>
> Thanks for providing data. I did not see any response and frankly speaking
> I do not use data.table so I am not sure what you mean by lagging t1.
>
> I would start with ordering data.
> ooo<-order(data$id, data$year)
> data <- data[ooo,]
>
> Then you can split data according to id.
>
> datas<-split(data[,c(1,3)], data$id)
>
> dput(head(datas))
> structure(list(`28954` = structure(list(year = c(2005, 2006,
> 2007, 2008), t1 = c(-1.81807494163513, -1.81807494163513,
> -1.81807494163513,
> -1.81807494163513)), .Names = c("year", "t1"), row.names = c(58L,
> 45L, 35L, 46L), class = "data.frame"), `28955` = structure(list(
> year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
> -1.81807494163513, -1.81807494163513, -1.81807494163513)), .Names =
> c("year",
> "t1"), row.names = c(59L, 70L, 69L, 72L), class = "data.frame"),
> `28956` = structure(list(year = c(2005, 2006, 2007, 2008),
> t1 = c(-1.81807494163513, -1.81807494163513, -1.81807494163513,
> -1.81807494163513)), .Names = c("year", "t1"), row.names = c(53L,
> 66L, 74L, 51L), class = "data.frame"), `28957` = structure(list(
> year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
> -1.81807494163513, -1.81807494163513, -1.81807494163513
> )), .Names = c("year", "t1"), row.names = c(71L, 64L,
> 54L, 24L), class = "data.frame"), `28958` = structure(list(
> year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
> -1.81807494163513, -1.81807494163513, -1.81807494163513
> )), .Names = c("year", "t1"), row.names = c(34L, 27L,
> 1L, 31L), class = "data.frame"), `28959` = structure(list(
> year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
> -1.81807494163513, -1.81807494163513, -1.81807494163513
> )), .Names = c("year", "t1"), row.names = c(17L, 18L,
> 30L, 44L), class = "data.frame")), .Names = c("28954", "28955",
> "28956", "28957", "28958", "28959"))
>
> But now I am lost as to what result you expect. Can you explain it using
> this smaller data set?
>
> Cheers
> Petr

Re: [R] Lag variable by group

2015-09-08 Thread Janka VANSCHOENWINKEL
Wow! Thanks for pointing that out! And thanks for testing it out as well!

It is always the first year available (unbalanced panel) that should get NA.

So using the code line you provided earlier, this should work:

library(data.table)
data <- data.table(newdata, key = "id")
ooo<-order(data$id, data$year)
data <- data[ooo,]
data[, lag.t1:=c(NA, t1[-.N]), by=id]

Thank you very much for pointing that out!
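
A sketch of the same idea with the sort made explicit, on hypothetical toy data (not the data from this thread). It swaps in `setorder()` and `shift()` from data.table for the `order()` plus `c(NA, t1[-.N])` combination, assuming a data.table version that provides `shift()`; it is an alternative, not the code from the private email.

```r
library(data.table)

# Hypothetical toy panel with rows deliberately out of order.
toy <- data.table(
  id   = c(1L, 2L, 1L, 2L, 1L),
  year = c(2007, 2006, 2005, 2005, 2006),
  t1   = c(0.3, 1.2, 0.1, 1.1, 0.2)
)

# Sort in place by id, then year, so the previous row is the previous year.
setorder(toy, id, year)

# shift() defaults to a lag of one row, filling the first row with NA.
toy[, lag.t1 := shift(t1), by = id]

print(toy)
```

Note that this lags by row position within each id. In an unbalanced panel with gaps, the previous row is the previous available year, which is not necessarily the calendar year before.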



2015-09-08 9:05 GMT+02:00 PIKAL Petr <petr.pi...@precheza.cz>:

> Hm. I tried your example but what puzzles me is that your data are not
> sorted by year and therefore sometimes the first year is changed to NA but
> sometimes any arbitrary year is changed to NA.
>
>
>
> > head(data)
>
>    year    id        t1    lag.t1
> 1: 2007 28954 -1.818075        NA
> 2: 2006 28954 -1.818075 -1.818075
> 3: 2008 28954 -1.818075 -1.818075
> 4: 2005 28954 -1.818075 -1.818075
> 5: 2005 28955 -1.818075        NA
> 6: 2007 28955 -1.818075 -1.818075
>
>
>
> Is it what you intended?
>
> Cheers
>
> Petr
>
>
>
>
>
> *From:* Janka VANSCHOENWINKEL [mailto:janka.vanschoenwin...@uhasselt.be]
> *Sent:* Tuesday, September 08, 2015 8:48 AM
> *To:* PIKAL Petr
> *Cc:* r-help@r-project.org
> *Subject:* Re: [R] Lag variable by group
>
> Hi Petr and other members who may find this post useful,
>
> Somebody gave me an answer in a private email which worked for me!
>
> The only thing I needed to do was to first convert my data to a
> data.table object. Then the code works!
>
> library(data.table)
> data <- data.table(data, key = "id")
> data[, lag.t1:=c(NA, t1[-.N]), by=id]
>
> Thank you very much for your help Petr!
>
> I really appreciate it!
>
> Janka

Re: [R] Lag variable by group

2015-09-08 Thread PIKAL Petr
Hi

Thanks for providing data. I did not see any response and frankly speaking I do
not use data.table so I am not sure what you mean by lagging t1.

I would start with ordering data.
ooo<-order(data$id, data$year)
data <- data[ooo,]

Then you can split data according to id.

datas<-split(data[,c(1,3)], data$id)

dput(head(datas))
structure(list(`28954` = structure(list(year = c(2005, 2006,
2007, 2008), t1 = c(-1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513)), .Names = c("year", "t1"), row.names = c(58L,
45L, 35L, 46L), class = "data.frame"), `28955` = structure(list(
year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513)), .Names = 
c("year",
"t1"), row.names = c(59L, 70L, 69L, 72L), class = "data.frame"),
`28956` = structure(list(year = c(2005, 2006, 2007, 2008),
t1 = c(-1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513)), .Names = c("year", "t1"), row.names = c(53L,
66L, 74L, 51L), class = "data.frame"), `28957` = structure(list(
year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513
)), .Names = c("year", "t1"), row.names = c(71L, 64L,
54L, 24L), class = "data.frame"), `28958` = structure(list(
year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513
)), .Names = c("year", "t1"), row.names = c(34L, 27L,
1L, 31L), class = "data.frame"), `28959` = structure(list(
year = c(2005, 2006, 2007, 2008), t1 = c(-1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513
)), .Names = c("year", "t1"), row.names = c(17L, 18L,
30L, 44L), class = "data.frame")), .Names = c("28954", "28955",
"28956", "28957", "28958", "28959"))

But now I am lost as to what result you expect. Can you explain it using this
smaller data set?

Cheers
Petr
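
A minimal base-R sketch of a per-id lag without data.table, assuming the goal is the previous year's t1 within each id. The toy data frame is hypothetical (not the data from this thread); `ave()` applies the function to t1 within each id and returns the values in the data frame's row order.

```r
# Hypothetical toy panel with rows deliberately out of order.
toy <- data.frame(
  id   = c(1L, 2L, 1L, 2L, 1L),
  year = c(2007, 2006, 2005, 2005, 2006),
  t1   = c(0.3, 1.2, 0.1, 1.1, 0.2)
)

# Order by id, then year, as in the ordering step above.
toy <- toy[order(toy$id, toy$year), ]

# Within each id: NA for the first row, then t1 with its last element dropped.
toy$lag.t1 <- ave(toy$t1, toy$id, FUN = function(x) c(NA, head(x, -1)))

print(toy)
```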

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Janka
> VANSCHOENWINKEL
> Sent: Monday, September 07, 2015 1:18 PM
> To: r-help@r-project.org
> Subject: [R] Lag variable by group
>
> Hi!
>
> I have the following dataset with the variables ID (this is a unique ID
> per farmer), year, and another variable t1.
> I would now like a fourth variable holding the lagged value of t1
> for each farm ID.
>
> I found some code on the internet that does exactly what I need, but it
> does not work for this dataset. Does anyone have suggestions about how
> I can make this work?
>
> Thanks a lot!
>
> Janka

[R] Lag variable by group

2015-09-07 Thread Janka VANSCHOENWINKEL
Hi!

I have the following dataset with the variables ID (this is a unique ID per
farmer), year, and another variable t1.
I would now like a fourth variable holding the lagged value of t1 for
each farm ID.

I found some code on the internet that does exactly what I need, but it does
not work for this dataset. Does anyone have suggestions about how I can
make this work?

Thanks a lot!

Janka

data<-structure(list(year = c(2007, 2005, 2008, 2006, 2005, 2007, 2006,
2008, 2007, 2005, 2007, 2007, 2005, 2006, 2005, 2006, 2005, 2006,
2007, 2007, 2005, 2008, 2007, 2008, 2005, 2005, 2006, 2008, 2007,
2007, 2008, 2008, 2006, 2005, 2007, 2006, 2008, 2008, 2007, 2007,
2007, 2006, 2006, 2008, 2006, 2008, 2008, 2008, 2006, 2007, 2008,
2007, 2005, 2007, 2008, 2005, 2007, 2005, 2005, 2008, 2005, 2006,
2005, 2006, 2008, 2006, 2008, 2006, 2007, 2006, 2005, 2008, 2006,
2007, 2008, 2006, 2006, 2006, 2005, 2008, 2006, 2008, 2006, 2006,
2006, 2007, 2008, 2005, 2007, 2006, 2007, 2008, 2006, 2008, 2005,
2007, 2005, 2007, 2006, 2006), id = c(28958L, 28962L,
28962L, 28965L, 28960L, 28962L, 28964L, 28970L, 28961L, 28965L,
78458L, 28960L, 28961L, 28961L, 28969L, 28962L, 28959L, 28959L,
58845L, 28965L, 28963L, 78459L, 28967L, 28957L, 28964L, 28966L,
28958L, 28960L, 28969L, 28959L, 28958L, 28969L, 58845L, 28958L,
28954L, 28963L, 78458L, 28965L, 28966L, 28963L, 28970L, 28970L,
28960L, 28959L, 28954L, 28954L, 58845L, 28967L, 28966L, 78459L,
28956L, 28964L, 28956L, 28957L, 28961L, 28970L, 28968L, 28954L,
28955L, 28968L, 28968L, 28967L, 28967L, 28957L, 28966L, 28956L,
28964L, 28969L, 28955L, 28955L, 28957L, 28955L, 28968L, 28956L,
28963L, 29004L, 58848L, 29005L, 28974L, 29005L, 28974L, 29006L,
28981L, 29007L, 29002L, 28980L, 29001L, 29006L, 29005L, 28989L,
28989L, 58846L, 28980L, 28981L, 78467L, 28990L, 28973L, 29004L,
28972L, 29006L), t1 = c(-1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.81807494163513, -1.81807494163513, -1.81807494163513,
-1.81807494163513, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487, -1.43884992599487, -1.43884992599487,
-1.43884992599487, -1.43884992599487)), .Names = c("year", "id",
"t1"), row.names = c(NA, 100L), class = "data.frame")

library(data.table)
data[, lag.t1:=c(NA, t1[-.N]), by=id]


Thank you very much!

Janka
