[R] Identifying last record in individual growth data over different time intervalls

2007-03-05 Thread Rainer M. Krug
Hi

I have a data.frame t which contains size measurements of individual plants, 
identified by the field plate. It contains, among others, a field 
year indicating the year in which the individual was measured, and the 
height. The number of measurements ranges from 1 to 4, taken in 
different years.
My problem is that I need the LAST measurement of each plant. I only came up 
with the solution below, which is probably way too complicated, but I 
can't think of another one.

Does anybody have an idea how to do this more effectively?

Finally, I would like to have a data.frame t2 which contains only the 
entries of the last measurements.

Thanks in advance,

Rainer



unlist(
  sapply(
    split(t, t$plate),
    function(i) {
      i[i$year == max(i$year), ]$id
    }
  )
)

     15      20      33      43      44      47      64     D72    S200    S201
2006001 2006003 2006005 2006007 2006008 2006009 2006014 2006015 2006016 2006017
   S202    S203    S204    S205    S206    S207    S208    S209    S210    S211
2004095 2006019 2006020 2006021 2006022 2006023 2006024 2006025 2006026 2006027
   S212    S213    S214    S215    S216    S217    S218    S219    S220    S222
2006028 2006029 2006030 2006031 2006032 2006033 2006034 2006035 2006036 2006037
   S223    S224    S225    S226    S227    S228    S229    S230    S231    S232
2006038 2006039 2006040 2006041 2006042 2006043 2006044 2006045 2006046 2006047
 
  t
             id plate year height
2004007 2004007    15 2004   0.40
2005024 2005024    15 2005   0.43
2006001 2006001    15 2006   0.44
2004012 2004012    20 2004   0.90
2005026 2005026    20 2005   0.94
2006003 2006003    20 2006   0.98
2004025 2004025    33 2004   0.15
2005027 2005027    33 2005   0.15
2006005 2006005    33 2006   0.16
2004035 2004035    43 2004   0.26
2005038 2005038    43 2005   0.30
2006007 2006007    43 2006   0.38
2004036 2004036    44 2004   0.32
2005030 2005030    44 2005   0.39
2006008 2006008    44 2006   0.46
2004039 2004039    47 2004   0.50
2005025 2005025    47 2005   0.55
2006009 2006009    47 2006   0.63
2004055 2004055    64 2004   0.45
2005029 2005029    64 2005   0.58
2006014 2006014    64 2006   0.67
2006015 2006015   D72 2006   0.30
2004093 2004093  S200 2004   0.68
2005040 2005040  S200 2005   0.74
2006016 2006016  S200 2006   0.84
2004094 2004094  S201 2004   0.46
2005041 2005041  S201 2005   0.49
2006017 2006017  S201 2006   0.53
2004095 2004095  S202 2004   0.17
2004096 2004096  S203 2004   0.23
2005032 2005032  S203 2005   0.23
2006019 2006019  S203 2006   0.23
2004097 2004097  S204 2004   0.25
2005031 2005031  S204 2005   0.29
2006020 2006020  S204 2006   0.41
2004098 2004098  S205 2004   0.22
2005039 2005039  S205 2005   0.26
2006021 2006021  S205 2006   0.37
2004099 2004099  S206 2004   0.19
2005035 2005035  S206 2005   0.25
2006022 2006022  S206 2006   0.37
2004100 2004100  S207 2004   0.29
2005003 2005003  S207 2005   0.36
2006023 2006023  S207 2006   0.41
2004101 2004101  S208 2004   0.17
2005005 2005005  S208 2005   0.20
2006024 2006024  S208 2006   0.16
2004102 2004102  S209 2004   0.16
2005008 2005008  S209 2005   0.19
2006025 2006025  S209 2006   0.24
2004103 2004103  S210 2004   0.09
2005007 2005007  S210 2005   0.14
2006026 2006026  S210 2006   0.15
2004104 2004104  S211 2004   0.12
2005006 2005006  S211 2005   0.12
2006027 2006027  S211 2006   0.22
2004105 2004105  S212 2004   0.61
2005011 2005011  S212 2005   0.71
2006028 2006028  S212 2006   0.81
2004106 2004106  S213 2004   0.28
2005010 2005010  S213 2005   0.37
2006029 2006029  S213 2006   0.44
2004107 2004107  S214 2004   0.47
2005009 2005009  S214 2005   0.59
2006030 2006030  S214 2006   0.67
2004108 2004108  S215 2004   0.43
2005004 2005004  S215 2005   0.53
2006031 2006031  S215 2006   0.66
2004109 2004109  S216 2004   0.35
2005019 2005019  S216 2005   0.38
2006032 2006032  S216 2006   0.41
2004110 2004110  S217 2004   0.20
2005018 2005018  S217 2005   0.21
2006033 2006033  S217 2006   0.32
2004111 2004111  S218 2004   0.19
2005014 2005014  S218 2005   0.21
2006034 2006034  S218 2006   0.27
2004112 2004112  S219 2004   0.21
2005034 2005034  S219 2005   0.24
2006035 2006035  S219 2006   0.24
2004113 2004113  S220 2004   0.19
2005021 2005021  S220 2005   0.19
2006036 2006036  S220 2006   0.25
2004114 2004114  S222 2004   0.34
2005020 2005020  S222 2005   0.35
2006037 2006037  S222 2006   0.46
2005013 2005013  S223 2005   0.04
2006038 2006038  S223 2006   0.04
2005012 2005012  S224 2005   0.13
2006039 2006039  S224 2006   0.14
-- 
NEW EMAIL ADDRESS AND ADDRESS:

[EMAIL PROTECTED]

[EMAIL PROTECTED] WILL BE DISCONTINUED END OF MARCH

Rainer M. Krug, Dipl. Phys. (Germany), MSc Conservation
Biology (UCT)

Leslie Hill Institute for Plant Conservation
University of Cape Town
Rondebosch 7701
South Africa

Fax:+27 - (0)86 516 2782
Fax:+27 - (0)21 650 2440 (w)
Cell:

Re: [R] Identifying last record in individual growth data over different time intervalls

2007-03-05 Thread jim holtman
If efficiency is a concern because the data frame is large or has a complex
structure, you could split only the row indices instead of the data frame
itself, which would be more efficient:

sapply(split(seq(nrow(t)), t$plate),
       function(x) t$id[x][which.max(t$year[x])])

     15      20      33      43      44      47      64     D72    S200    S201
2006001 2006003 2006005 2006007 2006008 2006009 2006014 2006015 2006016 2006017
   S202    S203    S204    S205    S206    S207    S208    S209    S210    S211
2004095 2006019 2006020 2006021 2006022 2006023 2006024 2006025 2006026 2006027
   S212    S213    S214    S215    S216    S217    S218    S219    S220    S222
2006028 2006029 2006030 2006031 2006032 2006033 2006034 2006035 2006036 2006037
   S223    S224
2006038 2006039
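
To get from ids to the data.frame t2 asked for in the original post, one option is to return the row index instead of the id and subset the data frame with it. The following is only a sketch under assumptions: `toy` is a small stand-in built from six rows of the posted data, and the names `toy`, `last_rows` and `t2` are chosen here for illustration.

```r
# Small stand-in for t: six rows copied from the posted data
toy <- data.frame(
  id     = c(2004007, 2005024, 2006001, 2004095, 2005013, 2006038),
  plate  = c("15", "15", "15", "S202", "S223", "S223"),
  year   = c(2004, 2005, 2006, 2004, 2005, 2006),
  height = c(0.40, 0.43, 0.44, 0.17, 0.04, 0.04)
)

# For each plate, pick the row index that holds the latest year
last_rows <- sapply(
  split(seq_len(nrow(toy)), toy$plate),
  function(x) x[which.max(toy$year[x])]
)

# Subset the data frame once with the collected indices
t2 <- toy[last_rows, ]
t2
```

Because only integer indices are split, no per-plate copies of the data frame are created; the full rows are looked up once at the end.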





Re: [R] Identifying last record in individual growth data over different time intervalls

2007-03-05 Thread jim holtman
What is wrong with the method that you have?  It looks reasonably
efficient.  As with other languages, there are always other ways of doing
it.  Here is another to consider, but it is basically the same:

 sapply(split(t, t$plate), function(x) x$id[which.max(x$year)])
     15      20      33      43      44      47      64     D72    S200    S201
2006001 2006003 2006005 2006007 2006008 2006009 2006014 2006015 2006016 2006017
   S202    S203    S204    S205    S206    S207    S208    S209    S210    S211
2004095 2006019 2006020 2006021 2006022 2006023 2006024 2006025 2006026 2006027
   S212    S213    S214    S215    S216    S217    S218    S219    S220    S222
2006028 2006029 2006030 2006031 2006032 2006033 2006034 2006035 2006036 2006037
   S223    S224
2006038 2006039
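
One difference between the two versions may matter in practice: the comparison `i$year == max(i$year)` keeps every row that ties for the latest year, whereas `which.max` returns only the first such position. The sketch below illustrates this on a made-up data frame `dup` (hypothetical values with a deliberately repeated year; all names are assumptions for illustration) and also shows one way to collect whole rows rather than ids.

```r
# Hypothetical data: plate "A" was recorded twice in 2006
dup <- data.frame(
  id    = 1:5,
  plate = c("A", "A", "A", "B", "B"),
  year  = c(2005, 2006, 2006, 2004, 2005)
)

# Comparison with max(): all rows of the latest year are kept
ids_all <- lapply(split(dup, dup$plate),
                  function(i) i$id[i$year == max(i$year)])

# which.max(): only the first row of the latest year is kept
ids_first <- sapply(split(dup, dup$plate),
                    function(x) x$id[which.max(x$year)])

# Whole rows instead of ids, bound back into one data frame
t2 <- do.call(rbind, lapply(split(dup, dup$plate),
                            function(x) x[which.max(x$year), ]))
```

If each plant is measured at most once per year, as the posted data suggest, both variants select the same records.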





Re: [R] Identifying last record in individual growth data over different time intervalls

2007-03-05 Thread Chris Stubben

> Finally I would like to have a data.frame t2 which only contains the 
> entries of the last measurements.
 

You could also use aggregate to get the max year per plate then join that back
to the original dataframe using merge on year and plate (common columns in both
dataframes).



x <- data.frame(id=(1:8), plate=c(15,15,15,20,20,33,43,43),
                year=c(2004,2005,2006,2004,2005,2004,2005,2006),
                height=c(0.40,0.43,0.44,0.90,0.94,0.15,0.30,0.38))

merge(x, aggregate(list(year=x$year), list(plate=x$plate), max))


  plate year id height
1    15 2006  3   0.44
2    20 2005  5   0.94
3    33 2004  6   0.15
4    43 2006  8   0.38
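
The same join can also be written with the formula interface of aggregate and an explicit `by` argument in merge, which makes the join columns visible. This is a sketch on the example data from this message; the names `latest` and `t2` are chosen here for illustration.

```r
x <- data.frame(id = 1:8, plate = c(15, 15, 15, 20, 20, 33, 43, 43),
                year = c(2004, 2005, 2006, 2004, 2005, 2004, 2005, 2006),
                height = c(0.40, 0.43, 0.44, 0.90, 0.94, 0.15, 0.30, 0.38))

# Latest year per plate (one row per plate)
latest <- aggregate(year ~ plate, data = x, FUN = max)

# Inner join on both columns keeps only the rows of the latest year
t2 <- merge(x, latest, by = c("plate", "year"))
t2
```

As with any equality match on the maximum, a plate that had two records in its latest year would contribute two rows to the result.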

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Identifying last record in individual growth data over different time intervalls

2007-03-05 Thread Rainer M. Krug
Hi

jim holtman wrote:
> What is wrong with the method that you have?  It looks reasonably
> efficient.

Actually, there is nothing wrong with the approach I am using - it just
seemed quite complicated, and I assumed that there was an easier
approach around.

The dataset is not so large that I really have to worry about efficiency.

Thanks a lot,

Rainer





Re: [R] Identifying last record in individual growth data over different time intervalls

2007-03-05 Thread Rainer M. Krug
Hi Chris

Chris Stubben wrote:
>> Finally I would like to have a data.frame t2 which only contains the
>> entries of the last measurements.
>
> You could also use aggregate to get the max year per plate then join that back
> to the original dataframe using merge on year and plate (common columns in
> both dataframes).
 

Thanks for the idea to use aggregate and merge - as I like SQL, this 
seems to be a nice approach.

Rainer

 
 

