Re: [R] Odp: Problem with aggregating data across time points
Thanks for all your help, that has worked a treat. To answer your questions, I want to include the zero rows because I am going to analyse using mixed models (with dummies for day of week, location etc.) and I thought it was necessary to include a complete list of time variables, but now I'm wondering if it is necessary. As for the empty rows, the database is generated automatically by the incidents reporting system and is a bit of a mess, so I want to make sure that the code doesn't stumble over such things. Thanks again all! On 2 Jul 2010, at 17:14, David Winsemius dwinsem...@comcast.net wrote: On Jul 2, 2010, at 11:55 AM, Petr PIKAL wrote: Hi did you try aggregate? aggregate(data[, 5:8],list(data$Date), sum, na.rm=T) Group.1 verbal self.harm violence_objects violence 1 0 000 2 01/04/07 251539 3 02/04/07 24 68 13 4 03/04/07 17130 10 aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T) That address his A) request: Here is the application of aggregate to his B) request (I think): # Not e that Date is not of class Date but is rather a factor that includes as a level. aggregate(series[, 5:8],list(series$Date, series$Location), sum, na.rm=T) Group.1 Group.2 verbal self.harm violence_objects violence 10 000 2 A 0 000 3 01/04/07 A 7 103 4 02/04/07 A 8 201 5 03/04/07 A 0 002 6 B 0 000 7 01/04/07 B 3 201 8 02/04/07 B 4 200 9 03/04/07 B 4 003 10C 0 000 11 01/04/07 C 4 232 12 02/04/07 C 0 042 13 03/04/07 C 1 105 14D 0 000 15 01/04/07 D 7 603 16 02/04/07 D 0 009 17 03/04/07 D 41100 18E 0 000 19 01/04/07 E 4 300 20 02/04/07 E 4 040 21 03/04/07 E 8 100 22F 0 000 23 01/04/07 F 0 100 24 02/04/07 F 8 201 So perhaps an output with less extraneous input would be better: with(series[series$Date != , ], aggregate(list(verbal=verbal, self.harm=self.harm, viol_obj=violence_objects, violence=violence), list(Date, Location), sum, na.rm=T) ) Group.1 Group.2 verbal self.harm viol_obj violence 1 01/04/07 A 7 103 2 02/04/07 A 8 201 3 03/04/07 A 0 002 4 01/04/07 B 3 201 5 02/04/07 B 4 200 6 03/04/07 B 4 003 7 01/04/07 C 4 232 8 02/04/07 C 0 042 9 03/04/07 C 1 105 10 01/04/07 D 7 603 11 02/04/07 D 0 009 12 03/04/07 D 41100 13 01/04/07 E 4 300 14 02/04/07 E 4 040 15 03/04/07 E 8 100 16 01/04/07 F 0 100 17 02/04/07 F 8 201 BTW, why do you have empty rows? Regards Petr Hello- I have a dataset which basically looks like this: Location Sex Date Time VerbalSelf harm Violence_objects Violence A 1 1-4-2007 1800 3 0 1 3 A 1 1-4-2007 1230 21 2 4 D 2 2-4-2007 1100 04 0 0 ... I've put a dput of the first section of the data at the end of this email. Basically I have these data for several days across all of the dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on
[R] Odp: Problem with aggregating data across time points
Hi did you try aggregate? aggregate(data[, 5:8],list(data$Date), sum, na.rm=T) Group.1 verbal self.harm violence_objects violence 1 0 000 2 01/04/07 251539 3 02/04/07 24 68 13 4 03/04/07 17130 10 aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T) BTW, why do you have empty rows? Regards Petr Hello- I have a dataset which basically looks like this: Location Sex Date Time VerbalSelf harm Violence_objects Violence A 1 1-4-2007 1800 3 0 1 3 A 1 1-4-2007 1230 21 2 4 D 2 2-4-2007 1100 04 0 0 ... I've put a dput of the first section of the data at the end of this email. Basically I have these data for several days across all of the dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on until 31-12-2009. The last four variables which you can see at the end of the email are my dependent variables, they are different types of violent and self harming behaviour shown by patients in a psychiatric hospital. What I want to do is: A) sum each of the dependent variables for each of the dates (so e.g. in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and 3+4=7 for each of the variables) B) do this sum, but only in each location this time (location is the first variable)- so the sum for 1-4-2007 in location A, sum for 1-4-2007 in location B, and so on and so on. Because this is divided across locations, some dates will have no data going into them and will return 0 sums. Crucially I still want these dates to appear- so e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2 0 0, then 23-5-2008 0 0 0 0 again, and etc. I've had several abortive attempts and done some Googling but have got nowhere. I'd greatly appreciate any advice. Many thanks, Chris Beeley (Institute of Mental Health, UK) structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L, 5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L, 1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c(, A, B, C, D, E, F), class = factor), Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L, 2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L, 4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L, 4L, 4L, 4L, 4L), .Label = c(, 01/04/07, 02/04/07, 03/04/07 ), class = factor), Time = structure(c(1L, 28L, 1L, 1L, 1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L, 34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L, 10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L, 36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L, 29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L, 7L, 12L, 1L, 15L), .Label = c(, , 02:24:00, 03:44:00, 04:30:00, 07:00:00, 08:35:00, 09:20:00, 09:30:00, 10:00:00, 10:15:00, 10:45:00, 11:00:00, 11:20:00, 11:30:00, 11:35:00, 11:50:00, 12:00:00, 12:25:00, 12:30:00, 12:45:00, 15:00:00, 15:15:00, 15:30:00, 15:35:00, 17:15:00, 17:50:00, 18:00:00, 19:00:00, 19:30:00, 19:50:00, 20:00:00, 20:30:00, 20:55:00, 22:15:00, 22:30:00, 22:35:00, 22:40:00, 23:10:00 ), class = factor), verbal = c(NA, 3L, NA, NA, NA, 3L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L, NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA, NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L, 0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L, 2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA, 0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA, 1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA, 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L, NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, 0L,
Re: [R] Odp: Problem with aggregating data across time points
On Jul 2, 2010, at 11:55 AM, Petr PIKAL wrote: Hi did you try aggregate? aggregate(data[, 5:8],list(data$Date), sum, na.rm=T) Group.1 verbal self.harm violence_objects violence 1 0 000 2 01/04/07 251539 3 02/04/07 24 68 13 4 03/04/07 17130 10 aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T) That address his A) request: Here is the application of aggregate to his B) request (I think): # Not e that Date is not of class Date but is rather a factor that includes as a level. aggregate(series[, 5:8],list(series$Date, series$Location), sum, na.rm=T) Group.1 Group.2 verbal self.harm violence_objects violence 10 000 2 A 0 000 3 01/04/07 A 7 103 4 02/04/07 A 8 201 5 03/04/07 A 0 002 6 B 0 000 7 01/04/07 B 3 201 8 02/04/07 B 4 200 9 03/04/07 B 4 003 10C 0 000 11 01/04/07 C 4 232 12 02/04/07 C 0 042 13 03/04/07 C 1 105 14D 0 000 15 01/04/07 D 7 603 16 02/04/07 D 0 009 17 03/04/07 D 41100 18E 0 000 19 01/04/07 E 4 300 20 02/04/07 E 4 040 21 03/04/07 E 8 100 22F 0 000 23 01/04/07 F 0 100 24 02/04/07 F 8 201 So perhaps an output with less extraneous input would be better: with(series[series$Date != , ], aggregate(list(verbal=verbal, self.harm=self.harm, viol_obj=violence_objects, violence=violence), list(Date, Location), sum, na.rm=T) ) Group.1 Group.2 verbal self.harm viol_obj violence 1 01/04/07 A 7 103 2 02/04/07 A 8 201 3 03/04/07 A 0 002 4 01/04/07 B 3 201 5 02/04/07 B 4 200 6 03/04/07 B 4 003 7 01/04/07 C 4 232 8 02/04/07 C 0 042 9 03/04/07 C 1 105 10 01/04/07 D 7 603 11 02/04/07 D 0 009 12 03/04/07 D 41100 13 01/04/07 E 4 300 14 02/04/07 E 4 040 15 03/04/07 E 8 100 16 01/04/07 F 0 100 17 02/04/07 F 8 201 BTW, why do you have empty rows? Regards Petr Hello- I have a dataset which basically looks like this: Location Sex Date Time VerbalSelf harm Violence_objects Violence A 1 1-4-2007 1800 3 0 1 3 A 1 1-4-2007 1230 21 2 4 D 2 2-4-2007 1100 04 0 0 ... I've put a dput of the first section of the data at the end of this email. Basically I have these data for several days across all of the dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on until 31-12-2009. The last four variables which you can see at the end of the email are my dependent variables, they are different types of violent and self harming behaviour shown by patients in a psychiatric hospital. What I want to do is: A) sum each of the dependent variables for each of the dates (so e.g. in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and 3+4=7 for each of the variables) B) do this sum, but only in each location this time (location is the first variable)- so the sum for 1-4-2007 in location A, sum for 1-4-2007 in location B, and so on and so on. Because this is divided across locations, some dates will have no data going into them and