Re: [R] Odp: Problem with aggregating data across time points

2010-07-03 Thread Chris Beeley
Thanks for all your help, that has worked a treat. To answer your questions, I 
want to include the zero rows because I am going to analyse using mixed models 
(with dummies for day of week, location etc.) and I thought it was necessary to 
include a complete list of time variables, but now I'm wondering if it is 
necessary. As for the empty rows, the database is generated automatically by 
the incidents reporting system and is a bit of a mess, so I want to make sure 
that the code doesn't stumble over such things. 

Thanks again all!



On 2 Jul 2010, at 17:14, David Winsemius dwinsem...@comcast.net wrote:

 
 On Jul 2, 2010, at 11:55 AM, Petr PIKAL wrote:
 
 Hi
 
 did you try aggregate?
 
 aggregate(data[, 5:8],list(data$Date), sum, na.rm=T)
  Group.1 verbal self.harm violence_objects violence
 1   0 000
 2 01/04/07 251539
 3 02/04/07 24 68   13
 4 03/04/07 17130   10
 aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T)
 
 That address his A) request:
 
 Here is the application of aggregate to his B) request (I think):
 
 # Not e that Date is not of class Date but is rather a factor that includes 
  as a level.
 
  aggregate(series[, 5:8],list(series$Date, series$Location), sum, na.rm=T)
Group.1 Group.2 verbal self.harm violence_objects violence
 10 000
 2 A  0 000
 3  01/04/07   A  7 103
 4  02/04/07   A  8 201
 5  03/04/07   A  0 002
 6 B  0 000
 7  01/04/07   B  3 201
 8  02/04/07   B  4 200
 9  03/04/07   B  4 003
 10C  0 000
 11 01/04/07   C  4 232
 12 02/04/07   C  0 042
 13 03/04/07   C  1 105
 14D  0 000
 15 01/04/07   D  7 603
 16 02/04/07   D  0 009
 17 03/04/07   D  41100
 18E  0 000
 19 01/04/07   E  4 300
 20 02/04/07   E  4 040
 21 03/04/07   E  8 100
 22F  0 000
 23 01/04/07   F  0 100
 24 02/04/07   F  8 201
 
 So perhaps an output with less extraneous input would be better:
 
  with(series[series$Date != , ],
aggregate(list(verbal=verbal, self.harm=self.harm, 
 viol_obj=violence_objects, violence=violence),
  list(Date, Location),
  sum, na.rm=T)
   )
 
Group.1 Group.2 verbal self.harm viol_obj violence
 1  01/04/07   A  7 103
 2  02/04/07   A  8 201
 3  03/04/07   A  0 002
 4  01/04/07   B  3 201
 5  02/04/07   B  4 200
 6  03/04/07   B  4 003
 7  01/04/07   C  4 232
 8  02/04/07   C  0 042
 9  03/04/07   C  1 105
 10 01/04/07   D  7 603
 11 02/04/07   D  0 009
 12 03/04/07   D  41100
 13 01/04/07   E  4 300
 14 02/04/07   E  4 040
 15 03/04/07   E  8 100
 16 01/04/07   F  0 100
 17 02/04/07   F  8 201
 
 BTW, why do you have empty rows?
 
 Regards
 Petr
 
 
 
 
 Hello-
 
 I have a dataset which basically looks like this:
 
 Location   Sex   Date  Time   VerbalSelf harm
 Violence_objects   Violence
 A 1  1-4-2007   1800  3 0
   1   3
 A 1  1-4-2007   1230  21
  2   4
 D 2  2-4-2007   1100  04
  0   0
 ...
 
 I've put a dput of the first section of the data at the end of this
 email. Basically I have these data for several days across all of the
 dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on
 

[R] Odp: Problem with aggregating data across time points

2010-07-02 Thread Petr PIKAL
Hi

did you try aggregate?

 aggregate(data[, 5:8],list(data$Date), sum, na.rm=T)
   Group.1 verbal self.harm violence_objects violence
1   0 000
2 01/04/07 251539
3 02/04/07 24 68   13
4 03/04/07 17130   10
 aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T)

BTW, why do you have empty rows?

Regards
Petr




 Hello-
 
 I have a dataset which basically looks like this:
 
 Location   Sex   Date  Time   VerbalSelf harm
 Violence_objects   Violence
   A 1  1-4-2007   1800  3 0
 1   3
   A 1  1-4-2007   1230  21
2   4
   D 2  2-4-2007   1100  04
0   0
 ...
 
 I've put a dput of the first section of the data at the end of this
 email. Basically I have these data for several days across all of the
 dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on
 until 31-12-2009. The last four variables which you can see at the end
 of the email are my dependent variables, they are different types of
 violent and self harming behaviour shown by patients in a psychiatric
 hospital.
 
 What I want to do is:
 
 A) sum each of the dependent variables for each of the dates (so e.g.
 in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and
 3+4=7 for each of the variables)
 
 B) do this sum, but only in each location this time (location is the
 first variable)- so the sum for 1-4-2007 in location A, sum for
 1-4-2007 in location B, and so on and so on. Because this is divided
 across locations, some dates will have no data going into them and
 will return 0 sums. Crucially I still want these dates to appear- so
 e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2
 0 0, then 23-5-2008 0 0 0 0 again, and etc.
 
 I've had several abortive attempts and done some Googling but have got
 nowhere. I'd greatly appreciate any advice.
 
 Many thanks,
 Chris Beeley
 (Institute of Mental Health, UK)
 
 
 structure(list(Location = structure(c(1L, 2L, 2L, 1L, 3L, 5L,
 5L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 5L, 5L, 5L, 5L, 6L,
 1L, 2L, 3L, 5L, 6L, 6L, 6L, 7L, 7L, 5L, 5L, 4L, 4L, 4L, 3L, 3L,
 3L, 2L, 2L, 2L, 2L, 7L, 7L, 7L, 6L, 5L, 4L, 4L, 6L, 5L, 2L, 2L,
 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 5L, 5L, 3L, 3L, 4L,
 4L, 4L, 4L), .Label = c(, A, B, C, D, E, F), class = 
factor),
 Sex = c(NA, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 2L, NA, 2L,
 2L, 1L, 1L, NA, 2L, 2L, 2L, 1L, NA, NA, 1L, 1L, 1L, 1L, 2L,
 1L, 2L, NA, 1L, 1L, NA, 1L, NA, NA, 2L, 1L, 1L, 2L, 2L, 2L,
 2L, 1L, 2L, 2L, 2L, 2L, NA, 1L, 2L, NA, 1L, 1L, NA, 1L, NA,
 1L, 2L, NA, 1L, 1L, NA, 1L, 1L, 1L, NA, 2L, 2L, 1L, 2L, 1L
 ), Date = structure(c(1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L,
 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 4L,
 4L, 1L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 4L, 4L, 4L, 1L, 4L,
 4L, 4L, 4L, 4L), .Label = c(, 01/04/07, 02/04/07, 03/04/07
 ), class = factor), Time = structure(c(1L, 28L, 1L, 1L,
 1L, 1L, 20L, 1L, 1L, 37L, 37L, 2L, 13L, 31L, 1L, 17L, 1L,
 34L, 38L, 39L, 23L, 1L, 1L, 24L, 14L, 16L, 1L, 33L, 30L,
 10L, 1L, 6L, 8L, 1L, 26L, 1L, 1L, 13L, 3L, 4L, 1L, 1L, 35L,
 36L, 25L, 9L, 11L, 5L, 22L, 1L, 10L, 30L, 1L, 19L, 15L, 1L,
 29L, 1L, 27L, 10L, 2L, 21L, 18L, 1L, 23L, 32L, 36L, 1L, 30L,
 7L, 12L, 1L, 15L), .Label = c(,  , 02:24:00, 03:44:00,
 04:30:00, 07:00:00, 08:35:00, 09:20:00, 09:30:00,
 10:00:00, 10:15:00, 10:45:00, 11:00:00, 11:20:00,
 11:30:00, 11:35:00, 11:50:00, 12:00:00, 12:25:00,
 12:30:00, 12:45:00, 15:00:00, 15:15:00, 15:30:00,
 15:35:00, 17:15:00, 17:50:00, 18:00:00, 19:00:00,
 19:30:00, 19:50:00, 20:00:00, 20:30:00, 20:55:00,
 22:15:00, 22:30:00, 22:35:00, 22:40:00, 23:10:00
 ), class = factor), verbal = c(NA, 3L, NA, NA, NA, 3L,
 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 4L, NA, 0L, 0L, 0L, 4L,
 NA, NA, 4L, 3L, 0L, 4L, 0L, 0L, 0L, NA, 0L, 0L, NA, 0L, NA,
 NA, 4L, 0L, 4L, 0L, 0L, 4L, 1L, 4L, 3L, 0L, 0L, 0L, NA, 4L,
 0L, NA, 0L, 3L, NA, 1L, NA, 0L, 3L, NA, 1L, 4L, NA, 4L, 0L,
 0L, NA, 0L, 0L, 0L, 0L, 1L), self.harm = c(NA, 0L, NA, NA,
 NA, 0L, 0L, NA, NA, 0L, 1L, NA, 2L, 0L, 0L, 2L, NA, 2L, 0L,
 2L, 0L, NA, NA, 0L, 0L, 2L, 0L, 1L, 2L, 1L, NA, 0L, 0L, NA,
 0L, NA, NA, 0L, 2L, 0L, 1L, 1L, 0L, 2L, 0L, 0L, 0L, 0L, 0L,
 NA, 0L, 2L, NA, 0L, 0L, NA, 0L, NA, 4L, 0L, NA, 1L, 0L, NA,
 1L, 3L, 1L, NA, 0L, 0L, 0L, 1L, 0L), violence_objects = c(NA,
 0L, NA, NA, NA, 0L, 0L, NA, NA, 0L, 0L, NA, 0L, 0L, 0L, 3L,
 NA, 0L, 0L, 0L, 0L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA,
 0L, 

Re: [R] Odp: Problem with aggregating data across time points

2010-07-02 Thread David Winsemius


On Jul 2, 2010, at 11:55 AM, Petr PIKAL wrote:


Hi

did you try aggregate?


aggregate(data[, 5:8],list(data$Date), sum, na.rm=T)

  Group.1 verbal self.harm violence_objects violence
1   0 000
2 01/04/07 251539
3 02/04/07 24 68   13
4 03/04/07 17130   10

aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T)


That address his A) request:

Here is the application of aggregate to his B) request (I think):

# Not e that Date is not of class Date but is rather a factor that  
includes  as a level.


 aggregate(series[, 5:8],list(series$Date, series$Location), sum,  
na.rm=T)

Group.1 Group.2 verbal self.harm violence_objects violence
10 000
2 A  0 000
3  01/04/07   A  7 103
4  02/04/07   A  8 201
5  03/04/07   A  0 002
6 B  0 000
7  01/04/07   B  3 201
8  02/04/07   B  4 200
9  03/04/07   B  4 003
10C  0 000
11 01/04/07   C  4 232
12 02/04/07   C  0 042
13 03/04/07   C  1 105
14D  0 000
15 01/04/07   D  7 603
16 02/04/07   D  0 009
17 03/04/07   D  41100
18E  0 000
19 01/04/07   E  4 300
20 02/04/07   E  4 040
21 03/04/07   E  8 100
22F  0 000
23 01/04/07   F  0 100
24 02/04/07   F  8 201

So perhaps an output with less extraneous input would be better:

 with(series[series$Date != , ],
aggregate(list(verbal=verbal, self.harm=self.harm,  
viol_obj=violence_objects, violence=violence),

  list(Date, Location),
  sum, na.rm=T)
   )

Group.1 Group.2 verbal self.harm viol_obj violence
1  01/04/07   A  7 103
2  02/04/07   A  8 201
3  03/04/07   A  0 002
4  01/04/07   B  3 201
5  02/04/07   B  4 200
6  03/04/07   B  4 003
7  01/04/07   C  4 232
8  02/04/07   C  0 042
9  03/04/07   C  1 105
10 01/04/07   D  7 603
11 02/04/07   D  0 009
12 03/04/07   D  41100
13 01/04/07   E  4 300
14 02/04/07   E  4 040
15 03/04/07   E  8 100
16 01/04/07   F  0 100
17 02/04/07   F  8 201


BTW, why do you have empty rows?

Regards
Petr





Hello-

I have a dataset which basically looks like this:

Location   Sex   Date  Time   VerbalSelf harm
Violence_objects   Violence
 A 1  1-4-2007   1800  3 0
   1   3
 A 1  1-4-2007   1230  21
  2   4
 D 2  2-4-2007   1100  04
  0   0
...

I've put a dput of the first section of the data at the end of this
email. Basically I have these data for several days across all of the
dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on
until 31-12-2009. The last four variables which you can see at the  
end

of the email are my dependent variables, they are different types of
violent and self harming behaviour shown by patients in a psychiatric
hospital.

What I want to do is:

A) sum each of the dependent variables for each of the dates (so e.g.
in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3,  
and

3+4=7 for each of the variables)

B) do this sum, but only in each location this time (location is the
first variable)- so the sum for 1-4-2007 in location A, sum for
1-4-2007 in location B, and so on and so on. Because this is divided
across locations, some dates will have no data going into them and