Re: [R] How to remove rows representing concurrent sessions from data.frame?

2012-01-26 Thread Jean V Adams
johannes rara wrote on 01/26/2012 02:46:57 AM:

> I have a dataset like this (dput for this below) which represents user
> computer sessions:
> 
>    username          machine               start                 end
> 1     user1 D5599.domain.com 2011-01-03 09:44:18 2011-01-03 09:47:27
> 2     user1 D5599.domain.com 2011-01-03 09:46:29 2011-01-03 10:09:16
> 3     user1 D5599.domain.com 2011-01-03 14:07:36 2011-01-03 14:56:17
> 4     user1 D5599.domain.com 2011-01-05 15:03:17 2011-01-05 15:23:15
> 5     user1 D5599.domain.com 2011-02-14 14:33:39 2011-02-14 14:40:16
> 6     user1 D5599.domain.com 2011-02-23 13:54:30 2011-02-23 13:58:23
> 7     user1 D5599.domain.com 2011-03-21 10:10:18 2011-03-21 10:32:22
> 8     user1 D5645.domain.com 2011-06-09 10:12:41 2011-06-09 10:58:59
> 9     user1 D5682.domain.com 2011-01-03 12:03:45 2011-01-03 12:29:43
> 10    USER2 D5682.domain.com 2011-01-12 14:26:05 2011-01-12 14:32:53
> 11    USER2 D5682.domain.com 2011-01-17 15:06:19 2011-01-17 15:44:22
> 12    USER2 D5682.domain.com 2011-01-18 15:07:30 2011-01-18 15:42:43
> 13    USER2 D5682.domain.com 2011-01-25 15:20:55 2011-01-25 15:24:38
> 14    USER2 D5682.domain.com 2011-02-14 15:03:00 2011-02-14 15:07:43
> 15    USER2 D5682.domain.com 2011-02-14 14:59:23 2011-02-14 15:14:47
> 
> There may be several concurrent sessions for the same username from the
> same computer. How can I remove those rows so that only one session
> is left for each set of concurrent sessions? I have no idea how to do
> this, maybe using difftime?
> 
> -J
> 
> structure(list(username = c("user1", "user1", "user1",
> "user1", "user1", "user1", "user1", "user1",
> "user1", "USER2", "USER2", "USER2", "USER2", "USER2", "USER2"
> ), machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("D5599.domain.com", "D5645.domain.com",
> "D5682.domain.com", "D5686.domain.com", "D5694.domain.com", "D5696.domain.com",
> "D5772.domain.com", "D5772.domain.com", "D5847.domain.com", "D5855.domain.com",
> "D5871.domain.com", "D5927.domain.com", "D5927.domain.com", "D5952.domain.com",
> "D5993.domain.com", "D6012.domain.com", "D6048.domain.com", "D6077.domain.com",
> "D5688.domain.com", "D5815.domain.com", "D6106.domain.com", "D6128.domain.com"
> ), class = "factor"), start = structure(c(1294040658, 1294040789,
> 1294056456, 1294232597, 1297686819, 1298462070, 1300695018, 1307603561,
> 1294049025, 1294835165, 1295269579, 1295356050, 1295961655, 1297688580,
> 1297688363), class = c("POSIXct", "POSIXt"), tzone = ""), end =
> structure(c(1294040847,
> 1294042156, 1294059377, 1294233795, 1297687216, 1298462303, 1300696342,
> 1307606339, 1294050583, 1294835573, 1295271862, 1295358163, 1295961878,
> 1297688863, 1297689287), class = c("POSIXct", "POSIXt"), tzone = "")),
> .Names = c("username",
> "machine", "start", "end"), row.names = c(NA, 15L), class = "data.frame")


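# assumes the dput() output above has been assigned to df
# rearrange the data, so that there is one "date/time" variable (value)
# and another variable (variable) that indicates start/end
library(reshape)
df2 <- melt(df)
# sort the data by user, machine, date/time
df3 <- df2[order(df2$username, df2$machine, df2$value), ]
# for each user and machine, walk through the sorted records and keep
# the first "start" of each run of consecutive "start" records and the
# last "end" of each run of consecutive "end" records
first <- function(x) {
  # TRUE for element 1 and wherever x differs from the previous element
  l <- length(x)
  c(TRUE, x[-1] != x[-l])
}
last <- function(x) {
  # TRUE for the last element and wherever x differs from the next element
  l <- length(x)
  c(x[-1] != x[-l], TRUE)
}
df4 <- df3[(df3$variable == "start" & first(df3$variable)) |
  (df3$variable == "end" & last(df3$variable)), ]
# combine the results: one row per merged session, with value as the
# start time and value2 as the end time
df5 <- cbind(df4[df4$variable == "start", c("username", "machine", "value")],
  value2 = df4$value[df4$variable == "end"])
df5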

Jean
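One caveat on the run-based rule above: keeping the first "start" of each run of starts and the last "end" of each run of ends can split one long session in two when it contains a shorter session followed by another. With A from 09:00 to 18:00, B from 10:00 to 11:00 and C from 12:00 to 13:00, the sorted events are start, start, end, start, end, end, and the rule returns 09:00 to 11:00 plus 12:00 to 18:00 instead of a single session. A minimal base-R sketch that avoids this tracks how many sessions are open (+1 at each start, -1 at each end) and closes a merged session only when that count returns to zero. merge_sessions is a hypothetical helper, not code from the thread; it assumes the columns username, machine, start and end from the dput() output and treats back-to-back sessions (one ends exactly when the next starts) as a single session.

```r
# Hypothetical helper (not from the thread): merge overlapping sessions
# per username/machine using a running count of open sessions.
merge_sessions <- function(d) {
  n <- nrow(d)
  # one row per event: +1 when a session starts, -1 when it ends
  ev <- data.frame(username = rep(as.character(d$username), 2),
                   machine  = rep(as.character(d$machine), 2),
                   time     = c(d$start, d$end),
                   delta    = rep(c(1, -1), each = n),
                   stringsAsFactors = FALSE)
  # sort by group and time; on ties, starts come before ends, so
  # back-to-back sessions are merged (an assumption)
  ev <- ev[order(ev$username, ev$machine, ev$time, -ev$delta), ]
  grp <- paste(ev$username, ev$machine, sep = "\r")
  # number of sessions open just after each event, counted within each group
  open <- ave(ev$delta, grp, FUN = cumsum)
  # a merged session begins when the count goes from 0 to 1 ...
  s <- ev[ev$delta == 1 & open == 1, ]
  # ... and ends when it drops back to 0
  e <- ev[ev$delta == -1 & open == 0, ]
  data.frame(username = s$username, machine = s$machine,
             start = s$time, end = e$time, stringsAsFactors = FALSE)
}

# Hypothetical example from the caveat: A 09:00-18:00, B 10:00-11:00,
# C 12:00-13:00 on the same user and machine
t0 <- as.POSIXct("2011-01-03 09:00:00", tz = "UTC")
ex <- data.frame(username = "user1", machine = "D5599.domain.com",
                 start = t0 + c(0, 60, 180) * 60,
                 end   = t0 + c(540, 120, 240) * 60,
                 stringsAsFactors = FALSE)
merge_sessions(ex)  # one row: A absorbs B and C
```

Applied to df from the thread, it should return 13 rows: rows 1 and 2 collapse into one session, and so do rows 14 and 15.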

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to remove rows representing concurrent sessions from data.frame?

2012-01-26 Thread johannes rara
I have a dataset like this (dput for this below) which represents user
computer sessions:

   username          machine               start                 end
1     user1 D5599.domain.com 2011-01-03 09:44:18 2011-01-03 09:47:27
2     user1 D5599.domain.com 2011-01-03 09:46:29 2011-01-03 10:09:16
3     user1 D5599.domain.com 2011-01-03 14:07:36 2011-01-03 14:56:17
4     user1 D5599.domain.com 2011-01-05 15:03:17 2011-01-05 15:23:15
5     user1 D5599.domain.com 2011-02-14 14:33:39 2011-02-14 14:40:16
6     user1 D5599.domain.com 2011-02-23 13:54:30 2011-02-23 13:58:23
7     user1 D5599.domain.com 2011-03-21 10:10:18 2011-03-21 10:32:22
8     user1 D5645.domain.com 2011-06-09 10:12:41 2011-06-09 10:58:59
9     user1 D5682.domain.com 2011-01-03 12:03:45 2011-01-03 12:29:43
10    USER2 D5682.domain.com 2011-01-12 14:26:05 2011-01-12 14:32:53
11    USER2 D5682.domain.com 2011-01-17 15:06:19 2011-01-17 15:44:22
12    USER2 D5682.domain.com 2011-01-18 15:07:30 2011-01-18 15:42:43
13    USER2 D5682.domain.com 2011-01-25 15:20:55 2011-01-25 15:24:38
14    USER2 D5682.domain.com 2011-02-14 15:03:00 2011-02-14 15:07:43
15    USER2 D5682.domain.com 2011-02-14 14:59:23 2011-02-14 15:14:47

There may be several concurrent sessions for the same username from the
same computer. How can I remove those rows so that only one session
is left for each set of concurrent sessions? I have no idea how to do
this, maybe using difftime?

-J

structure(list(username = c("user1", "user1", "user1",
"user1", "user1", "user1", "user1", "user1",
"user1", "USER2", "USER2", "USER2", "USER2", "USER2", "USER2"
), machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("D5599.domain.com", "D5645.domain.com",
"D5682.domain.com", "D5686.domain.com", "D5694.domain.com", "D5696.domain.com",
"D5772.domain.com", "D5772.domain.com", "D5847.domain.com", "D5855.domain.com",
"D5871.domain.com", "D5927.domain.com", "D5927.domain.com", "D5952.domain.com",
"D5993.domain.com", "D6012.domain.com", "D6048.domain.com", "D6077.domain.com",
"D5688.domain.com", "D5815.domain.com", "D6106.domain.com", "D6128.domain.com"
), class = "factor"), start = structure(c(1294040658, 1294040789,
1294056456, 1294232597, 1297686819, 1298462070, 1300695018, 1307603561,
1294049025, 1294835165, 1295269579, 1295356050, 1295961655, 1297688580,
1297688363), class = c("POSIXct", "POSIXt"), tzone = ""), end =
structure(c(1294040847,
1294042156, 1294059377, 1294233795, 1297687216, 1298462303, 1300696342,
1307606339, 1294050583, 1294835573, 1295271862, 1295358163, 1295961878,
1297688863, 1297689287), class = c("POSIXct", "POSIXt"), tzone = "")),
.Names = c("username",
"machine", "start", "end"), row.names = c(NA, 15L), class = "data.frame")
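Since the question mentions difftime, one way to use it is to sort each username/machine group by start time and compare each start with the latest end among the earlier sessions in that group; a negative difference means the row starts while an earlier session is still open. flag_concurrent below is a hypothetical base-R helper sketched under that assumption; it only flags rows and does not extend any end time.

```r
# Hypothetical helper: flag sessions that start before an earlier session
# on the same username/machine has ended.
flag_concurrent <- function(d) {
  d <- d[order(d$username, as.character(d$machine), d$start), ]
  grp <- paste(d$username, d$machine, sep = "\r")
  # latest end time among the earlier sessions of each group (NA for the first)
  prev_end <- ave(as.numeric(d$end), grp,
                  FUN = function(e) c(NA, cummax(e)[-length(e)]))
  # negative gap: this session starts before an earlier one has ended
  gap <- difftime(d$start, .POSIXct(prev_end), units = "mins")
  d$overlap <- !is.na(gap) & gap < 0
  d
}

# Hypothetical example: the second session starts 2 minutes before the first ends
t0 <- as.POSIXct("2011-01-03 09:00:00", tz = "UTC")
ex <- data.frame(username = "user1", machine = "D5599.domain.com",
                 start = t0 + c(0, 3) * 60, end = t0 + c(5, 25) * 60,
                 stringsAsFactors = FALSE)
flag_concurrent(ex)$overlap  # FALSE TRUE
```

With the data above assigned to df, this should flag rows 2 and 14. Dropping the flagged rows leaves one row per set of concurrent sessions, but it also discards the later end time of row 2 (10:09:16), which is why the reply merges start and end times instead of simply deleting rows.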
