Re: [R] Fuzzy merge using timestamps
The sorting for the times is below: checkTime <- checkTimes[order(checkTimes$ARC,checkTimes$times, decreasing = FALSE),] After the merging data frame is created I use these two commands to merge the checkTime data frame with the other two dataframes: data_gps <- merge(gpsdata,match, by= c("gpsARC","t_datetimegps"),all.x=TRUE) data_gpsur <- merge(data_gps,urdata, by = c("urARC","t_datetimeur"),all.x=TRUE) Then I create another data frame of just the variables I need and sort using this code: testdata <- data.frame(t_datetimegps=data_gpsur$t_datetimegps, gpsARC=data_gpsur$gpsARC, y=data_gpsur$y, x=data_gpsur$x, tot_pos=data_gpsur$TOT_POS) testdata <- testdata[order(testdata$gpsARC, t_datetimegps),] Thoughts? On Wed, Nov 10, 2010 at 2:09 PM, Sarah Goslee wrote: > Hi Ian, > > Did you see the second part of my request, for a simple > reproducible example? At the very least, we need your > merge and sort code, since it sounds like that's where > the problem truly lies. > > Sarah > > On Wed, Nov 10, 2010 at 2:02 PM, Ian Craig wrote: > > So here is a sample of the datasets. I have modified them for the > purposes > > of circulating to solve this problem. I have already added the ID and > key > > as seen in the code below. Any ideas? > > gpsdata > >gpsARC Protocol TrackUTCDate UTCTime LocalDate LocalTime > y > > x Altitude_m Speed_km_h TimeBef_sec Dist_Bef_m TimeAft_sec > DistAft_m > > t_datetimegps ID1 key1 > > 1 11825938-C 1 07/21/2009 21:29:05 07/21/2009 17:29:05 39.29393 > > -76.54961 0.547843 0.162998 0 0.00 36 > 27.25 > > 2009-07-21 17:09:00 10 > > 2 11825938-C 1 07/21/2009 21:29:41 07/21/2009 17:29:41 39.29397 > > -76.54930 6.615943 27.425990 36 27.25 4 > 32.53 > > 2009-07-21 17:19:00 20 > > 3 11825938-C 1 07/21/2009 21:29:45 07/21/2009 17:29:45 39.29383 > > -76.54897 6.357120 30.192910 4 32.53 3 > 26.39 > > 2009-07-21 17:29:00 30 > > 4 11825938-C 1 07/21/2009 21:29:48 07/21/2009 17:29:48 39.29363 > > -76.54881 6.134487 30.489670 3 26.39 5 > 27.27 > > 2009-07-21 17:39:00 40 > > 5 11825938-C 1 07/21/2009 21:29:53 07/21/2009 17:29:53 39.29341 > > -76.54868 6.451584 13.394810 5 27.27 10 > 27.47 > > 2009-07-21 17:49:00 50 > > 6 11825938-C 1 07/21/2009 21:30:03 07/21/2009 17:30:03 39.29352 > > -76.54839 4.324665 13.323760 10 27.47 5 > 26.25 > > 2009-07-22 18:00:00 60 > > 7 11825938-C 1 07/21/2009 21:30:08 07/21/2009 17:30:08 39.29370 > > -76.54819 3.662347 12.518040 5 26.25 8 > 30.68 > > 2009-07-22 18:20:00 70 > > 8 11825938-C 1 07/21/2009 21:30:16 07/21/2009 17:30:16 39.29394 > > -76.54837 2.378734 36.134190 8 30.68 3 > 31.95 > > 2009-07-22 18:30:00 80 > > 9 11825938-C 1 07/21/2009 21:30:19 07/21/2009 17:30:19 39.29408 > > -76.54870 2.222491 38.962510 3 31.95 3 > 31.20 > > 2009-07-22 18:40:00 90 > > 10 11825938-C 1 07/21/2009 21:30:22 07/21/2009 17:30:22 39.29419 > > -76.54903 2.024886 31.771920 3 31.20 6 > 25.91 > > 2009-07-22 18:50:00 100 > > 11 11825938-C 1 07/21/2009 21:30:28 07/21/2009 17:30:28 39.29430 > > -76.54929 2.831168 4.424288 6 25.91 51 > 25.10 > > 2009-07-22 19:00:00 110 > > 12 11825938-C 1 07/21/2009 21:31:19 07/21/2009 17:31:19 39.29432 > > -76.54958 1.545548 26.467430 51 25.10 3 > 26.97 > > 2009-07-22 19:01:00 120 > > 13 11825938-C 1 07/21/2009 21:31:22 07/21/2009 17:31:22 39.29415 > > -76.54980 2.062366 37.769800 3 26.97 3 > 34.60 > > 2009-07-22 19:05:00 130 > > 14 21727638-C 1 07/21/2009 21:31:25 07/21/2009 17:31:25 39.29388 > > -76.55000 2.120106 45.099640 3 34.60 2 > 25.38 > > 2009-07-21 17:11:00 140 > > 15 21727638-C 1 07/21/2009 21:31:27 07/21/2009 17:31:27 39.29367 > > -76.55014 2.107622 47.953690 2 25.38 2 > 26.39 > > 2009-07-21 17:21:00 150 > > 16 21727638-C 1 07/21/2009 21:31:29 07/21/2009 17:31:29 39.29347 > > -76.55029 1.942083 47.656870 2 26.39 2 > 26.72 > > 2009-07-21 17:31:00 160 > > 17 21727638-C 1 07/21/2009 21:31:31 07/21/2009 17:31:31 39.29326 > > -76.55045 2.023492 46.045600 2 26.72 3 > 36.03 > > 2009-07-21 17:41:00 170 > > 18 21727638-C 1 07/21/2009 21:31:34 07/21/2009 17:31:34 39.29298 > > -76.55066 2.211649 39.604580 3 36.03 3 > 27.07 > > 2009-07-21 17:51:00 180 > > 19 21727638-C 1 07/21/2009 21:31:37 07/21/2009 17:31:37 39.29277 > > -76.55083 1.479468 25.079980
Re: [R] Fuzzy merge using timestamps
Hi Ian, Did you see the second part of my request, for a simple reproducible example? At the very least, we need your merge and sort code, since it sounds like that's where the problem truly lies. Sarah On Wed, Nov 10, 2010 at 2:02 PM, Ian Craig wrote: > So here is a sample of the datasets. I have modified them for the purposes > of circulating to solve this problem. I have already added the ID and key > as seen in the code below. Any ideas? > gpsdata > gpsARC Protocol Track UTCDate UTCTime LocalDate LocalTime y > x Altitude_m Speed_km_h TimeBef_sec Dist_Bef_m TimeAft_sec DistAft_m > t_datetimegps ID1 key1 > 1 118259 38-C 1 07/21/2009 21:29:05 07/21/2009 17:29:05 39.29393 > -76.54961 0.547843 0.162998 0 0.00 36 27.25 > 2009-07-21 17:09:00 1 0 > 2 118259 38-C 1 07/21/2009 21:29:41 07/21/2009 17:29:41 39.29397 > -76.54930 6.615943 27.425990 36 27.25 4 32.53 > 2009-07-21 17:19:00 2 0 > 3 118259 38-C 1 07/21/2009 21:29:45 07/21/2009 17:29:45 39.29383 > -76.54897 6.357120 30.192910 4 32.53 3 26.39 > 2009-07-21 17:29:00 3 0 > 4 118259 38-C 1 07/21/2009 21:29:48 07/21/2009 17:29:48 39.29363 > -76.54881 6.134487 30.489670 3 26.39 5 27.27 > 2009-07-21 17:39:00 4 0 > 5 118259 38-C 1 07/21/2009 21:29:53 07/21/2009 17:29:53 39.29341 > -76.54868 6.451584 13.394810 5 27.27 10 27.47 > 2009-07-21 17:49:00 5 0 > 6 118259 38-C 1 07/21/2009 21:30:03 07/21/2009 17:30:03 39.29352 > -76.54839 4.324665 13.323760 10 27.47 5 26.25 > 2009-07-22 18:00:00 6 0 > 7 118259 38-C 1 07/21/2009 21:30:08 07/21/2009 17:30:08 39.29370 > -76.54819 3.662347 12.518040 5 26.25 8 30.68 > 2009-07-22 18:20:00 7 0 > 8 118259 38-C 1 07/21/2009 21:30:16 07/21/2009 17:30:16 39.29394 > -76.54837 2.378734 36.134190 8 30.68 3 31.95 > 2009-07-22 18:30:00 8 0 > 9 118259 38-C 1 07/21/2009 21:30:19 07/21/2009 17:30:19 39.29408 > -76.54870 2.222491 38.962510 3 31.95 3 31.20 > 2009-07-22 18:40:00 9 0 > 10 118259 38-C 1 07/21/2009 21:30:22 07/21/2009 17:30:22 39.29419 > -76.54903 2.024886 31.771920 3 31.20 6 25.91 > 2009-07-22 18:50:00 10 0 > 11 118259 38-C 1 07/21/2009 21:30:28 07/21/2009 17:30:28 39.29430 > -76.54929 2.831168 4.424288 6 25.91 51 25.10 > 2009-07-22 19:00:00 11 0 > 12 118259 38-C 1 07/21/2009 21:31:19 07/21/2009 17:31:19 39.29432 > -76.54958 1.545548 26.467430 51 25.10 3 26.97 > 2009-07-22 19:01:00 12 0 > 13 118259 38-C 1 07/21/2009 21:31:22 07/21/2009 17:31:22 39.29415 > -76.54980 2.062366 37.769800 3 26.97 3 34.60 > 2009-07-22 19:05:00 13 0 > 14 217276 38-C 1 07/21/2009 21:31:25 07/21/2009 17:31:25 39.29388 > -76.55000 2.120106 45.099640 3 34.60 2 25.38 > 2009-07-21 17:11:00 14 0 > 15 217276 38-C 1 07/21/2009 21:31:27 07/21/2009 17:31:27 39.29367 > -76.55014 2.107622 47.953690 2 25.38 2 26.39 > 2009-07-21 17:21:00 15 0 > 16 217276 38-C 1 07/21/2009 21:31:29 07/21/2009 17:31:29 39.29347 > -76.55029 1.942083 47.656870 2 26.39 2 26.72 > 2009-07-21 17:31:00 16 0 > 17 217276 38-C 1 07/21/2009 21:31:31 07/21/2009 17:31:31 39.29326 > -76.55045 2.023492 46.045600 2 26.72 3 36.03 > 2009-07-21 17:41:00 17 0 > 18 217276 38-C 1 07/21/2009 21:31:34 07/21/2009 17:31:34 39.29298 > -76.55066 2.211649 39.604580 3 36.03 3 27.07 > 2009-07-21 17:51:00 18 0 > 19 217276 38-C 1 07/21/2009 21:31:37 07/21/2009 17:31:37 39.29277 > -76.55083 1.479468 25.079980 3 27.07 40 29.64 > 2009-07-22 19:31:00 19 0 > 20 217276 38-C 1 07/21/2009 21:32:17 07/21/2009 17:32:17 39.29256 > -76.55103 2.041623 23.242470 40 29.64 4 26.38 > 2009-07-22 19:32:00 20 0 > 21 217276 38-C 1 07/21/2009 21:32:21 07/21/2009 17:32:21 39.29255 > -76.55134 2.122359 32.137710 4 26.38 3 34.66 > 2009-07-22 19:34:00 21 0 > 22 217276 38-C 1 07/21/2009 21:32:24 07/21/2009 17:32:24 39.29276 > -76.55164 0.776128 48.219300 3 34.66 2 28.80 > 2009-07-22 19:36:00 22 0 > 23 217276 38-C 1 07/21/2009 21:32:26 07/21/2009 17:32:26 39.29294 > -76.55187 -0.542242 53.586880 2 28.80 2 31.85 > 2009-07-22 19:38:00 23 0 > 24 217276 38-C 1 07/21/2009 21:32:28 07/21/2009 17:32:28 39.29316
Re: [R] Fuzzy merge using timestamps
So here is a sample of the datasets. I have modified them for the purposes of circulating to solve this problem. I have already added the ID and key as seen in the code below. Any ideas? gpsdata gpsARC Protocol TrackUTCDate UTCTime LocalDate LocalTimey x Altitude_m Speed_km_h TimeBef_sec Dist_Bef_m TimeAft_sec DistAft_m t_datetimegps ID1 key1 1 11825938-C 1 07/21/2009 21:29:05 07/21/2009 17:29:05 39.29393 -76.54961 0.547843 0.162998 0 0.00 36 27.25 2009-07-21 17:09:00 10 2 11825938-C 1 07/21/2009 21:29:41 07/21/2009 17:29:41 39.29397 -76.54930 6.615943 27.425990 36 27.25 4 32.53 2009-07-21 17:19:00 20 3 11825938-C 1 07/21/2009 21:29:45 07/21/2009 17:29:45 39.29383 -76.54897 6.357120 30.192910 4 32.53 3 26.39 2009-07-21 17:29:00 30 4 11825938-C 1 07/21/2009 21:29:48 07/21/2009 17:29:48 39.29363 -76.54881 6.134487 30.489670 3 26.39 5 27.27 2009-07-21 17:39:00 40 5 11825938-C 1 07/21/2009 21:29:53 07/21/2009 17:29:53 39.29341 -76.54868 6.451584 13.394810 5 27.27 10 27.47 2009-07-21 17:49:00 50 6 11825938-C 1 07/21/2009 21:30:03 07/21/2009 17:30:03 39.29352 -76.54839 4.324665 13.323760 10 27.47 5 26.25 2009-07-22 18:00:00 60 7 11825938-C 1 07/21/2009 21:30:08 07/21/2009 17:30:08 39.29370 -76.54819 3.662347 12.518040 5 26.25 8 30.68 2009-07-22 18:20:00 70 8 11825938-C 1 07/21/2009 21:30:16 07/21/2009 17:30:16 39.29394 -76.54837 2.378734 36.134190 8 30.68 3 31.95 2009-07-22 18:30:00 80 9 11825938-C 1 07/21/2009 21:30:19 07/21/2009 17:30:19 39.29408 -76.54870 2.222491 38.962510 3 31.95 3 31.20 2009-07-22 18:40:00 90 10 11825938-C 1 07/21/2009 21:30:22 07/21/2009 17:30:22 39.29419 -76.54903 2.024886 31.771920 3 31.20 6 25.91 2009-07-22 18:50:00 100 11 11825938-C 1 07/21/2009 21:30:28 07/21/2009 17:30:28 39.29430 -76.54929 2.831168 4.424288 6 25.91 51 25.10 2009-07-22 19:00:00 110 12 11825938-C 1 07/21/2009 21:31:19 07/21/2009 17:31:19 39.29432 -76.54958 1.545548 26.467430 51 25.10 3 26.97 2009-07-22 19:01:00 120 13 11825938-C 1 07/21/2009 21:31:22 07/21/2009 17:31:22 39.29415 -76.54980 2.062366 37.769800 3 26.97 3 34.60 2009-07-22 19:05:00 130 14 21727638-C 1 07/21/2009 21:31:25 07/21/2009 17:31:25 39.29388 -76.55000 2.120106 45.099640 3 34.60 2 25.38 2009-07-21 17:11:00 140 15 21727638-C 1 07/21/2009 21:31:27 07/21/2009 17:31:27 39.29367 -76.55014 2.107622 47.953690 2 25.38 2 26.39 2009-07-21 17:21:00 150 16 21727638-C 1 07/21/2009 21:31:29 07/21/2009 17:31:29 39.29347 -76.55029 1.942083 47.656870 2 26.39 2 26.72 2009-07-21 17:31:00 160 17 21727638-C 1 07/21/2009 21:31:31 07/21/2009 17:31:31 39.29326 -76.55045 2.023492 46.045600 2 26.72 3 36.03 2009-07-21 17:41:00 170 18 21727638-C 1 07/21/2009 21:31:34 07/21/2009 17:31:34 39.29298 -76.55066 2.211649 39.604580 3 36.03 3 27.07 2009-07-21 17:51:00 180 19 21727638-C 1 07/21/2009 21:31:37 07/21/2009 17:31:37 39.29277 -76.55083 1.479468 25.079980 3 27.07 40 29.64 2009-07-22 19:31:00 190 20 21727638-C 1 07/21/2009 21:32:17 07/21/2009 17:32:17 39.29256 -76.55103 2.041623 23.242470 40 29.64 4 26.38 2009-07-22 19:32:00 200 21 21727638-C 1 07/21/2009 21:32:21 07/21/2009 17:32:21 39.29255 -76.55134 2.122359 32.137710 4 26.38 3 34.66 2009-07-22 19:34:00 210 22 21727638-C 1 07/21/2009 21:32:24 07/21/2009 17:32:24 39.29276 -76.55164 0.776128 48.219300 3 34.66 2 28.80 2009-07-22 19:36:00 220 23 21727638-C 1 07/21/2009 21:32:26 07/21/2009 17:32:26 39.29294 -76.55187 -0.542242 53.586880 2 28.80 2 31.85 2009-07-22 19:38:00 230 24 21727638-C 1 07/21/2009 21:32:28 07/21/2009 17:32:28 39.29316 -76.55210 -0.383976 56.380370 2 31.85 2 32.48 2009-07-22 19:40:00 240 25 21727638-C 1 07/21/2009 21:32:30 07/21/2009 17:32:30 39.29344 -76.55223 -1.343364 58.055140 2 32.48 2 33.62 2009-07-22 19:41:00 250 26 21727638-C 1 07/21/2009 21:32:32 07/21/2009 17:32:32 39.29374 -76.55227 -2.229877 61.005420 2
Re: [R] Fuzzy merge using timestamps
On Wed, Nov 10, 2010 at 12:57 PM, Ian Craig wrote: > Greetings Supreme Council of R Masters, Nice. :) > I have two sets of data, each with a set of timestamps. I would like to > somehow merge the datasets based on the timestamps and an individual > identifier. That is there are several individuals all with timestamps, with > times that could overlap. By browsing through some of the older posts, I > got the idea to create a third data frame of both sets of timestamps, > individual identifiers, and a key to determine which dataset they have come > from, then find the breaks to determine which of each dataset should be > paired. the code I have written so far look something like this. This would be easier to sort through if you included a toy example with data so that we could try it. As it is, I have no idea what your data actually look like. > gpsdata$t_datetimegps<-as.POSIXct(gpsdata$t_datetimegps) > urdata$t_datetimeur<-as.POSIXct(urdata$t_datetimeur) > > gpsdata$ID1 <- row.names(gpsdata) > urdata$ID2 <- row.names(urdata) > > gpsdata$key1 <- rep(0, nrow(gpsdata)) > urdata$key2 <- rep(1, nrow(urdata)) > > checkTimes <- data.frame(ID=c(gpsdata$ID1, urdata$ID2), > ARC=c(gpsdata$gpsARC, urdata$urARC), > times=c(gpsdata$t_datetimegps, urdata$t_datetimeur), > key=c(gpsdata$key1, urdata$key2)) > > checkTime <- checkTimes[order(checkTimes$ARC,checkTimes$times, decreasing = > FALSE),] > > breaks <- which(diff(checkTime$key) == 1) > > match <- data.frame(ID1=checkTime$ID[breaks], > gpsARC = checkTime$ARC[breaks], > urARC = checkTime$ARC[breaks + 1], > t_datetimegps=checkTime$times[breaks], > t_datetimeur=checkTime$times[breaks + 1]) > > #Then I merge the 'match' data frame with the gpsdata data frame and the > product with the urdata data frame. The problem is that when I create the > checkTime data frame and sort it, it sorts the urdata portion first then the > gpsdata portion. So my key column looks like > 1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, instead of > 0,0,0,1,0,0,1,0,0,0,0,0,0,1, etc. even though I am not sorting on key. > S.O.S Why is it doing this? Shouldn't it just order the timestamps of > both data frames together? So really this is a sorting problem, not a merging problem? Is the merging part working correctly? What exactly are you doing to merge? To sort? Here again a worked functional example would be really useful. Without knowing what you're doing, I can't offer suggestions. Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.