Re: [R] Fuzzy merge using timestamps

2010-11-10 Thread Ian Craig
The sorting for the times is below:
 checkTime <- checkTimes[order(checkTimes$ARC,checkTimes$times, decreasing
= FALSE),]

After the merging data frame is created I use these two commands to merge
the checkTime data frame with the other two dataframes:

data_gps <- merge(gpsdata,match, by= c("gpsARC","t_datetimegps"),all.x=TRUE)
data_gpsur <- merge(data_gps,urdata, by =
c("urARC","t_datetimeur"),all.x=TRUE)

Then I create another data frame of just the variables I need and sort using
this code:

testdata <- data.frame(t_datetimegps=data_gpsur$t_datetimegps,
gpsARC=data_gpsur$gpsARC, y=data_gpsur$y, x=data_gpsur$x,
tot_pos=data_gpsur$TOT_POS)
testdata <- testdata[order(testdata$gpsARC, t_datetimegps),]

Thoughts?

On Wed, Nov 10, 2010 at 2:09 PM, Sarah Goslee wrote:

> Hi Ian,
>
> Did you see the second part of my request, for a simple
> reproducible example? At the very least, we need your
> merge and sort code, since it sounds like that's where
> the problem truly lies.
>
> Sarah
>
> On Wed, Nov 10, 2010 at 2:02 PM, Ian Craig  wrote:
> > So here is a sample of the datasets.  I have modified them for the
> purposes
> > of circulating to solve this problem.  I have already added the ID and
> key
> > as seen in the code below.  Any ideas?
> > gpsdata
> >gpsARC Protocol TrackUTCDate  UTCTime  LocalDate LocalTime
>  y
> >   x Altitude_m Speed_km_h TimeBef_sec Dist_Bef_m TimeAft_sec
> DistAft_m
> > t_datetimegps ID1 key1
> > 1  11825938-C 1 07/21/2009 21:29:05 07/21/2009  17:29:05 39.29393
> > -76.54961   0.547843   0.162998   0   0.00  36
> 27.25
> > 2009-07-21 17:09:00   10
> > 2  11825938-C 1 07/21/2009 21:29:41 07/21/2009  17:29:41 39.29397
> > -76.54930   6.615943  27.425990  36  27.25   4
> 32.53
> > 2009-07-21 17:19:00   20
> > 3  11825938-C 1 07/21/2009 21:29:45 07/21/2009  17:29:45 39.29383
> > -76.54897   6.357120  30.192910   4  32.53   3
> 26.39
> > 2009-07-21 17:29:00   30
> > 4  11825938-C 1 07/21/2009 21:29:48 07/21/2009  17:29:48 39.29363
> > -76.54881   6.134487  30.489670   3  26.39   5
> 27.27
> > 2009-07-21 17:39:00   40
> > 5  11825938-C 1 07/21/2009 21:29:53 07/21/2009  17:29:53 39.29341
> > -76.54868   6.451584  13.394810   5  27.27  10
> 27.47
> > 2009-07-21 17:49:00   50
> > 6  11825938-C 1 07/21/2009 21:30:03 07/21/2009  17:30:03 39.29352
> > -76.54839   4.324665  13.323760  10  27.47   5
> 26.25
> > 2009-07-22 18:00:00   60
> > 7  11825938-C 1 07/21/2009 21:30:08 07/21/2009  17:30:08 39.29370
> > -76.54819   3.662347  12.518040   5  26.25   8
> 30.68
> > 2009-07-22 18:20:00   70
> > 8  11825938-C 1 07/21/2009 21:30:16 07/21/2009  17:30:16 39.29394
> > -76.54837   2.378734  36.134190   8  30.68   3
> 31.95
> > 2009-07-22 18:30:00   80
> > 9  11825938-C 1 07/21/2009 21:30:19 07/21/2009  17:30:19 39.29408
> > -76.54870   2.222491  38.962510   3  31.95   3
> 31.20
> > 2009-07-22 18:40:00   90
> > 10 11825938-C 1 07/21/2009 21:30:22 07/21/2009  17:30:22 39.29419
> > -76.54903   2.024886  31.771920   3  31.20   6
> 25.91
> > 2009-07-22 18:50:00  100
> > 11 11825938-C 1 07/21/2009 21:30:28 07/21/2009  17:30:28 39.29430
> > -76.54929   2.831168   4.424288   6  25.91  51
> 25.10
> > 2009-07-22 19:00:00  110
> > 12 11825938-C 1 07/21/2009 21:31:19 07/21/2009  17:31:19 39.29432
> > -76.54958   1.545548  26.467430  51  25.10   3
> 26.97
> > 2009-07-22 19:01:00  120
> > 13 11825938-C 1 07/21/2009 21:31:22 07/21/2009  17:31:22 39.29415
> > -76.54980   2.062366  37.769800   3  26.97   3
> 34.60
> > 2009-07-22 19:05:00  130
> > 14 21727638-C 1 07/21/2009 21:31:25 07/21/2009  17:31:25 39.29388
> > -76.55000   2.120106  45.099640   3  34.60   2
> 25.38
> > 2009-07-21 17:11:00  140
> > 15 21727638-C 1 07/21/2009 21:31:27 07/21/2009  17:31:27 39.29367
> > -76.55014   2.107622  47.953690   2  25.38   2
> 26.39
> > 2009-07-21 17:21:00  150
> > 16 21727638-C 1 07/21/2009 21:31:29 07/21/2009  17:31:29 39.29347
> > -76.55029   1.942083  47.656870   2  26.39   2
> 26.72
> > 2009-07-21 17:31:00  160
> > 17 21727638-C 1 07/21/2009 21:31:31 07/21/2009  17:31:31 39.29326
> > -76.55045   2.023492  46.045600   2  26.72   3
> 36.03
> > 2009-07-21 17:41:00  170
> > 18 21727638-C 1 07/21/2009 21:31:34 07/21/2009  17:31:34 39.29298
> > -76.55066   2.211649  39.604580   3  36.03   3
> 27.07
> > 2009-07-21 17:51:00  180
> > 19 21727638-C 1 07/21/2009 21:31:37 07/21/2009  17:31:37 39.29277
> > -76.55083   1.479468  25.079980  

Re: [R] Fuzzy merge using timestamps

2010-11-10 Thread Sarah Goslee
Hi Ian,

Did you see the second part of my request, for a simple
reproducible example? At the very least, we need your
merge and sort code, since it sounds like that's where
the problem truly lies.

Sarah

On Wed, Nov 10, 2010 at 2:02 PM, Ian Craig  wrote:
> So here is a sample of the datasets.  I have modified them for the purposes
> of circulating to solve this problem.  I have already added the ID and key
> as seen in the code below.  Any ideas?
> gpsdata
>    gpsARC Protocol Track    UTCDate  UTCTime  LocalDate LocalTime        y
>       x Altitude_m Speed_km_h TimeBef_sec Dist_Bef_m TimeAft_sec DistAft_m
>     t_datetimegps ID1 key1
> 1  118259    38-C     1 07/21/2009 21:29:05 07/21/2009  17:29:05 39.29393
> -76.54961   0.547843   0.162998           0       0.00          36     27.25
> 2009-07-21 17:09:00   1    0
> 2  118259    38-C     1 07/21/2009 21:29:41 07/21/2009  17:29:41 39.29397
> -76.54930   6.615943  27.425990          36      27.25           4     32.53
> 2009-07-21 17:19:00   2    0
> 3  118259    38-C     1 07/21/2009 21:29:45 07/21/2009  17:29:45 39.29383
> -76.54897   6.357120  30.192910           4      32.53           3     26.39
> 2009-07-21 17:29:00   3    0
> 4  118259    38-C     1 07/21/2009 21:29:48 07/21/2009  17:29:48 39.29363
> -76.54881   6.134487  30.489670           3      26.39           5     27.27
> 2009-07-21 17:39:00   4    0
> 5  118259    38-C     1 07/21/2009 21:29:53 07/21/2009  17:29:53 39.29341
> -76.54868   6.451584  13.394810           5      27.27          10     27.47
> 2009-07-21 17:49:00   5    0
> 6  118259    38-C     1 07/21/2009 21:30:03 07/21/2009  17:30:03 39.29352
> -76.54839   4.324665  13.323760          10      27.47           5     26.25
> 2009-07-22 18:00:00   6    0
> 7  118259    38-C     1 07/21/2009 21:30:08 07/21/2009  17:30:08 39.29370
> -76.54819   3.662347  12.518040           5      26.25           8     30.68
> 2009-07-22 18:20:00   7    0
> 8  118259    38-C     1 07/21/2009 21:30:16 07/21/2009  17:30:16 39.29394
> -76.54837   2.378734  36.134190           8      30.68           3     31.95
> 2009-07-22 18:30:00   8    0
> 9  118259    38-C     1 07/21/2009 21:30:19 07/21/2009  17:30:19 39.29408
> -76.54870   2.222491  38.962510           3      31.95           3     31.20
> 2009-07-22 18:40:00   9    0
> 10 118259    38-C     1 07/21/2009 21:30:22 07/21/2009  17:30:22 39.29419
> -76.54903   2.024886  31.771920           3      31.20           6     25.91
> 2009-07-22 18:50:00  10    0
> 11 118259    38-C     1 07/21/2009 21:30:28 07/21/2009  17:30:28 39.29430
> -76.54929   2.831168   4.424288           6      25.91          51     25.10
> 2009-07-22 19:00:00  11    0
> 12 118259    38-C     1 07/21/2009 21:31:19 07/21/2009  17:31:19 39.29432
> -76.54958   1.545548  26.467430          51      25.10           3     26.97
> 2009-07-22 19:01:00  12    0
> 13 118259    38-C     1 07/21/2009 21:31:22 07/21/2009  17:31:22 39.29415
> -76.54980   2.062366  37.769800           3      26.97           3     34.60
> 2009-07-22 19:05:00  13    0
> 14 217276    38-C     1 07/21/2009 21:31:25 07/21/2009  17:31:25 39.29388
> -76.55000   2.120106  45.099640           3      34.60           2     25.38
> 2009-07-21 17:11:00  14    0
> 15 217276    38-C     1 07/21/2009 21:31:27 07/21/2009  17:31:27 39.29367
> -76.55014   2.107622  47.953690           2      25.38           2     26.39
> 2009-07-21 17:21:00  15    0
> 16 217276    38-C     1 07/21/2009 21:31:29 07/21/2009  17:31:29 39.29347
> -76.55029   1.942083  47.656870           2      26.39           2     26.72
> 2009-07-21 17:31:00  16    0
> 17 217276    38-C     1 07/21/2009 21:31:31 07/21/2009  17:31:31 39.29326
> -76.55045   2.023492  46.045600           2      26.72           3     36.03
> 2009-07-21 17:41:00  17    0
> 18 217276    38-C     1 07/21/2009 21:31:34 07/21/2009  17:31:34 39.29298
> -76.55066   2.211649  39.604580           3      36.03           3     27.07
> 2009-07-21 17:51:00  18    0
> 19 217276    38-C     1 07/21/2009 21:31:37 07/21/2009  17:31:37 39.29277
> -76.55083   1.479468  25.079980           3      27.07          40     29.64
> 2009-07-22 19:31:00  19    0
> 20 217276    38-C     1 07/21/2009 21:32:17 07/21/2009  17:32:17 39.29256
> -76.55103   2.041623  23.242470          40      29.64           4     26.38
> 2009-07-22 19:32:00  20    0
> 21 217276    38-C     1 07/21/2009 21:32:21 07/21/2009  17:32:21 39.29255
> -76.55134   2.122359  32.137710           4      26.38           3     34.66
> 2009-07-22 19:34:00  21    0
> 22 217276    38-C     1 07/21/2009 21:32:24 07/21/2009  17:32:24 39.29276
> -76.55164   0.776128  48.219300           3      34.66           2     28.80
> 2009-07-22 19:36:00  22    0
> 23 217276    38-C     1 07/21/2009 21:32:26 07/21/2009  17:32:26 39.29294
> -76.55187  -0.542242  53.586880           2      28.80           2     31.85
> 2009-07-22 19:38:00  23    0
> 24 217276    38-C     1 07/21/2009 21:32:28 07/21/2009  17:32:28 39.29316

Re: [R] Fuzzy merge using timestamps

2010-11-10 Thread Ian Craig
So here is a sample of the datasets.  I have modified them for the purposes
of circulating to solve this problem.  I have already added the ID and key
as seen in the code below.  Any ideas?

gpsdata

   gpsARC Protocol TrackUTCDate  UTCTime  LocalDate LocalTimey
  x Altitude_m Speed_km_h TimeBef_sec Dist_Bef_m TimeAft_sec DistAft_m
t_datetimegps ID1 key1
1  11825938-C 1 07/21/2009 21:29:05 07/21/2009  17:29:05 39.29393
-76.54961   0.547843   0.162998   0   0.00  36 27.25
2009-07-21 17:09:00   10
2  11825938-C 1 07/21/2009 21:29:41 07/21/2009  17:29:41 39.29397
-76.54930   6.615943  27.425990  36  27.25   4 32.53
2009-07-21 17:19:00   20
3  11825938-C 1 07/21/2009 21:29:45 07/21/2009  17:29:45 39.29383
-76.54897   6.357120  30.192910   4  32.53   3 26.39
2009-07-21 17:29:00   30
4  11825938-C 1 07/21/2009 21:29:48 07/21/2009  17:29:48 39.29363
-76.54881   6.134487  30.489670   3  26.39   5 27.27
2009-07-21 17:39:00   40
5  11825938-C 1 07/21/2009 21:29:53 07/21/2009  17:29:53 39.29341
-76.54868   6.451584  13.394810   5  27.27  10 27.47
2009-07-21 17:49:00   50
6  11825938-C 1 07/21/2009 21:30:03 07/21/2009  17:30:03 39.29352
-76.54839   4.324665  13.323760  10  27.47   5 26.25
2009-07-22 18:00:00   60
7  11825938-C 1 07/21/2009 21:30:08 07/21/2009  17:30:08 39.29370
-76.54819   3.662347  12.518040   5  26.25   8 30.68
2009-07-22 18:20:00   70
8  11825938-C 1 07/21/2009 21:30:16 07/21/2009  17:30:16 39.29394
-76.54837   2.378734  36.134190   8  30.68   3 31.95
2009-07-22 18:30:00   80
9  11825938-C 1 07/21/2009 21:30:19 07/21/2009  17:30:19 39.29408
-76.54870   2.222491  38.962510   3  31.95   3 31.20
2009-07-22 18:40:00   90
10 11825938-C 1 07/21/2009 21:30:22 07/21/2009  17:30:22 39.29419
-76.54903   2.024886  31.771920   3  31.20   6 25.91
2009-07-22 18:50:00  100
11 11825938-C 1 07/21/2009 21:30:28 07/21/2009  17:30:28 39.29430
-76.54929   2.831168   4.424288   6  25.91  51 25.10
2009-07-22 19:00:00  110
12 11825938-C 1 07/21/2009 21:31:19 07/21/2009  17:31:19 39.29432
-76.54958   1.545548  26.467430  51  25.10   3 26.97
2009-07-22 19:01:00  120
13 11825938-C 1 07/21/2009 21:31:22 07/21/2009  17:31:22 39.29415
-76.54980   2.062366  37.769800   3  26.97   3 34.60
2009-07-22 19:05:00  130
14 21727638-C 1 07/21/2009 21:31:25 07/21/2009  17:31:25 39.29388
-76.55000   2.120106  45.099640   3  34.60   2 25.38
2009-07-21 17:11:00  140
15 21727638-C 1 07/21/2009 21:31:27 07/21/2009  17:31:27 39.29367
-76.55014   2.107622  47.953690   2  25.38   2 26.39
2009-07-21 17:21:00  150
16 21727638-C 1 07/21/2009 21:31:29 07/21/2009  17:31:29 39.29347
-76.55029   1.942083  47.656870   2  26.39   2 26.72
2009-07-21 17:31:00  160
17 21727638-C 1 07/21/2009 21:31:31 07/21/2009  17:31:31 39.29326
-76.55045   2.023492  46.045600   2  26.72   3 36.03
2009-07-21 17:41:00  170
18 21727638-C 1 07/21/2009 21:31:34 07/21/2009  17:31:34 39.29298
-76.55066   2.211649  39.604580   3  36.03   3 27.07
2009-07-21 17:51:00  180
19 21727638-C 1 07/21/2009 21:31:37 07/21/2009  17:31:37 39.29277
-76.55083   1.479468  25.079980   3  27.07  40 29.64
2009-07-22 19:31:00  190
20 21727638-C 1 07/21/2009 21:32:17 07/21/2009  17:32:17 39.29256
-76.55103   2.041623  23.242470  40  29.64   4 26.38
2009-07-22 19:32:00  200
21 21727638-C 1 07/21/2009 21:32:21 07/21/2009  17:32:21 39.29255
-76.55134   2.122359  32.137710   4  26.38   3 34.66
2009-07-22 19:34:00  210
22 21727638-C 1 07/21/2009 21:32:24 07/21/2009  17:32:24 39.29276
-76.55164   0.776128  48.219300   3  34.66   2 28.80
2009-07-22 19:36:00  220
23 21727638-C 1 07/21/2009 21:32:26 07/21/2009  17:32:26 39.29294
-76.55187  -0.542242  53.586880   2  28.80   2 31.85
2009-07-22 19:38:00  230
24 21727638-C 1 07/21/2009 21:32:28 07/21/2009  17:32:28 39.29316
-76.55210  -0.383976  56.380370   2  31.85   2 32.48
2009-07-22 19:40:00  240
25 21727638-C 1 07/21/2009 21:32:30 07/21/2009  17:32:30 39.29344
-76.55223  -1.343364  58.055140   2  32.48   2 33.62
2009-07-22 19:41:00  250
26 21727638-C 1 07/21/2009 21:32:32 07/21/2009  17:32:32 39.29374
-76.55227  -2.229877  61.005420   2 

Re: [R] Fuzzy merge using timestamps

2010-11-10 Thread Sarah Goslee
On Wed, Nov 10, 2010 at 12:57 PM, Ian Craig  wrote:
> Greetings Supreme Council of R Masters,

Nice. :)

> I have two sets of data, each with a set of timestamps.  I would like to
> somehow merge the datasets based on the timestamps and an individual
> identifier.  That is there are several individuals all with timestamps, with
> times that could overlap.  By browsing through some of the older posts, I
> got the idea to create a third data frame of both sets of timestamps,
> individual identifiers, and a key to determine which dataset they have come
> from, then find the breaks to determine which of each dataset should be
> paired.  the code I have written so far look something like this.

This would be easier to sort through if you included a toy example with
data so that we could try it. As it is, I have no idea what your data
actually look like.

> gpsdata$t_datetimegps<-as.POSIXct(gpsdata$t_datetimegps)
> urdata$t_datetimeur<-as.POSIXct(urdata$t_datetimeur)
>
> gpsdata$ID1 <- row.names(gpsdata)
> urdata$ID2 <- row.names(urdata)
>
> gpsdata$key1 <- rep(0, nrow(gpsdata))
> urdata$key2 <- rep(1, nrow(urdata))
>
> checkTimes <- data.frame(ID=c(gpsdata$ID1, urdata$ID2),
>        ARC=c(gpsdata$gpsARC, urdata$urARC),
>        times=c(gpsdata$t_datetimegps, urdata$t_datetimeur),
>        key=c(gpsdata$key1, urdata$key2))
>
> checkTime <- checkTimes[order(checkTimes$ARC,checkTimes$times, decreasing =
> FALSE),]
>
> breaks <- which(diff(checkTime$key) == 1)
>
> match <- data.frame(ID1=checkTime$ID[breaks],
>        gpsARC = checkTime$ARC[breaks],
>        urARC = checkTime$ARC[breaks + 1],
>        t_datetimegps=checkTime$times[breaks],
>        t_datetimeur=checkTime$times[breaks + 1])
>
> #Then I merge the 'match' data frame with the gpsdata data frame and the
> product with the urdata data frame.  The problem is that when I create the
> checkTime data frame and sort it, it sorts the urdata portion first then the
> gpsdata portion.   So my key column looks like
> 1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, instead of
> 0,0,0,1,0,0,1,0,0,0,0,0,0,1, etc. even though I am not sorting on key.
>  S.O.S  Why is it doing this?  Shouldn't it just order the timestamps of
> both data frames together?

So really this is a sorting problem, not a merging problem? Is the merging
part working correctly?

What exactly are you doing to merge? To sort?

Here again a worked functional example would be really useful. Without
knowing what you're doing, I can't offer suggestions.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.