> - Did you implement all of the described HMM break conditions (route
> localization, low probability routes, GPS outliers)? After reading the
> code in OSRM, I was only able to find the "low probability routes"
> condition. Did I overlook something?

The localization is implemented by choosing the candidates before we start the algorithm. For each input point we adaptively choose between 5 and 10 candidates based on the distance to the previous input point. That part of the algorithm can be found in "plugins/match.hpp". The outlier test is not implemented; I'm not sure it would add much value over the limited search radius for candidates combined with the pruning based on transition probability.

> - As far as I understand, MAX_DISTANCE_DELTA corresponds to the delta
> when comparing the route length and great circle distance for the "low
> probability routes" condition. The paper states a delta of 2000m, the
> implementation uses a delta of 200m. Feature or bug?

I found that 2000m is a little bit on the conservative side. At least for my data 200m worked pretty well (sampling period was approximately 7s). Please note that most parameters are tuned for sampling periods of around 5 to 10 seconds.

> - What exactly does the "confidence" return value mean?

Since we are dealing with real-world data, matching will fail for some traces. That might be because the trace is too noisy or because the data from OpenStreetMap has problems such as connectivity errors. To get a handle on that, I gathered some empirical data on mismatched traces and tried to find a good feature to classify matchings as valid or invalid. The feature that worked best for me was the ratio between trace length and matching length (the intuition here is that invalid matchings tend to contain "loops" where detours are taken). I used that labeled data to fit a Laplace distribution and constructed a naive Bayes classifier based on that. The "confidence" is the probability P(x \in valid). The values are based on only ~800 labeled traces with a specific sampling rate, so take that value with a grain of salt for your data. What is missing is a good parameter selection based on the sample rate of the input.
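
To make the confidence description above a bit more concrete, here is a minimal Python sketch of a two-class naive Bayes classifier with one Laplace density per class. It is not the OSRM code: the function names, the direction of the ratio (trace length divided by matching length), the equal class prior, and the example parameters are all assumptions made up for illustration, not the values fitted on the ~800 labeled traces.

```python
import math


def laplace_pdf(x, mu, b):
    """Density of a Laplace distribution with location mu and scale b > 0."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def fit_laplace(samples):
    """Maximum-likelihood fit of a Laplace distribution.

    The location is the sample median, the scale is the mean absolute
    deviation from that median.
    """
    s = sorted(samples)
    n = len(s)
    if n == 0:
        raise ValueError("need at least one sample")
    mu = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    b = sum(abs(x - mu) for x in s) / n
    if b == 0.0:
        raise ValueError("all samples identical, scale would be zero")
    return mu, b


def confidence(trace_length, matched_length, valid_params, invalid_params,
               prior_valid=0.5):
    """Posterior probability that a matching is valid, via Bayes' rule.

    valid_params and invalid_params are (mu, b) pairs for the two classes.
    The ratio direction and the default prior are assumptions of this sketch.
    """
    ratio = trace_length / matched_length
    p_valid = laplace_pdf(ratio, *valid_params) * prior_valid
    p_invalid = laplace_pdf(ratio, *invalid_params) * (1.0 - prior_valid)
    return p_valid / (p_valid + p_invalid)


# Hypothetical class parameters, chosen only to show the shape of the result.
VALID = (1.0, 0.05)    # valid matchings: ratio close to 1
INVALID = (0.7, 0.2)   # invalid matchings: matched route longer than trace

if __name__ == "__main__":
    print(confidence(1000.0, 1010.0, VALID, INVALID))
    print(confidence(1000.0, 1700.0, VALID, INVALID))
```

With parameters like these, a matching whose length is close to the trace length gets a confidence near 1, while a matching with long detours gets a confidence near 0. How well that transfers to other data depends on how the two distributions were fitted, which is why the sampling rate caveat matters.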
It's not clear when I will have time again to work on that parameter selection (for now, massaging the data to fit the current constraints works quite well).

_______________________________________________
OSRM-talk mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/osrm-talk
