The examples are contrived because we're not testing.  We're pointing out
why this is a bad idea.  Using real world examples would just encourage
people to fix those examples and ignore the fact that the process is wrong.

Anyway, you realize that the road type doesn't always appear after the base
name, right?

---------- Forwarded message ----------
From: *Serge Wroclawski*
Date: Friday, May 11, 2012
Subject: [Talk-us] Fixing TIGER street name abbreviations
To: Dale Puch <dale.p...@gmail.com>
Cc: talk-us@openstreetmap.org


On Fri, May 11, 2012 at 4:17 PM, Dale Puch <dale.p...@gmail.com> wrote:
> I understand the script checks for only one instance of the abbreviation.

> My point was what if someone manually expanded ONE of the abbreviations,
> leaving "st something street"?  Is that checked for?

I have a number of thoughts here:

1.  Real world examples.

Many of the examples I've seen are contrived. I'm glad we're testing,
but testing needs to be based on actual data seen in the US dataset.

That said:

2. There are a couple of ways to handle this:

* One way (the most conservative way) would be to test for untouched
TIGER ways, that is, ways that are still at version 1. This
would be a real problem, though, since there are lots of examples where
someone may have fixed the geometry without touching the tags.

* The other way is a method I'm using in an experimental branch of the
code on my machine, which is to try to be a bit more selective about
the expansions of road types. If we assume that the road type always
appears after the base name, we can handle examples like (real
world example) "St Marys St". The same would hold true for direction
tags, so we'd be able to expand "E E St" confidently as well.

But there's a catch. If someone had edited the name of the
above street from the original "St Marys St" to "St. Marys St", then
that test would fail and the expansion would never occur, whereas in
the current version it would.
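
To make the positional idea concrete, here is a minimal Python sketch. It
is not the code from the experimental branch: the lookup tables, the
function name expand_name, and the rule that a period marks a hand-edited
name are all assumptions made for illustration. It expands only a trailing
road type and a leading direction, and gives up on anything it is unsure
about.

```python
# Hypothetical lookup tables; a real run would need the full TIGER lists.
ROAD_TYPES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Dr": "Drive"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_name(name):
    """Expand a leading direction and a trailing road type, by position only.

    Returns the expanded name, or None when the name should be left alone.
    """
    tokens = name.split()
    if len(tokens) < 2:
        return None
    # Assumption: a period suggests someone edited the name by hand.
    if any("." in token for token in tokens):
        return None
    if tokens[-1] not in ROAD_TYPES:
        return None
    expanded = list(tokens)
    # Only the last token is treated as a road type.
    expanded[-1] = ROAD_TYPES[tokens[-1]]
    # Only the first token is treated as a direction, and only when a
    # base name remains between it and the road type.
    if len(tokens) >= 3 and tokens[0] in DIRECTIONS:
        expanded[0] = DIRECTIONS[tokens[0]]
    return " ".join(expanded)


print(expand_name("St Marys St"))   # St Marys Street
print(expand_name("E E St"))        # East E Street
print(expand_name("St. Marys St"))  # None
```

The last call shows the trade-off: under this sketch's period rule, the
hand-edited "St. Marys St" is never expanded.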

So:

3. Any method used is going to produce some number of either
false positives or false negatives. I contend that the number of
errors in either case will be so tiny that it will be lost in the
noise, but there's no way to promise it will always be 0. The best we
can do is toss out uncertain expansions and have them handled manually
(which is something I'm working to make better in the next version of
the code as well).
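
As a rough illustration of tossing out uncertain expansions, the Python
sketch below splits names into ones that are expanded automatically and
ones that are set aside for manual handling. The rule it uses (expand only
when exactly one known abbreviation appears in the name) is an assumption
based on the check described earlier in this thread, and the table and the
function name triage are hypothetical.

```python
# Hypothetical abbreviation table, deliberately tiny.
ABBREVIATIONS = {"St": "Street", "Ave": "Avenue", "Rd": "Road"}


def triage(names):
    """Split names into automatic expansions and a manual-review list."""
    expanded = {}
    manual = []
    for name in names:
        tokens = name.split()
        hits = [i for i, token in enumerate(tokens) if token in ABBREVIATIONS]
        if len(hits) == 1:
            # Exactly one abbreviation: expand it in place.
            tokens[hits[0]] = ABBREVIATIONS[tokens[hits[0]]]
            expanded[name] = " ".join(tokens)
        elif len(hits) > 1:
            # More than one candidate: too uncertain, leave for a person.
            manual.append(name)
        # No abbreviation at all: nothing to do.
    return expanded, manual


auto, review = triage(["Main St", "St Marys St", "Oak Lane"])
print(auto)    # {'Main St': 'Main Street'}
print(review)  # ['St Marys St']
```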

But:

4. I don't want us to rely on cleverness. I'd much rather rely on
people testing the code with real world inputs and checking the
outputs.


I should have a new version of the code either tonight or tomorrow,
with the new expansion rules.

- Serge

_______________________________________________
Talk-us mailing list
Talk-us@openstreetmap.org
http://lists.openstreetmap.org/listinfo/talk-us
