I have been wondering about postcodes. We have a postal_code tag which can
be applied to streets and it would be nice to collect these. However it is
not something like name plates that you find in the street by observation.
There are about 2 million postcodes in the UK, so gathering them manually
via the freethepostcode project is hard. But using data from most other
places is subject to copyright.
But what about this as a 90% solution:
(a) generate a list of potential postcodes (702 x 110 x 10 x 676, or roughly
520 million, strings match the form "[A-Z]{1,2}[0-9]{1,2} [0-9][A-Z]{2}").
Unlike countries with numerical codes, these are quite distinctive and
amenable to pattern matching.
(b) enumerate streets in the UK from OSM(1) and determine what place each
is "near"(2) (this gives us, e.g., "Hinton Road, Fulbourn" among many
others)
(c) do an automated web search on the street. The hits will nearly always(3)
contain a result which includes one or more addresses in the summary (no
need to go further). Do a pattern match which is restrictive enough to
determine the postcode(s) for the address in the sought street, but general
enough to cope with some variability (punctuation, skipping suburbs and
counties in the address and so on)
(d) look up the pattern-matched code from (c) in our table from (a) and fill
in against it the lat/lon derived from (b). Take the postcode from our table
and apply it to the postal_code tag in OSM for the street.
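Steps (a), (c) and (d) could be sketched roughly as below. This is only an illustration under assumptions of mine: the function names, the max_gap parameter, the snippet text, the postcode and the coordinates are all made up, and testing a code against the pattern stands in for looking it up in a materialised table from (a).

```python
import re

# Pattern from step (a).
POSTCODE = r"[A-Z]{1,2}[0-9]{1,2} [0-9][A-Z]{2}"

# How many strings the pattern admits: outward letters x outward digits
# x inward digit x inward letters.
TABLE_SIZE = (26 + 26**2) * (10 + 10**2) * 10 * 26**2


def in_table(code):
    """Stand-in for the table lookup in (d): is the code one of ours?"""
    return re.fullmatch(POSTCODE, code) is not None


def postcodes_for(street, place, snippet, max_gap=40):
    """Postcodes that follow 'street ... place ...' in a search summary.

    max_gap (a hypothetical tuning knob) bounds how many characters of
    punctuation, suburb or county may sit between the address parts.
    """
    # Normalise case and whitespace so line breaks in a summary don't matter.
    text = re.sub(r"\s+", " ", snippet.upper())
    pattern = re.compile(
        re.escape(street.upper())
        + r".{0,%d}?" % max_gap
        + re.escape(place.upper())
        + r".{0,%d}?" % max_gap
        + r"\b(" + POSTCODE + r")\b"
    )
    found = {m.group(1) for m in pattern.finditer(text)}
    return sorted(code for code in found if in_table(code))


def record(table, street, place, lat, lon, snippet):
    """Step (d): fill in the OSM-derived lat/lon against each matched code."""
    codes = postcodes_for(street, place, snippet)
    for code in codes:
        table[code] = (lat, lon)
    return codes


# Made-up summary text, postcode and coordinates, purely for illustration.
snippet = "J Smith Plumbing, 12 Hinton Road, Fulbourn,\nCambridge, Cambs. CB1 5ZZ - Tel ..."
table = {}
codes = record(table, "Hinton Road", "Fulbourn", 52.18, 0.22, snippet)
```

The gap is deliberately lazy and bounded, so an address for a different street further along the same summary is less likely to be picked up, though as a heuristic it can still misfire.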
The important point here is that we haven't copied anyone's data in doing
this, so we haven't infringed anyone's copyright. All the data we record has
come from our own hands (starting with the table from (a) means we aren't
copying the postcode from the search results, and the lat/lon comes from the
OSM data). We've merely used the coincidence of juxtaposition in search
results as a heuristic. It can't be a derivative work because we haven't
modified, "recast, transformed, or adapted" any original work (other than
our own). (One could argue that search results _are_ derivative works of the
original web pages and that the G company has a problem, but that's not our
problem.)
There is a danger that repeated automatic web searches might be autodetected
by the G company or whoever and one's IP blocked as a result. However it
could be done over a longish period of time (a month, say), so that the
search rate is low, and spread across many volunteers. And as new streets
appear they can be added incrementally - a much lower search volume.
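As a rough check on the search rate, here is a small sketch using the figures from note (1): 65,000 streets, 10 volunteer machines, one search per machine every 5 minutes. The function names are hypothetical, and dealing streets out round-robin is just one simple way to split the work.

```python
def days_needed(streets, machines, seconds_between_searches):
    """Days for every machine to finish its share at the given search interval."""
    per_machine = -(-streets // machines)  # ceiling division
    return per_machine * seconds_between_searches / 86400


def share_of(streets, machine_index, machines):
    """Deal the street list out round-robin so that no two machines overlap."""
    return streets[machine_index::machines]


days = days_needed(65000, 10, 300)
```

With those figures each machine has 6,500 streets to search, which at 288 searches a day comes to a little over three weeks.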
(1) There are about 65,000 distinct named streets currently in the UK in OSM
(I can isolate these in the name finder database using category highway and
lat/lon conditions). 10 volunteer machines could cover this in a month at no
more than one search every 5 minutes each.
(2) For each street the name finder can determine the nearest place (in the
case of suburbs, the nearest town to the nearest suburb). This may cause a
few boundary problems because "nearest" and "in" aren't the same thing: a
road on the edge of a large village, for example, may be nearer to the
neighbouring village centre than its own, so would be wrongly attributed. The
suburb/town combination gets rid of most edge cases, but if there's a doubt,
we can always try both combinations.
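One reading of "try both combinations" is to keep the runner-up place whenever it is nearly as close as the nearest one. The sketch below is not how the name finder works internally; the function names, the doubt_ratio threshold and the coordinates are assumptions for illustration only.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def candidate_places(street_latlon, places, doubt_ratio=1.5):
    """Nearest place name, plus the runner-up when the choice is in doubt.

    places is a list of (name, lat, lon). The runner-up is kept when it is
    within doubt_ratio (a hypothetical threshold) times the nearest distance.
    """
    ranked = sorted(
        (haversine_km(*street_latlon, lat, lon), name) for name, lat, lon in places
    )
    names = [ranked[0][1]]
    if len(ranked) > 1 and ranked[1][0] <= doubt_ratio * ranked[0][0]:
        names.append(ranked[1][1])
    return names


# Made-up coordinates: a street lying between two village centres.
places = [("Village A", 0.0, 0.010), ("Village B", 0.0, -0.012), ("Town C", 1.0, 1.0)]
both = candidate_places((0.0, 0.0), places)
```

Each returned name would then be searched as "street, place", at the cost of an extra search for the doubtful streets.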
(3) Try it - I randomly chose some streets from around the country, in
run-down suburbs, small villages, city centres and edges of towns, from the
OSM map, and got hits for every single one (use the form "Hinton Road,
Fulbourn" including the quotes). That doesn't necessarily mean I got all the
postcodes for a street, especially a particularly long one. Pretty much every
business has its full address online somewhere, even if only in a directory,
and these include lots of small business people working from home, plus
planning applications listed online, house price survey sites and so on.
There is some systematic ordering to postcodes, which may mean it is
possible to analyse longer streets and their neighbours to refine the data
further.
This wouldn't be as accurate as commercial geocoders. OTOH it is no less
accurate than searching by street name in the name finder.
David
_______________________________________________
Talk-GB mailing list
http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/talk-gb