David Earl wrote:
>Sent: 01 December 2008 3:10 PM
>To: [email protected]
>Subject: Re: [Talk-GB] Request for UK address lists for postcode extraction
>
>On 01/12/2008 14:11, Brian Quinion wrote:
>> Hi,
>>
>> I'm currently doing some work trying to generate postcode location
>> data for the UK using address lists and address lookup using OSM data
>> to supplement NPE. So far it seems to work quite well with the
>> address lists that I have available to me (and coping quite well with
>> ambiguous road names) but I'm limited in my data sources and most of
>> the address data is fairly consistent in both format and quality.
>>
>> So, before I open the interface to the public, I'd like to test the
>> code with some lists provided by other people.
>>
>> Does anyone have, or know of, any address lists that I would be able
>> to use for this purpose? Obviously it needs to be license compatible
>> with OSM (so please no lists generated from royal mail postcode data!)
>> and ideally I'm after data sets containing at least:
>>
>> street address (house name / number optional)
>> town / city
>> postcode
>>
>> formatted as CSV or TSV. I'm specifically not after data containing
>> the names of individuals.
>>
>> Has anyone got any suggestions, or is willing to offer any data? Even
>> personal address books would be useful for testing...
>
>Why not do it the other way round?
>
>You know all the 2,500 or so prefixes, and there are only 26 x 26 x 100
>combinations for the second part for each - about 200 million in all. If
>you feed these potential postcodes in quotes into Google UK over a long
>period with appropriate pauses so as not to get locked out, and look at
>the result for recognizable addresses (that's the tricky bit) as I'm
>doing in the Namefinder, you'd probably cover 75% of UK postcodes.
>
>Yes, it's slow, but it's probably the biggest source there is. At one a
>second it would take about 6 years, but by enlisting 100 friends you'd
>do it in a month - less if it's possible to be more intelligent about it
>- for example, for the number part if there's no 14XX or 15XX I doubt
>there would be any 16s or above either, except for a few special cases.
I'm curious about this. Data scraped via Google is still subject to the terms of the original page it references?

Cheers

Andy

>
>David
>
>
>_______________________________________________
>Talk-GB mailing list
>[email protected]
>http://lists.openstreetmap.org/listinfo/talk-gb
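The back-of-envelope figures in David's message (number of candidate postcodes, time at one query per second, time with 100 helpers) can be checked with a short calculation. This is only a sketch using the numbers quoted in the message. The function and variable names are illustrative assumptions, not part of any existing tool. The exact results come out somewhat below the rounded figures in the message: roughly 169 million candidates, a little over 5 years for one machine, and about 20 days for 100.

```python
# Rough check of the estimate in the message above.
# All inputs are the figures quoted there; names are hypothetical.

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def estimate_search_effort(prefixes=2_500,
                           suffixes_per_prefix=26 * 26 * 100,
                           queries_per_second=1.0,
                           workers=100):
    """Return candidate count and elapsed time for one and many workers."""
    total_candidates = prefixes * suffixes_per_prefix
    seconds_single = total_candidates / queries_per_second
    seconds_shared = seconds_single / workers
    return {
        "total_candidates": total_candidates,
        "years_single_worker": seconds_single / SECONDS_PER_YEAR,
        "days_with_workers": seconds_shared / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    result = estimate_search_effort()
    print(f"candidates:          {result['total_candidates']:,}")
    print(f"one worker (years):  {result['years_single_worker']:.1f}")
    print(f"100 workers (days):  {result['days_with_workers']:.1f}")
```

Changing `queries_per_second` models longer pauses between searches, and a smaller `suffixes_per_prefix` models the pruning idea of skipping number ranges that appear to be unused.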

