David Earl wrote:
>Sent: 01 December 2008 3:10 PM
>To: [email protected]
>Subject: Re: [Talk-GB] Request for UK address lists for postcode extraction
>
>On 01/12/2008 14:11, Brian Quinion wrote:
>> Hi,
>>
>> I'm currently doing some work trying to generate postcode location
>> data for the UK using address lists and address lookup using OSM data
>> to supplement NPE.  So far it seems to work quite well with the
>> address lists that I have available to me (and coping quite well with
>> ambiguous road names) but I'm limited in my data sources and most of
>> the address data is fairly consistent in both format and quality.
>>
>> So, before I open the interface to the public, I'd like to test the
>> code with some lists provided by other people.
>>
>> Does anyone have, or know of, any address lists that I would be able
>> to use for this purpose?  Obviously it needs to be license compatible
>> with OSM (so please no lists generated from Royal Mail postcode data!)
>> and ideally I'm after data sets containing at least:
>>
>> street address (house name / number optional)
>> town / city
>> postcode
>>
>> formatted as CSV or TSV.  I'm specifically not after data containing
>> the names of individuals.
>>
>> Has anyone got any suggestions, or is willing to offer any data?  Even
>> personal address books would be useful for testing...
>
>Why not do it the other way round?
>
>You know all the 2,500 or so prefixes, and there are only 26 x 26 x 100
>combinations for the second part for each - about 200 million in all. If
>you feed these potential postcodes in quotes into Google UK over a long
>period with appropriate pauses so as not to get locked out, and look at
>the result for recognizable addresses (that's the tricky bit) as I'm
>doing in the Namefinder, you'd probably cover 75% of UK postcodes.
>
>Yes, it's slow, but it's probably the biggest source there is. At one a
>second it would take about 6 years, but by enlisting 100 friends you'd
>do it in a month - less if it's possible to be more intelligent about it
>- for example, for the number part if there's no 14XX or 15XX I doubt
>there would be any 16s or above either, except for a few special cases.
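
As a rough check of the arithmetic quoted above, here is a minimal Python sketch. The only inputs taken from the message are the 2,500 prefixes, the 26 x 26 x 100 combinations per prefix, the one-query-per-second rate and the 100 helpers; the function names are made up for illustration, and the sketch does no scraping at all.

```python
# Back-of-envelope check of the figures quoted in the message above.
# PREFIXES and SUFFIXES_PER_PREFIX come from the message; the function
# names and structure are illustrative assumptions only.

PREFIXES = 2_500                       # "2,500 or so prefixes"
SUFFIXES_PER_PREFIX = 26 * 26 * 100    # combinations per prefix, as stated
SECONDS_PER_DAY = 86_400


def candidate_count(prefixes=PREFIXES, per_prefix=SUFFIXES_PER_PREFIX):
    """Total number of candidate postcodes to try."""
    return prefixes * per_prefix


def crawl_days(candidates, queries_per_second=1.0, workers=1):
    """Days needed to query every candidate at the given overall rate."""
    seconds = candidates / (queries_per_second * workers)
    return seconds / SECONDS_PER_DAY


total = candidate_count()
print(f"candidates:         {total:,}")
print(f"one worker (years): {crawl_days(total) / 365.25:.1f}")
print(f"100 workers (days): {crawl_days(total, workers=100):.1f}")
```

With these inputs the total is 169 million candidates, a little over five years for a single worker and roughly 20 days for 100, which is in line with the rounded figures in the message. Pruning unused number ranges, as suggested, would reduce the count further.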

I'm curious about this. Isn't data scraped via Google still subject to the
terms of the original page it references?

Cheers

Andy

>
>David
>
>
>_______________________________________________
>Talk-GB mailing list
>[email protected]
>http://lists.openstreetmap.org/listinfo/talk-gb
>


