Stephen,

You read my mind. I was preparing those notes in .HTML format and
wanted to share 'em.

I posted them to
http://postgis.refractions.net/support/wiki/index.php?InstallingTigerGeocoder
and will eventually look back around and convert the markup from HTML
to the wiki format.

If anybody wants to they're more than welcome to. I just don't have
the balance of the time for the day, is all.

Similarly, I've attached the HTML version to this email.

Any changes / upgrades / corrections (which I'm sure are all in some
capacity really merited!) are appreciated.

Thanks,
Josh

On Sun, Jul 13, 2008 at 4:36 PM, Stephen Woodbridge
<[EMAIL PROTECTED]> wrote:
> Not sure I can help more then to offer you encouragement at this point. It
> seems like you are very close at this point. I guess I need to get 8.3
> loaded on some system here and test out some of my code and see how much of
> that blows up.
>
> It would be good if you would update the wiki with how you loaded the data
> and what you had to do. This question comes up frequently on the list so it
> would be good to be able to point people to a document on how to do it.
>
> -Steve
>
> Josh Long wrote:
>>
>> Steve,
>>
>> Thanks again for your help.  Just about 5 hours ago I managed to
>> finish the successul installation of ... well.. everything.. on
>> Postgres 8.3. It took 2 days and near 100gbs, but its done..
>>
>> I ran my first test this morning with select geocode(  'address blah
>> blah') ;
>>
>> waited eagerly
>>
>> and then it failed...
>>
>> complained about all sorts of casts (I guess this hasn't been tested
>> or run on 8.3, which is fairly new..) from types/against types to
>> text.. .
>>
>> I will try and take a look at the hundreds and hundreds of SQL for the
>> functions, but I'm not sure I'll have a prayer...
>>
>> May have to revisit this at another time. Has anyone gotten this to
>> work on 8.3?
>>
>> Thanks in advance,
>> Josh
>>
>> On Sat, Jul 12, 2008 at 5:07 PM, Stephen Woodbridge
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Josh,
>>>
>>> I have used the tiger geocoder so I'm only guessing ...
>>>
>>> You only need the county and place files
>>> download these and unzip them.
>>> use shp2pgsql to load them, it has an option to specify the table name.
>>> If there are more than one *.shp for either of these the use -c to create
>>> the table on the first one and then use -a to append to the table for all
>>> the rest in the set.
>>>
>>> -Steve
>>>
>>> Josh Long wrote:
>>>>
>>>> Thanks for the quick reply.
>>>>
>>>> I'm looking at the files
>>>> (http://www.census.gov/geo/www/cob/bdy_files.html). Do I download them
>>>> all? I'm just trying to get the Tiger geocoder to work... That is, I
>>>> only see references to the places and counties lookup tables in the
>>>> lookup_tables.sql -- no mention of any of these other datasets, per
>>>> se.. Which boundry files correspond to those requirements? What's the
>>>> installation procedure, I guess, is what I'm looking for.
>>>>
>>>> How do I ensure those tables get created? I'm looking at the
>>>> cartographic boundry files
>>>> (http://www.census.gov/geo/www/cob/bdy_files.html).. Do I need te
>>>> congressional districts and so on?
>>>>
>>>> If I do manage to figure this all out (your help being very
>>>> appreciated) I'll share a detailed tutorial walking from a fresh
>>>> Postgres to the finish line..
>>>>
>>>> Thanks,
>>>> Josh
>>>>
>>>>
>>>> On Sat, Jul 12, 2008 at 3:43 PM, Stephen Woodbridge
>>>> <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> starbuxman wrote:
>>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> I'm trying to setup the TIGER geocoder but am new at this sort of
>>>>>> thing:
>>>>>>
>>>>>> I've already finished the bit where i run import/load_tiger.sh
>>>>>> successfully.
>>>>>>
>>>>>> I'm using the 2006se TIGER line data. I now have all the PostGIS (on
>>>>>> Postgres 8.3) and two tables Landmarks and CompleteChain (with a
>>>>>> whopping 66 million records!/ 25gb footprint!) that took 28 hours to
>>>>>> install.
>>>>>>
>>>>>> Now it says to import the .shp files... Only question is, what shp
>>>>>> files?
>>>>>>
>>>>>> Here is the listing of the typical archive:
>>>>>> Archive:  TGR02164.ZIP
>>>>>>  inflating: TGR02164.RT1
>>>>>>  inflating: TGR02164.RT2
>>>>>>  inflating: TGR02164.RT4
>>>>>>  inflating: TGR02164.RT5
>>>>>>  inflating: TGR02164.RT6
>>>>>>  inflating: TGR02164.RT7
>>>>>>  inflating: TGR02164.RT8
>>>>>>  inflating: TGR02164.RTA
>>>>>>  inflating: TGR02164.RTC
>>>>>>  inflating: TGR02164.RTE
>>>>>>  inflating: TGR02164.RTH
>>>>>>  inflating: TGR02164.RTI
>>>>>>  inflating: TGR02164.RTM
>>>>>>  inflating: TGR02164.RTP
>>>>>>  inflating: TGR02164.RTR
>>>>>>  inflating: TGR02164.RTS
>>>>>>  inflating: TGR02164.RTT
>>>>>>  inflating: TGR02164.RTZ
>>>>>>  inflating: TGR02164.MET
>>>>>>
>>>>>> Then, finally, looking ahead: I opened up the enxt step which is
>>>>>> lookup_tables.sql and it in turn tries to create tables while drawing
>>>>>> from tables called
>>>>>>  co99_d00  and pl99_d00  -- where do these tables get created?  How?
>>>>>
>>>>> These the the shapefiles that you need to load. You can download them
>>>>> from
>>>>> Census site look for the cartographic boundary files.
>>>>>
>>>>>> I know that the 2007 data is now using .shp files (or at least that's
>>>>>> my understanding) -- do I need to download those, as well? Does that
>>>>>> introduce the risk of data that's not out of sync? For that matter,
>>>>>> what do I do with that data in the first place?
>>>>>
>>>>> No!, these are a totaly different format and will take a new process to
>>>>> install them into the geocoder if it is even possible.
>>>>>
>>>>> -Steve W
>>>>>
>>>>>> Any help is greatly appreciated, thank you,
>>>>>>
>>>>>> Joshua Long
>>>>>> http://www.joshlong.com/
>>>>>
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> [email protected]
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>
>>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> [email protected]
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>
>>
>>
>
> _______________________________________________
> postgis-users mailing list
> [email protected]
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>



-- 
Joshua Long
Sun Certified Java Programmer
http://www.joshlong.com/

Installng PostGIS and the TIGER Geocoder

Note this is an incomplete document. I have yet to confirm that this works 100%, but here are the things I noted while installing PostGIS' TIGER Geocoder. First and foremost, the source code is available via svn co http://svn.refractions.net/postgis/trunk/extras/tiger_geocoder. A lot of this information is an incomplete copy of what's available from : http://postgis.refractions.net/support/wiki/index.php?PostgisOnUbuntu

What follows are the instructions as present in the source code. I'll interpolate the bits that were a little unclear and required assistance for me so you can benefit from them.

  1. Install postgresql-8.2-postgis (For various GIS bits). For Ubuntu Hardy users, this was a mere sudo apt-get install postgresql-8.3 postgresql-contrib-8.3 postgresql-8.3-postgis. Invariably, you'll need some extra tool support so: sudo apt-get install gdal-bin python2.5-gdal (Obviously, you should have Python installed, too.). Interestingly, my installation has thus far been a failure since I chose to use Postgres 8.3. If you're on Gutsy or prior, you may have better luck (in fact, I'm almost 100% sure you will) with Postgres 8.2, against which the geocoder code seems to have been tested. I can't guaranteee these instructions are the same, however. They should be.. I can't imagine any of these dependencies not existing in the version of the distro that's just 6 months old :-), but the versions may be different. Anyway: take away is that you should install 8.2 but be cautious. The rest of these instructions are referencing my 8.3 installation, so please change them as necessary.
  2. Install PostGIS functions and spatial ref definitions. This enables PostGIS itself on your database.
    \i /usr/share/postgresql-8.3-postgis/lwpostgis.sql \i /usr/share/postgresql-8.3-postgis/spatial_ref_sys.sql
  3. Install postgresql-contrib-8.3 for soundex (from fuzzystrmatch) This looks something like: psql -U postgres your_database_name < /usr/share/postgresql/8.3/contrib/fuzzystrmatch.sql . There's all sorts of worthy things in the contrib modules, so take your time and do some research and install anything you think useful.
  4. Obtain the TIGER data from the Census site. I used wget -r http://www2.census.gov/geo/tiger/tiger2006se
  5. You need to modify, and then run, import/load_tiger.sh. . What this will do is load all the TIGER data from http://www2.census.gov/geo/tiger/tiger2006se/ into the database in a sort of raw unfettered way so that the processing to setup the Geocoder works. Here are the modifications I made before running it:
    #!/bin/bash
    
    FIRST=""
    for file in /home/jlong/Desktop/gis/www2.census.gov/geo/tiger/tiger2006se/*/*.ZIP; do
      ./tigerimport.sh append MYDATABASE $file $FIRST
      if [ -z "$FIRST" ]; then
        FIRST="-a"
      fi
    done
    You see? All I change was the value for MYDATABASE and the value for pointing tot he directory where I had downloaded all the data fromthe census site. Also, I modified tigerimport.sh itself. Whenever I saw lines starting with ogr2ogr, I modified the connection paramaters as appropriate: ogr2ogr $PRE $OGROPTS $SRS "PG:dbname=$DATABASE user=MYUSEr password=MYPW host=127.0.0.1" $TIGER CompleteChain $POST. For my setup, I needed to specify the host. Which I guessed was done using a 'host=127.0.0.1' parameter. That seemed to work. Simialarly, before you start importing everything, you may run into encoding issues down the line after this step. You can 'undo' it afterwards. This was necessary as the client's talking to my database were trying to send non UTF8 data to my database which was UTF8. ALTER DATABASE lemanger SET client_encoding=latin1; . Alright, finally, run the script above once you're confident everything's changed the way it needs to be! Warning On my personal workstation with a 3GHz Pentium 4 (Not dual core, but still, no slouch) and 2 GB of RAM, this process took 28 hours and 30GBs or so of space. This is not a 'grab a cup of coffee' situation. This is, 'go visit the family ... in the next state... and stop all other concurrently running programs on the workstation for a day' situation. In total -- by the way -- so you don't get in over your head, I needed upwards of 100 GB of free space to complete this process. Some of that space is eventually recoverable, because it's temporary. But still, if you want to preflight this process with just one state, that may be advisable if you have a small hard disk. Another thing to note by the way is that a 'State', in this case, is a state of the United States and includes in its definition 59 entities that aren't from the 50 proper US 'States', including Puerto Rico, for example. Very cool.
  6. Here's where I'm only a little sure of what to do and I could -- in fact -- be wrong. I did get pointed in the right direction (thank you Stephen Woodbridge!), so hopefully this is correct. The next step consists of creating tables that the geocoder will need from another set of data, the US Census Shape Boundry files. They are are at http://www.census.gov/geo/www/cob/bdy_files.html. You don't need them all, just a few. They organized according to a code, as described here: http://www.census.gov/geo/www/cob/filenames.html. Thus, each file is a data set, and some correspond to granularities smaller than 'national'. When you see some of the tables that you're expected to have installed, it's looking for 'national' data sets, and I couldn't find them for all of them. It needs tables of the following type: Mouse over the links to verify that data you're looking at has the code as the data you want. Since 99 means national, and 00 means year 2000, and the part in the begining is the abbreviation, then you know that cs99_d00 is probably county subdivisions from the year 2000 for the national level, which is http://www.census.gov/geo/www/cob/cs2000.html. Notice that there's no 99. And that there's no single set of data! in this case I think we're after just the .SHP files. For me, it was easier to download those listing pages to my local disk somewehre and then extract the URLs and then download them from that list. Here's I did that.
    #!/bin/bash
    cat cs_listing.html |grep -i shp |grep geo|cut -f2 -d\"| while read url; 
     do 
     nurl="http://www.census.gov$url" 
     echo $nurl; 
    done > base_of_zip_files.txt 
    
    That got me a listing of the zip files that I wanted. The next trick was to download those zip files:
    #!/bin/bash
    cat base_of_zip_files.txt   | while read url; 
     do 
     nurl="http://www.census.gov$url" 
     echo $nurl; 
     wget $nurl 
    done 
    
    That will take a few minutes. I couldn't get the data all in one for anything but the counties dataset. So, I had to download all the data and forge from them one aggregate data. It helps to be meticulous and create a 'boundry' folder inside of which you create separate 'cs', 'co', and 'pl' folders for these datasets.
  7. Once you've downloaded all the datasets, you'll need to install them. Here's my slightly modified version of the script from the installation instructions that come witht he Tiger geocoder. I created one for each dataset and ran it in the same folder as the particular dataset itself:
    #!/bin/bash
     find . -iname "*ZIP"|while read l;
     do
     mkdir -p x; 
     rm -rf x ;
     mkdir x ;
     cd x; 
      unzip .$l; 
    BASE=`echo *.shp | cut -f1 -d.` ;
    ogr2ogr -append -f PostgreSQL -a_srs EPSG:4269 "PG:dbname=DATABASE host=127.0.0.1 user=USER password=PASSWORD" $BASE.shp $BASE -nln TABLE_NAME -lco DIM=2
     cd ..; 
    
    done   
    
    Basically, TABLE_NAME is the name of the table (one of the three I mentioned above, dependant on which dataset you're dealing with.) and the rest is just the information needed to get into your database. By the way, if you didn't do the client encoding step I mentioned above, this is where it may bite you in the butt.
  8. Create geometry indexes on all the tables using GiST, eg:
      CREATE INDEX completechain_geom_idx ON completechain USING GIST (wkb_geometry public.gist_geometry_ops);
     
    Naturally, since this is operating on the 66 million record behemoth you created in the beginning, this will take many hours. Go watch a movie... or two.
    
  9. NB: the following steps are pretty straight forward, if a bit tedious. In particular, one of them is REALLY slow and another one is going to break if you're trying this on 8.3 due to cast errors, complaining of comparisons of integers to char/text/varchar data types. The errors are pretty self explanatory and will show you what needs to be fixed. Typically it's just a matter of changing the _expression_ in question to what it was, plus "::text" to incur a cast.
  10. Create lookup tables using tables/lookup_tables.sql.
  11. Create roads_local using roads_local.sql . I think this one is the one that does a derivation from a dataset produced by a join on one of the bigger tables, and so it too takes an incredibly long time. 5 hours, or so.
  12. Create 'tiger_geocode_roads' using tiger_geocode_roads.sql
  13. Run create_geocode.sql to load the functions
  14. Profit!
  15. _______________________________________________
    postgis-users mailing list
    [email protected]
    http://postgis.refractions.net/mailman/listinfo/postgis-users
    

Reply via email to