Mark Vantzelfde wrote:
Can anyone comment on the accuracy of the Tiger geocoder vs MapMarker?

I can't comment specifically on these, but I have done a lot of geocoder analysis, and in general the best approach seems to be to get a large list of addresses that are typical for your needs. The "typical for your needs" part is the important one, because address lists can be notoriously bad in my experience, so you need addresses typical of what you will be feeding the geocoder to get reasonable measurements. "Large" means in the 20K range for quick checks and 100-250K records for a more thorough test.

Then run this list to make a baseline and verify the results, either via random samples, or whatever makes sense for your needs. Now you can run this list against other geocoders and get stats.

One way to do this is to have each record already geocoded like:

address, city, state, postalcode, lat, long

When you use this record as input, you can compare the distance from the result to this known location and use that as a measure of accuracy. Then just compute some statistics on the results to assess whether the new geocoder is better or worse. You can also slice and dice the results by state or region, because I have seen that some areas have significantly more problems than others, depending on the data set being used.
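
The comparison described above could be sketched roughly as follows. This is only an illustration under assumptions: the function names (haversine_m, summarize), the dict keys, the 100 m threshold, and the convention that a failed geocode is represented as None are all hypothetical, and great-circle distance is just one reasonable way to measure the error.

```python
import math
import statistics

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarize(records, results, threshold_m=100.0):
    """Per-state error statistics for one geocoder run.

    records: baseline rows as dicts with at least "state", "lat", "long".
    results: parallel list of (lat, long) tuples, or None where the
             geocoder under test returned nothing (assumed convention).
    """
    by_state = {}
    for rec, res in zip(records, results):
        bucket = by_state.setdefault(rec["state"], {"errors": [], "failed": 0})
        if res is None:
            bucket["failed"] += 1
        else:
            bucket["errors"].append(
                haversine_m(rec["lat"], rec["long"], res[0], res[1]))

    summary = {}
    for state, bucket in by_state.items():
        errs = bucket["errors"]
        total = len(errs) + bucket["failed"]
        summary[state] = {
            "total": total,
            "failed": bucket["failed"],
            "median_m": statistics.median(errs) if errs else None,
            # Share of all inputs (failures included) within the threshold.
            "within_threshold": sum(e <= threshold_m for e in errs) / total,
        }
    return summary
```

Running summarize once per geocoder over the same baseline list gives per-state numbers that can be compared side by side. Counting failures in the denominator keeps a geocoder that only answers the easy addresses from looking better than it is.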

It is just as important to look at the failures as at the successful geocodes, to understand whether a success really is one, and to find out why a given address failed: bad input, bad data, an algorithm limitation, a bug, etc.

-Steve

Thanks
Mark

On Tue, Mar 2, 2010 at 11:40 AM, Stephen Woodbridge <wood...@swoodbridge.com> wrote:

    Hi Kevin,

    I have worked with the Tiger data for about 10 years now. The recent
    improvements in Tiger are really great to see, but not without their
    own set of issues. Tiger has a lot of known limitations based on the
    rules, regs and requirements of the US Census. The recent work has
    georectified the street data and added lots of new streets based on
    digitizing high-res satellite imagery, but the imagery does not let
    you read the street names, so they are added after the fact. There
    are a lot of street segments that do not have names. We can only hope
    that these will be added over time. Because of non-disclosure, address
    ranges can be weird also. Many small streets have address ranges
    1-100 encoded on them, in spite of the fact that the real address
    ranges only run from 1-20. This has the effect of skewing all the
    locations to the front end of the street.
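
The skew can be illustrated with a small sketch. It assumes the geocoder places a house number by simple linear interpolation along the segment, which is a common approach but not necessarily what any particular geocoder does; the function name interpolate_fraction is hypothetical.

```python
def interpolate_fraction(house_number, range_from, range_to):
    """Fraction of the way along a street segment (0.0 = start, 1.0 = end)
    at which a house number falls, assuming linear interpolation."""
    if range_from == range_to:
        return 0.5  # degenerate range: place the point mid-segment
    frac = (house_number - range_from) / (range_to - range_from)
    return min(1.0, max(0.0, frac))  # clamp numbers outside the range


# House 20 is really the last house on the street.
true_position = interpolate_fraction(20, 1, 20)      # real range 1-20
encoded_position = interpolate_fraction(20, 1, 100)  # encoded range 1-100
```

With the real range, house 20 lands at the far end of the segment. With the encoded 1-100 range it lands about a fifth of the way along, and every other house on the street is compressed toward the start in the same way.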

    Because language is ambiguous and input contains typos and
    sounds-like errors, fuzzy searching is employed. Most geocoders do
    some form of fuzzy searching, so you often run into the Main St vs
    Main Ln issue, or you find W Main St when you are searching for
    E Main St.

    When a geocoder says "Found it!", you need to be prepared to say
    "Found what?" or be tolerant of mis-geocodes. I like geocoders that
    score the results and return them in ranked order.

    In general a geocoder can never be better than its data and can in
    fact be much worse than its data. Fuzzy searching lets you find
    possible candidates in the data that might not have been encoded
    correctly in either the input address or the data address, but it
    leaves you uncertain whether the match is the location that was
    actually wanted.

    You might also want to look at PAGC Geocoder. It is written in C and
    uses some statistical matching techniques which are very good. There
    are some changes in one of the branches that let you load all the
    Tiger data for the US.

    http://www.pagcgeo.org/


    -Steve


    Kevin Galligan wrote:

        I actually bought an early access copy of the book.  I work in
        linux and have been playing around with different geocoders and
        the tiger files.  Most recently with a ruby geocoder, for no
        other reason than I'm trying to find one that is fairly complete
        and functional.

        Any idea how "production quality" this particular one is?  If
        it's fairly high, I'll probably put some time in to get it
        working on linux.  I have the full 2009 tiger dataset on an EC2
        block drive, waiting to import into a different database.

        Right now I'm using zip+4 data to get a rough geocode, which is
        good enough for what we're doing, but it only gets 92% of our
        non-PO Box data.  From my experience with the tiger data, it
        only adds a couple percent at most above that, but the geocoders
        I've used have been pretty hacky, so it's possible that was the
        issue.  Also, some of them seem to not be concerned with stuff
        like matching "Main St" when you're looking for "Main Ln", which
        is pretty terrible.

        On the plus side, if there is major work going on with this
        geocoder (or any tiger geocoder), I have a huge national data
        volume that will help stress test the system.

        Recently I've been toying with USC's free geocoder project.  In
        some areas it actually gets about half of the data I previously
        could not, which is impressive.

        The really frustrating thing is, in general, the first 90% is
        cheap/free.  The next 3-4% is marginally expensive.  The rest is
        really pricey.

        Is there any idea how complete the tiger data is, and why there
        is this apparent lack of data in there?  I find it strange.
        Some streets are just missing.  Stuff like that.

        Rambling.  Anyway, will take a look later.  Thoughts on the
        quality of the geocoder appreciated.

        -Kevin

        On Fri, Feb 26, 2010 at 11:52 PM, Paragon Corporation
        <l...@pcorp.us> wrote:

           David,

           As a matter of fact we've been working on that for chapter 10 of
           our upcoming book and think we have it all working.  As a part of
           the example generation process for our chapter 10, we had to come
           up with a way to load the tables that works on both Windows and
           Linux.  Unfortunately we haven't had a chance to test the Linux
           loading approach, but it is pretty much a parallel of the Windows
           approach.

           To do so we started out with Steve's code, added some additional
           skeleton tables and a database function that generates a command
           line script for the respective OS.  Hopefully it all makes sense
           from the readme file we have packaged.

           We also changed one of the functions because there was an error in
           it and revised it slightly to work with Tiger 2009 data.  You can
           download our slightly hacked version of Steve's code from our
           chapter 10 page.

           Steve -- if you are listening we are hoping to remerge your
           version with our loader part and bring back into the PostGIS
           distribution as part of PostGIS 1.5.1 or 2.0 release.

           http://www.postgis.us/chapter_10


           Leo and Regina
           http://www.postgis.us/


           -----Original Message-----
           From: postgis-users-boun...@postgis.refractions.net
           [mailto:postgis-users-boun...@postgis.refractions.net] On Behalf
           Of Dave Fuhry
           Sent: Friday, February 26, 2010 3:04 PM
           To: PostGIS Users Discussion
           Subject: [postgis-users] TIGER geocoder with Census 2009 shapefiles

           I'm trying to set up the TIGER geocoder from
           http://www.snowman.net/git/tiger_geocoder/ which is new and aims
           to work with the new TIGER shapefiles.  I'm trying with the 2009
           shapefiles from http://www2.census.gov/geo/tiger/TIGER2009/.


           I'm not sure how to create the roads_local table (derived closely
           from completechain in the old version).  A join between edges and
           addr?

           Wondering if anyone can offer any direction.  A relevant ticket is
           http://trac.osgeo.org/postgis/ticket/135.  The out-of-date file
           which used to create the roads_local table is
           tables/roads_local.sql, in the above repository.

           -Dave

                                Table "tiger.edges"
              Column   |          Type          | Modifiers
           ------------+------------------------+------------------------------------------------------------
            gid        | integer                | not null default nextval('public.edges_gid_seq'::regclass)
            statefp    | character varying(2)   |
            countyfp   | character varying(3)   |
            tlid       | bigint                 |
            tfidl      | bigint                 |
            tfidr      | bigint                 |
            mtfcc      | character varying(5)   |
            fullname   | character varying(100) |
            smid       | character varying(22)  |
            lfromadd   | character varying(12)  |
            ltoadd     | character varying(12)  |
            rfromadd   | character varying(12)  |
            rtoadd     | character varying(12)  |
            zipl       | character varying(5)   |
            zipr       | character varying(5)   |
            featcat    | character varying(1)   |
            hydroflg   | character varying(1)   |
            railflg    | character varying(1)   |
            roadflg    | character varying(1)   |
            olfflg     | character varying(1)   |
            passflg    | character varying(1)   |
            divroad    | character varying(1)   |
            exttyp     | character varying(1)   |
            ttyp       | character varying(1)   |
            deckedroad | character varying(1)   |
            artpath    | character varying(1)   |
            persist    | character varying(1)   |
            gcseflg    | character varying(1)   |
            offsetl    | character varying(1)   |
            offsetr    | character varying(1)   |
            tnidf      | bigint                 |
            tnidt      | bigint                 |
            the_geom   | public.geometry        |


                                Table "tiger.addr"
             Column   |         Type          | Modifiers
           -----------+-----------------------+-----------------------------------------------------------
            gid       | integer               | not null default nextval('public.addr_gid_seq'::regclass)
            tlid      | bigint                |
            fromhn    | character varying(12) |
            tohn      | character varying(12) |
            side      | character varying(1)  |
            zip       | character varying(5)  |
            plus4     | character varying(4)  |
            fromtyp   | character varying(1)  |
            totyp     | character varying(1)  |
            fromarmid | integer               |
            toarmid   | integer               |
            arid      | character varying(22) |
            mtfcc     | character varying(5)  |
            statefp   | character varying(2)  | not null
           _______________________________________________
           postgis-users mailing list
           postgis-users@postgis.refractions.net

           http://postgis.refractions.net/mailman/listinfo/postgis-users


--
Mark Vantzelfde
NetMasters, Inc.

