Mark Vantzelfde wrote:
Can anyone comment on the accuracy of the Tiger geocoder vs MapMarker?
I can't comment specifically on these, but in general I have done a lot
of geocoder analysis, and I have found that the best way is to get a
large list of addresses that are typical for your needs. "Typical for
your needs" is the important part, because address lists can be
notoriously bad in my experience, so you need addresses typical of what
you will be feeding the geocoder to get reasonable measurements.
"Large" means about 20K records for quick checks and 100-250K records
for a more thorough test.
Then run this list to make a baseline and verify the results, either via
random samples, or whatever makes sense for your needs. Now you can run
this list against other geocoders and get stats.
One way to do this is to have each record already geocoded like:
address, city, state, postalcode, lat, long
When you use this record as input you can compare the distance of the
result to this location and use that as a measure of accuracy. Then just
compute some statistics on the results to assess whether the new geocoder
is better or worse. You can also slice and dice the results by state or
region, because I have seen that some areas have significantly more
problems than other areas, depending on the data set being used.
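As a rough illustration of the comparison described above, here is a minimal Python sketch. It assumes each baseline record carries a verified lat/long and that a failed geocode is represented as None. The function names, the 100 m threshold and the choice of haversine distance are assumptions made for the example, not part of any particular geocoder.

```python
import math
import statistics

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def accuracy_stats(baseline, results, threshold_m=100.0):
    """Compare geocoder output against baseline coordinates.

    baseline:    list of (lat, long) tuples from the verified list.
    results:     list of the same length; (lat, long) or None on failure.
    threshold_m: hypothetical cut-off for calling a result "close enough".
    """
    errors = [
        haversine_m(b[0], b[1], r[0], r[1])
        for b, r in zip(baseline, results)
        if r is not None
    ]
    total = len(results)
    return {
        "total": total,
        "failed": total - len(errors),
        "median_m": statistics.median(errors) if errors else None,
        "max_m": max(errors) if errors else None,
        # Failures count against the geocoder, so divide by all records.
        "within_threshold": (sum(e <= threshold_m for e in errors) / total
                             if total else 0.0),
    }
```

Running the same function once per state or region (by filtering the input lists first) gives the per-area breakdown mentioned above.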
It is just as important to look at the failures as the successful
geocodes: to understand whether a success really is one, and why a given
address failed (bad input, bad data, algorithm limitation, bug, etc.).
-Steve
Thanks
Mark
On Tue, Mar 2, 2010 at 11:40 AM, Stephen Woodbridge
<wood...@swoodbridge.com> wrote:
Hi Kevin,
I have worked with the Tiger data for about 10 years now. The recent
improvements in Tiger are really great to see, but they are not without
their own set of issues. Tiger has a lot of known limitations based on
the rules, regs and requirements of the US Census. The recent work has
georectified the street data and added lots of new streets by
digitizing high-res satellite imagery, but imagery does not let you
read the street names, so they are added after the fact. There are a
lot of street segments that do not have names. We can only hope that
these will be added over time. Because of non-disclosure, address
ranges can be weird also. Many small streets have address ranges
1-100 encoded on them, in spite of the fact that the real address
ranges only run from 1-20. This has the effect of skewing all the
locations to the front end of the street.
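To make the skew concrete, here is a small Python sketch. It assumes the geocoder places a house by plain linear interpolation over the encoded address range, which is a common approach but not necessarily what any given geocoder does. The function name is made up for the example.

```python
def fraction_along(house_number, from_hn, to_hn):
    """Fraction (0.0 to 1.0) of the way along a segment for a house number,
    using plain linear interpolation over the encoded address range."""
    if to_hn == from_hn:
        return 0.5  # single-number range: mid-segment (arbitrary choice)
    frac = (house_number - from_hn) / (to_hn - from_hn)
    return min(1.0, max(0.0, frac))  # clamp numbers outside the range


# Real houses run 1-20, but the segment is encoded as 1-100.
for hn in (1, 10, 20):
    real = fraction_along(hn, 1, 20)
    encoded = fraction_along(hn, 1, 100)
    print(f"house {hn:>2}: real {real:.2f}, encoded {encoded:.2f}")
```

Under that assumption, house 20, which really sits at the far end of the street, is placed only 19/99 (about 19%) of the way along it.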
Because language is ambiguous and input contains typos and sounds-like
errors, fuzzy searching is employed. Most geocoders do some form of
fuzzy searching, so you often run into the Main St vs Main Ln issue, or
you find W Main St when you are searching for E Main St.
When a geocoder says "Found it!", you need to be prepared to ask
"Found what?" or be tolerant of mis-geocodes. I like geocoders that
score the results and return them in ranked order.
In general a geocoder can never be better than its data and can in
fact be much worse than its data. Fuzzy searching lets you find
possible candidates in the data when either the input address or the
data address was not encoded correctly, but it leaves you uncertain
whether a candidate is the location actually wanted.
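A toy Python sketch of scored, ranked fuzzy matching follows. It uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure. Real geocoders typically parse and standardize the address and weight its parts separately, so treat this only as an illustration of why a near miss can look like a hit. The function name and street list are invented for the example.

```python
from difflib import SequenceMatcher


def rank_candidates(query, candidates):
    """Return (candidate, score) pairs sorted best-first; score is in [0, 1]."""
    q = query.lower()
    scored = [
        (c, SequenceMatcher(None, q, c.lower()).ratio())
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


streets = ["Main Ln", "W Main St", "E Main St", "Maine St"]
for name, score in rank_candidates("E Main St", streets):
    print(f"{score:.2f}  {name}")
```

The exact match ranks first, but W Main St still scores close behind it. If E Main St were missing from the data, W Main St would be the top candidate, so the caller has to decide from the score whether to accept it.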
You might also want to look at PAGC Geocoder. It is written in C and
uses some statistical matching techniques which are very good. There
are some changes in one of the branches that let you load all the
Tiger data for the US.
http://www.pagcgeo.org/
-Steve
Kevin Galligan wrote:
I actually bought an early access copy of the book. I work in
linux and have been playing around with different geocoders and
the tiger files. Most recently with a ruby geocoder, for no
other reason than I'm trying to find one that is fairly complete
and functional.
Any idea how "production quality" this particular one is? If
it's fairly high, I'll probably put some time in to get it
working on Linux. I have the full 2009 Tiger dataset on an EC2
block drive, waiting to import into a different database.
Right now I'm using zip+4 data to get a rough geocode, which is
good enough for what we're doing, but it only gets 92% of our
non-PO Box data. From my experience with the Tiger data, it
only adds a couple percent at most above that, but the geocoders
I've used have been pretty hacky, so it's possible that was the
issue. Also, some of them seem not to be concerned with stuff
like matching "Main St" when you're looking for "Main Ln", which
is pretty terrible.
On the plus side, if there is major work going on with this
geocoder (or any tiger geocoder), I have a huge national data
volume that will help stress test the system.
Recently I've been toying with USC's free geocoder project. In
some areas it actually gets about half of the data I previously
could not, which is impressive.
The really frustrating thing is, in general, the first 90% is
cheap/free. The next 3-4% is marginally expensive. The rest is
really pricey.
Is there any idea how complete the tiger data is, and why there
is this apparent lack of data in there? I find it strange.
Some streets are just missing. Stuff like that.
Rambling. Anyway, will take a look later. Thoughts on the
quality of the geocoder appreciated.
-Kevin
On Fri, Feb 26, 2010 at 11:52 PM, Paragon Corporation
<l...@pcorp.us> wrote:
David,
As a matter of fact we've been working on that for chapter 10 of our
upcoming book and think we have it all working. As a part of the example
generation process for our chapter 10, we had to come up with a way to
load the tables that works on both Windows and Linux. Unfortunately we
haven't had a chance to test the Linux loading approach, but it is
pretty much a parallel of the Windows approach.
To do so we started out with Steve's code, added some additional skeleton
tables and a database function that generates a command line script for
the respective OS. Hopefully it all makes sense from the readme file we
have packaged.
We also changed one of the functions because there was an error in it,
and revised it slightly to work with Tiger 2009 data. You can download
our slightly hacked version of Steve's code from our chapter 10 page.
Steve -- if you are listening, we are hoping to remerge your version with
our loader part and bring it back into the PostGIS distribution as part
of the PostGIS 1.5.1 or 2.0 release.
http://www.postgis.us/chapter_10
Leo and Regina
http://www.postgis.us/
-----Original Message-----
From: postgis-users-boun...@postgis.refractions.net
[mailto:postgis-users-boun...@postgis.refractions.net] On Behalf Of Dave Fuhry
Sent: Friday, February 26, 2010 3:04 PM
To: PostGIS Users Discussion
Subject: [postgis-users] TIGER geocoder with Census 2009 shapefiles
I'm trying to set up the TIGER geocoder from
http://www.snowman.net/git/tiger_geocoder/ which is new and aims to work
with the new TIGER shapefiles. I'm trying with the 2009 shapefiles from
http://www2.census.gov/geo/tiger/TIGER2009/.
I'm not sure how to create the roads_local table (derived closely from
completechain in the old version). A join between edges and addr?
Wondering if anyone can offer any direction. A relevant ticket is
http://trac.osgeo.org/postgis/ticket/135. The out-of-date file which used
to create the roads_local table is tables/roads_local.sql, in the above
repository.
-Dave
Table "tiger.edges"
   Column   |          Type          |                         Modifiers
------------+------------------------+------------------------------------------------------------
 gid        | integer                | not null default nextval('public.edges_gid_seq'::regclass)
 statefp    | character varying(2)   |
 countyfp   | character varying(3)   |
 tlid       | bigint                 |
 tfidl      | bigint                 |
 tfidr      | bigint                 |
 mtfcc      | character varying(5)   |
 fullname   | character varying(100) |
 smid       | character varying(22)  |
 lfromadd   | character varying(12)  |
 ltoadd     | character varying(12)  |
 rfromadd   | character varying(12)  |
 rtoadd     | character varying(12)  |
 zipl       | character varying(5)   |
 zipr       | character varying(5)   |
 featcat    | character varying(1)   |
 hydroflg   | character varying(1)   |
 railflg    | character varying(1)   |
 roadflg    | character varying(1)   |
 olfflg     | character varying(1)   |
 passflg    | character varying(1)   |
 divroad    | character varying(1)   |
 exttyp     | character varying(1)   |
 ttyp       | character varying(1)   |
 deckedroad | character varying(1)   |
 artpath    | character varying(1)   |
 persist    | character varying(1)   |
 gcseflg    | character varying(1)   |
 offsetl    | character varying(1)   |
 offsetr    | character varying(1)   |
 tnidf      | bigint                 |
 tnidt      | bigint                 |
 the_geom   | public.geometry        |
Table "tiger.addr"
  Column   |         Type          |                         Modifiers
-----------+-----------------------+-----------------------------------------------------------
 gid       | integer               | not null default nextval('public.addr_gid_seq'::regclass)
 tlid      | bigint                |
 fromhn    | character varying(12) |
 tohn      | character varying(12) |
 side      | character varying(1)  |
 zip       | character varying(5)  |
 plus4     | character varying(4)  |
 fromtyp   | character varying(1)  |
 totyp     | character varying(1)  |
 fromarmid | integer               |
 toarmid   | integer               |
 arid      | character varying(22) |
 mtfcc     | character varying(5)  |
 statefp   | character varying(2)  | not null
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
--
Mark Vantzelfde
NetMasters, Inc.