Hi,

I think I am bumping up against a problem with the street normalization routines in the Tiger geocoder, but I'm not sure.
The version is from git, last commit Sat Aug 11 19:58:33 2012.

In California, we have a lot of Spanish street names. For example there are names like Via Canon (that second n used to have a tilde I think), Via Verde, and Camino Las Ramblas. The geocode function wants to flip these Spanish names to the end, apparently. For example:

geocoder=# select pprint_addy(addy), st_astext(geomout),rating FROM geocode( 'Via Verde, Dana Point CA');
                   pprint_addy                   |                 st_astext                 | rating
-------------------------------------------------+-------------------------------------------+--------
 Via Verde Ct, Calabasas, CA 91302               | POINT(-118.659995686466 34.1275841694006) |     41
 Via Verde St, Covina, CA 91724                  | POINT(-117.858043689933 34.0697638249141) |     41
...

But if you flip the name around, you get

geocoder=# select pprint_addy(addy), st_astext(geomout),rating FROM geocode( 'Verde Via, Dana Point CA');
            pprint_addy            |                 st_astext                 | rating
-----------------------------------+-------------------------------------------+--------
 Verde Via, Dana Point, CA 92624   | POINT(-117.672816628784 33.4623777015046) |     38
 Verde Vw, National City, CA 91950 | POINT(-117.060049181038 32.6573081053596) |     40
...

Similarly Camino Las Ramblas is a pretty major street, but you get:

geocoder=# select pprint_addy(addy), st_astext(geomout),rating FROM geocode( 'Camino Las Ramblas, San Juan Capistrano CA');
            pprint_addy             |                 st_astext                 | rating
------------------------------------+-------------------------------------------+--------
 Lago Cll, Dana Point, CA 92624     | POINT(-117.665451952078 33.4629500870658) |     60
 Lago Cll, San Clemente, CA 92672   | POINT(-117.629773684804 33.4331286117088) |     68
...
But if you flip Camino to the end, you get:

geocoder=# select pprint_addy(addy), st_astext(geomout),rating FROM geocode( 'Las Ramblas Camino, San Juan Capistrano CA');
                  pprint_addy                   |                 st_astext                 | rating
------------------------------------------------+-------------------------------------------+--------
 Cam Las Ramblas, San Juan Capistrano, CA 92675 | POINT(-117.662978341711 33.4686616216608) |     38
 Las Ramblas Dr, Concord, CA 94521              | POINT(-121.9494654272 37.9565518802366)   |     53

(I just figured out that I am clouding the issue somewhat by using pprint_addy here, but still, the addy object has stripped the Cam part away from the 'Las Ramblas' part.)

I took a scan of addrfeat, and see:

geocoder=# select distinct fullname from addrfeat where fullname ~* 'Las Ramblas' limit 10;
    fullname
-----------------
 Cam Las Ramblas
 Via Las Ramblas
 Cll Las Ramblas
 Ave Las Ramblas
 Las Ramblas Dr
 Las Ramblas

But again, if you try to geocode '28005 Cam Las Ramblas, San Juan Capistrano CA', the geocoder can't find it, while '28005 Las Ramblas Cam' has no trouble.

Is this a bug, or a failing heuristic? Is there a way to turn that off for Spanish names? Or perhaps better, is there a way to call whatever function is monkeying with the addrfeat.fullname strings to get the same effect on my input strings? That would mean comparing apples to apples, which would give the best shot at matching.

Thanks for any pointers.

Regards,
James Marca
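
P.S. To take pprint_addy out of the picture, the parsed fields can be looked at directly. The following is only a sketch: it assumes normalize_address() from the Tiger geocoder is installed alongside geocode(), and I am not claiming it is the same routine that produced addrfeat.fullname. The two input strings are just the examples from above.

```sql
-- Sketch: show how each input string is split into norm_addy fields
-- (street name, street type abbreviation, and so on), side by side.
-- Assumes normalize_address() is available in this install.
SELECT input, (normalize_address(input)).*
FROM (VALUES
        ('28005 Camino Las Ramblas, San Juan Capistrano CA'),
        ('28005 Las Ramblas Camino, San Juan Capistrano CA')
     ) AS t(input);
```

If the two rows come back with different street name and street type fields, that would point at the input parsing rather than at the lookup against addrfeat.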