Don, Okay so it looks like your shp2pgsql is before the fix was in place which explains why it doesn't work for you since your version doesn't have the bug fix. Is there any way you can get a newer shp2pgsql (e.g. recompile the latest tarball of PostGIS) or use the one from the 1.5 release? That sounds like probably the easiest way to go. As far as files in your postgres directories, the ones in the pg_log folder (end in .log) you can safely delete since those are just sql log errors etc which depending on the verbosity of your error in postgresql.conf could get huge. The other folder ones you don't want to touch since they'll damage your cluster. Thanks for testing this out on Linux. I'll strip out all the windows line breaks we accidentally introduced in the code and look forward to any patches you can offer. Thanks, Regina _____
From: [email protected] [mailto:[email protected]] On Behalf Of Don Sent: Wednesday, April 13, 2011 5:40 AM To: PostGIS Users Discussion Subject: Re: [postgis-users] "Linux" geocoder script ? Here is the version: RELEASE: 2.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (r$Id: shp2pgsql-core.h 6358 2010-12-13 20:09:26Z pramsey $) USAGE: shp2pgsql [<options>] <shapefile> [<schema>.]<table> I had tried the iconv fix which is a great idea. It seemed to work, but I ran out of disk space. $PGBIN/shp2pgsql -c -s 4269 -g the_geom -W "latin1" tl_2010_27_county10.dbf tiger_staging.mn_county10 | iconv -f latin1 -t UTF8 | $PGBIN/psql -d $PGDATABASE So I have been spending time on creating more disk space instead of fixing this. I would like to share the patches that I have so that others can try to improve it especially those who are more familiar with the tiger2010 file formats. It would be nice to have a working tiger2010 decoder for linux in the next postgis release. I don't remember offhand which state had the utf problem. Several of them did. I would run ./tiger_load >& del with a "set -x" in the file for debugging. (You can then search that file for "aborted" and backtrack and find out which file it was processing.) It got rather large and so did my log files. I was trying to get more info on this when I ran out of disk space. I could not even vaccuum any databases. I see many large files in my postgres directories and wonder what they are all for. I have been using 2 different database clusters one on a new larger drive for the geocoder. It seems that my cluster on the small drive has a lot of files on it thought that are taking up a lot of space. On 04/13/2011 01:32 AM, Paragon Corporation wrote: Don, Which state were you processing? I can check it out and see if I get similar errors on my shp2pgsql. You could be right and the file just isn't Latin1. The regress test did seem to pass for me once that ticket was fixed. Also to confirm you are running the latest version of shp2pgsql If you run shp2pgsql from commandline, it should output the version. Mine for example reads RELEASE: 2.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (r$Id: shp2pgsql-core.h 6925 2011-03-18 16:24:33Z pramsey $) The version unfortunately isn't quite accurate since its evidentally looking at the .h file instead of .c file. So though my version says 6925, its really 6932 or later. http://trac.osgeo.org/postgis/changeset/6932 Hope that helps, Regina http://www.postgis.us _____ From: [email protected] [mailto:[email protected]] On Behalf Of Don Sent: Tuesday, April 12, 2011 3:08 AM To: PostGIS Users Discussion Subject: Re: [postgis-users] "Linux" geocoder script ? My database is encoded as geocoder | drh | UTF8 | C | en_US.UTF-8 | . All my shp2pgsql statements have the -W option like this. ${loader} -a -s 4269 -g the_geom -W "latin1" $z ${staging_schema}.${state_abbrev}_${table_name} | $PGBIN/psql -d $PGDATABASE; Here is the bug that I was referring to. http://trac.osgeo.org/postgis/ticket/808 In one case I had a very large number of inserts processed for the shape file and then got that error. >From your link it says: "To enable automatic character set conversion, you have to tell PostgreSQL the character set (encoding) you would like to use in the client. There are several ways to accomplish this: " Perhaps I need to use SET CLIENT_ENCODING TO 'value'; in psql or is shp2pgsql supposed to do that when I use the -W option? postgis is expecting utf-8 when it should be expecting latin1 and converting it to utf-8. Could data type for a column have some effect on this? On 04/11/2011 08:52 PM, Sylvain Racine wrote: Hello, This is not a shp2pgsql bug. You get this error when you try to insert string data in PostgreSQL from another encoding that the one of your database Ex: Your data is formatted in Latin1 (ISO-8859-1) and you insert them in a UTF-8 database. To fix the error message, you need to convert your data. PostgreSQL have a internal converter. shp2pgsql have it too. Try shp2pgsql -W <encoding> where <encoding> is the format of you DBase file .dbf. This is called the "client encoding" in PostgreSQL. See list of valid encoding type: http://www.postgresql.org/docs/9.0/static/multibyte.html Don't mix it with the database encoding. It is the one you us to create your databse. There is also a default database charset, depending of your OS. It is the one you use to create template1 database in init-db. Mine is "UTF8" on Ubuntu. Hope that this information will help you Regards Sylvain Racine On 2011-04-11 21:22, Don wrote: I have got the tiger2010 geodecoder to work on my Opensuse system. geocoder=# geocoder=# SELECT g.rating, geocoder-# ST_X(geomout) As lon, geocoder-# ST_Y(geomout) As lat, (addy).* geocoder-# FROM geocode('1731 New Hampshire Avenue Northwest, Washington, DC 20010') As g; rating | lon | lat | address | predirabbrev | streetname | streettypeabbrev | postdirabbrev | internal | location | stateabbrev | zip | parsed --------+-------------------+------------------+---------+--------------+--- ------------+------------------+---------------+----------+------------+---- ---------+-------+-------- 0 | -77.0399013800607 | 38.9134181361424 | 1731 | | New Hampshire | Ave | NW | | Washington | DC | 20009 | t (1 row) There are a few glitches. I noticed that I am getting this message sometimes. INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 ERROR: invalid byte sequence for encoding "UTF8": 0xed6f20 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". ERROR: current transaction is aborted, commands ignored until end of transaction block ERROR: current transaction is aborted, commands ignored until end of transaction block ERROR: current transaction is aborted, commands ignored until end of transaction block I researched this some and it appears to be a shp2pgsql bug. But I am using postgis-utils-2.0.0SVN-1.2.x86_64 postgis-2.0.0SVN-1.2.x86_64 where this has supposedly been fixed. Or could the census data be corrupted? So I have "lost" some of the data due to this error. I had problems with psql generating ctrl-m instead of \n which would really mess up the script when it ran. So after I generated my load tiger script I ran this command tr "\r" "\n" < load_tiger > load_tiger2 _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
