On Fri, Sep 2, 2011 at 9:30 PM, Rural Hunter <ruralhun...@gmail.com> wrote:
> Hi Kevin,
>
> I tried again with the following additional changes based on our discussion:
> 1. use the tcp connection
> 2. turn off autovacuum
> 3. turn off full_page_writes
>
> I could import more than 30G of data in about 2 hours. That's totally
> acceptable performance to me with the current server capability.  There is a
> minor issue though. I saw a few errors during the import:
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe6272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe68e27
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe7272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5a427
>
> My data was exported from a UTF8 MySQL database and my pgsql db is also
> UTF8. I got only the 8 errors above with about 3 million records imported.
> The strange thing is, I usually see the problematic SQL in the log when a
> statement fails, so I have a chance to fix the data manually. But for the
> errors above, I don't see any SQL logged. The pgsql log just shows the same
> errors as above with no additional info:
> 2011-09-01 11:26:32 CST ERROR:  invalid byte sequence for encoding "UTF8": 0xe6272c
> 2011-09-01 11:26:47 CST ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> 2011-09-01 11:26:53 CST ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> 2011-09-01 11:26:58 CST ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> 2011-09-01 11:26:58 CST ERROR:  invalid byte sequence for encoding "UTF8": 0xe68e27
> 2011-09-01 11:27:01 CST ERROR:  invalid byte sequence for encoding "UTF8": 0xe7272c
> 2011-09-01 11:27:06 CST ERROR:  invalid byte sequence for encoding "UTF8": 0xe5272c
> 2011-09-01 11:27:15 CST ERROR:  invalid byte sequence for encoding "UTF8": 0xe5a427
>
> What could be the cause of that?

MySQL probably has looser checking of proper UTF-8 encodings.
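
To see why PostgreSQL rejects these bytes, here is a rough Python sketch (not
something from the dump or the server, just an illustration). In every
reported sequence a lead byte 0xe5..0xe7, which opens a three-byte UTF-8
character, is followed too early by 0x27 (') and sometimes 0x2c (,). One
guess, which I have not verified against your data, is that a value was cut
off in the middle of a multibyte character just before the closing quote. The
names explain() and find_invalid_lines() are made up for this example, and the
dump path you pass in is up to you.

```python
# Byte sequences copied from the error messages quoted above.
reported = ["e6272c", "e5272c", "e68e27", "e7272c", "e5a427"]


def explain(hex_seq):
    """Return (is_valid_utf8, description) for one reported byte sequence."""
    raw = bytes.fromhex(hex_seq)
    try:
        raw.decode("utf-8")
        return True, "valid"
    except UnicodeDecodeError as exc:
        bad = raw[exc.start:exc.end]
        return False, f"bytes {bad.hex()} at offset {exc.start}: {exc.reason}"


def find_invalid_lines(path):
    """Yield (line_number, byte_offset) for dump lines that are not valid UTF-8.

    Hypothetical helper: reads the file as raw bytes so that bad input
    cannot abort the scan, and reports the first bad offset per line.
    """
    with open(path, "rb") as fh:
        for lineno, line in enumerate(fh, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError as exc:
                yield lineno, exc.start


for seq in reported:
    print(seq, explain(seq))
```

Running find_invalid_lines() over the dump file before loading it should point
at the rows to fix by hand, even though the server log does not show the SQL.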

-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
