Re: [GENERAL] problem converting database to UTF-8

2009-02-03 Thread Schwaighofer Clemens
On Fri, Jan 23, 2009 at 02:18, David Goodenough
david.goodeno...@btconnect.com wrote:

 Is there a definative HOWTO that I can follow, if not does someone
 have a set of instructions that will work?

 If it matters I am running under Debian.

I did it once for a very large db (large for me was 5GB) and converted
it from EUC to UTF8.

1) dumped all the data (pg_dump_all) on the source system so it was all EUC
2) split the file into manage able chunks (per LINE not per BYTE if
you work with multibyte things).
3) iconv -f EUC -t UTF8 -c (yes -c because there might be some strange
thing inside and so it doesn't stop)
4) put the files together again
5) sed to replace all EUC to UTF8
6) import into new created db on target system with all set to UTF8

-- 
[ Clemens Schwaighofer  -=:~ ]
[ IT Engineer/Manager]
[ E-Graphics Communications, TEQUILA\ Japan IT Group ]
[6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.jp  ]

Advertising Age Global Agency of the Year 2008
Adweek Global Agency of the Year 2008

This e-mail is intended only for the named person or entity to which 
it is addressed and contains valuable business information that is 
privileged, confidential and/or otherwise protected from disclosure.  
Dissemination, distribution or copying of this e-mail or the 
information herein by anyone other than the intended recipient, or 
an employee or agent responsible for delivering the message to the 
intended recipient, is strictly prohibited.  All contents are the 
copyright property of TBWA Worldwide, its agencies or a client of 
such agencies. If you are not the intended recipient, you are 
nevertheless bound to respect the worldwide legal rights of TBWA 
Worldwide, its agencies and its clients. We require that unintended 
recipients delete the e-mail and destroy all electronic copies in 
their system, retaining no copies in any media.If you have received 
this e-mail in error, please immediately notify us via e-mail to 
disclai...@tbwaworld.com.  We appreciate your cooperation.

We make no warranties as to the accuracy or completeness of this 
e-mail and accept no liability for its content or use.  Any opinions
expressed in this e-mail are those of the author and do not 
necessarily reflect the opinions of TBWA Worldwide or any of its 
agencies or affiliates. 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-24 Thread Jasen Betts
On 2009-01-22, David Goodenough david.goodeno...@btconnect.com wrote:
 I have a database which was created as LATIN1 (the machine has the
 wrong locales installed when I set up  PG).  It is running 8.3.

 So I found various places which said the way to do this was to do
 a pg_dumpall -f dump_file, get rid of the entire database, init_db -E UTF-8, 
 and then psql -f dumpfile.

 But the psql fails saying:-

 psql:dumpfile:49: ERROR:  encoding LATIN1 does not match server's locale 
 en_GB.UTF-8
 DETAIL:  The server's LC_CTYPE setting requires encoding UTF8.

 I have en_GB.UTF-8 now as my primary locale, and en_GB.ISO8859-1 is
 also generated.

 So I looked around again and found people saying I needed to use iconv,
 but that does not help, I get the same error.

 Is there a definative HOWTO that I can follow, if not does someone
 have a set of instructions that will work?

simplest best way is to dump the individual databases and then restore
them individually.

else you need to go theough the dump file and doctor the
CREATE DATABASE lines to use UTF-8 (or not specify an encoding)
whilst making sure that CLIENT_ENCODING remains set to LATIN1
(so that the data from the dump file remains intelligible)

alternately you may be able to use iconv to convert the whole file to
UTF8 and also all ocurrances of LATIN1 in commands with UTF8

The frst thing I'd do is look in the dump file and see what's on line
49.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] problem converting database to UTF-8

2009-01-22 Thread David Goodenough
I have a database which was created as LATIN1 (the machine has the
wrong locales installed when I set up  PG).  It is running 8.3.

So I found various places which said the way to do this was to do
a pg_dumpall -f dump_file, get rid of the entire database, init_db -E UTF-8, 
and then psql -f dumpfile.

But the psql fails saying:-

psql:dumpfile:49: ERROR:  encoding LATIN1 does not match server's locale 
en_GB.UTF-8
DETAIL:  The server's LC_CTYPE setting requires encoding UTF8.

I have en_GB.UTF-8 now as my primary locale, and en_GB.ISO8859-1 is
also generated.

So I looked around again and found people saying I needed to use iconv,
but that does not help, I get the same error.

Is there a definative HOWTO that I can follow, if not does someone
have a set of instructions that will work?

If it matters I am running under Debian.

David

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-22 Thread Vladimir Konrad
 Is there a definative HOWTO that I can follow, if not does someone
 have a set of instructions that will work?

What about running iconv command on the dumped .sql file and transform
it to the utf8?

Vlad

PS: man iconv for manual

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-22 Thread David Goodenough
On Thursday 22 January 2009, Vladimir Konrad wrote:
  Is there a definative HOWTO that I can follow, if not does someone
  have a set of instructions that will work?

 What about running iconv command on the dumped .sql file and transform
 it to the utf8?

 Vlad

 PS: man iconv for manual

iconv does not change the database encodings embedded in the file
(and it is quite large).

Is there no automated procedure.

David

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-22 Thread Vladimir Konrad
 iconv does not change the database encodings embedded in the file
 (and it is quite large).

Have you read the manual?

   file   A pathname of an input file. If no file operands are
   specified, or if a file operand is '-', the standard input shall
   be used.


cat the-source-dump.sql | iconv -t utf8 -  my-converted.sql

Size should not matter in this case...

V

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-22 Thread Alan Hodgson
On Thursday 22 January 2009, Vladimir Konrad v...@dsl.pipex.com wrote:
  iconv does not change the database encodings embedded in the file
  (and it is quite large).

 Have you read the manual?

file   A pathname of an input file. If no file operands are
specified, or if a file operand is '-', the standard input shall
be used.


 cat the-source-dump.sql | iconv -t utf8 -  my-converted.sql

 Size should not matter in this case...

Yeah it does. iconv buffers everything in memory, as I recall.

However, you can split the file into manageable pieces, run each through 
iconv, and recombine afterwards.


-- 
Current Peeve: The mindset that the Internet is some sort of school for
novice sysadmins and that everyone -not- doing stupid dangerous things
should act like patient teachers with the ones who are. -- Bill Cole, NANAE 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-22 Thread David Goodenough
On Thursday 22 January 2009, Vladimir Konrad wrote:
  iconv does not change the database encodings embedded in the file
  (and it is quite large).

 Have you read the manual?

file   A pathname of an input file. If no file operands are
specified, or if a file operand is '-', the standard input shall
be used.


 cat the-source-dump.sql | iconv -t utf8 -  my-converted.sql

 Size should not matter in this case...

 V

You have not understood what I said.  I ran iconv, and it changes the
encoding of the data, but not the ENCODING= statements that are
embedded in the datastream.  Yes I can change those with sed, but
I do not know what else I need to change.  There must be an easier 
way.

David

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-22 Thread Vladimir Konrad

 You have not understood what I said.  I ran iconv, and it changes the
 encoding of the data, but not the ENCODING= statements that are
 embedded in the datastream.  Yes I can change those with sed, but
 I do not know what else I need to change.  There must be an easier 
 way.

Oops, please accept my apologies... Also, it looks that iconv tries to
buffer everything in RAM :-(.

Vlad

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-22 Thread Alan Hodgson
On Thursday 22 January 2009, David Goodenough 
david.goodeno...@btconnect.com wrote:

 You have not understood what I said.  I ran iconv, and it changes the
 encoding of the data, but not the ENCODING= statements that are
 embedded in the datastream.  Yes I can change those with sed, but
 I do not know what else I need to change.  There must be an easier
 way.

There isn't. I'm pretty sure that iconv plus changing the ENCODING 
statement(s) is all I did when we converted, though.


-- 
Current Peeve: The mindset that the Internet is some sort of school for
novice sysadmins and that everyone -not- doing stupid dangerous things
should act like patient teachers with the ones who are. -- Bill Cole, NANAE 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-22 Thread Daniel Verite

Alan Hodgson wrote:


Yeah it does. iconv buffers everything in memory, as I recall.

However, you can split the file into manageable pieces, run each 
through 

iconv, and recombine afterwards.


Another way is to just use GNU recode for large files instead of iconv. 
It's slower but doesn't need much memory, contrary to iconv that indeed 
seems to want the entire stream in memory before proceeding.


Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: 
http://www.manitou-mail.org


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] problem converting database to UTF-8

2009-01-22 Thread Vladimir Konrad

  cat the-source-dump.sql | iconv -t utf8 -  my-converted.sql
 
  Size should not matter in this case...
 
 Yeah it does. iconv buffers everything in memory, as I recall.

Just found an alternative - uconv command (part of ICU project):

http://www.icu-project.org/userguide/intro.html

(not sure if it is in debian, but it does not buffer entire file when
used in a pipe).

Vlad

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general