[SQL] Significance of Database Encoding
Hi,

I would like to know the difference between databases created with UNICODE encoding and those created with SQL_ASCII encoding.

I have an existing database with SQL_ASCII encoding, yet I am still able to store multibyte characters that are not in the ASCII character set. For example:

tradein_clients=# \l
            List of databases
+-----------------+----------+-----------+
| Name            | Owner    | Encoding  |
+-----------------+----------+-----------+
| template0       | postgres | SQL_ASCII |
| template1       | postgres | SQL_ASCII |
| tradein_clients | tradein  | SQL_ASCII |
+-----------------+----------+-----------+

tradein_clients=# SELECT * from t_A;
+------------+
| a          |
+------------+
| 私はガラス |
+------------+

The value above is Japanese text.

I have seen some postings about migrating databases from SQL_ASCII to UNICODE. Given the observation above, what is the significance of such a migration?

Regards
Rajesh Kumar Mallah.
Re: [SQL] Significance of Database Encoding [ update ]
I am not sure why the characters did not display properly in the mailing list archives:
http://archives.postgresql.org/pgsql-sql/2005-05/msg00102.php

However, when I run the SELECT on my screen (xterm -u8), I do see the Japanese glyphs properly.

Regds
Mallah.
Re: [SQL] Significance of Database Encoding
> +--+
> | 私はガラス
> +--+

You say it displays correctly in xterm, so what arrived in the mail is not what you saw in your xterm. The mail contains HTML/XML Unicode character entities, probably generated by your mailer from your Unicode cut'n'paste.

Using SQL_ASCII to store UTF-8 encoded data will work, but postgres won't know that it's manipulating multibyte characters. So, for instance, the length of a string will be its byte length instead of the correct character count, collation rules will be funky, etc. And substring() may well cut in the middle of a UTF-8 multibyte character, which will then screw up your application-side processing...

Apart from that, it'll work ;)
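To make the byte-length and substring points concrete, here is a minimal Python sketch. It does not talk to postgres at all; it only assumes that an SQL_ASCII database stores the UTF-8 bytes as sent and works on them byte by byte. The variable names are just for illustration.

```python
text = "私はガラス"             # 5 characters
raw = text.encode("utf-8")     # the bytes a byte-oriented store would hold

char_len = len(text)           # character count
byte_len = len(raw)            # byte count (each of these characters takes 3 bytes)

# Taking the first 4 bytes cuts the second character in half.
cut = raw[:4]
try:
    cut.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False

print(char_len, byte_len, cut_is_valid)   # 5 15 False
```

A byte-based length or substring gives 15 and a broken sequence here, whereas a character-aware one would give 5 and a clean cut.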
Re: [SQL] Significance of Database Encoding
--- PFC <[EMAIL PROTECTED]> wrote:
> You say it displays correctly in xterm (ie. you didn't see these in
> your xterm).
> There are HTML/XML unicode character entities, probably generated by
> your mailer from your Unicode cut'n'paste.

That is correct.

Now the question is how to convert from SQL_ASCII to UNICODE. Posts on the mailing lists suggest running recode or iconv on the dump file and then restoring it. The problem is that when I ran iconv with -f US-ASCII, the program aborted:

    $ iconv -f US-ASCII -t UTF-8 < test.sql > out.sql
    iconv: illegal input sequence at position 114500

Any ideas on how the job can be accomplished reliably?

Also, my database may contain data in multiple encodings, such as WINDOWS-1251 and WINDOWS-1256, in various places, because the data has been inserted by different people using different sources and client software.

Regds
Rajesh Kumar Mallah.
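The iconv abort is what one would expect: US-ASCII only covers bytes 0x00 to 0x7F, so any byte with the high bit set is an illegal input sequence for that source encoding. A small Python sketch for locating such bytes follows. The helper name first_non_ascii and the sample statement are made up for illustration, and it is not meant to reproduce iconv's exact position numbering.

```python
def first_non_ascii(data: bytes) -> int:
    """Return the offset of the first byte >= 0x80, or -1 if data is pure ASCII."""
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            return offset
    return -1

# Hypothetical fragment of a dump: ASCII SQL around one UTF-8 encoded character.
prefix = b"INSERT INTO t_a VALUES ('"
sample = prefix + "私".encode("utf-8") + b"');"

pos = first_non_ascii(sample)
print(pos, sample[pos:pos + 3])
```

Running the same check over the real dump (read in binary mode) could help inspect the bytes around the place where iconv gave up.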
Re: [SQL] Significance of Database Encoding
> $ iconv -f US-ASCII -t UTF-8 < test.sql > out.sql
> iconv: illegal input sequence at position 114500
>
> Any ideas how the job can be accomplised reliably.
>
> Also my database may contain data in multiple encodings
> like WINDOWS-1251 and WINDOWS-1256 in various places
> as data has been inserted by different peoples using
> different sources and client software.

You could use a simple program like this (in Python):

    # Read and write bytes so that each line can be decoded explicitly.
    with open("your dump", "rb") as dump, open("unidump", "wb") as output:
        for line in dump:
            # Candidate encodings, tried in order; "whatever" is a
            # placeholder to be replaced with real codec names.
            for encoding in "utf-8", "iso-8859-15", "whatever":
                try:
                    output.write(line.decode(encoding).encode("utf-8"))
                    break
                except UnicodeError:
                    pass
            else:
                print("No suitable encoding for line...")

I'd say this might work, as long as UTF-8 cannot absorb an apostrophe inside a multibyte character. Can it?

Or you could do that to all your tables using SELECTs, but it's going to be painful...
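On the apostrophe question: by design, every byte of a multibyte UTF-8 sequence has the high bit set (0x80 or above), so an ASCII apostrophe (0x27) cannot occur inside one. The sketch below checks this on a few sample characters. The function name is hypothetical, and it is a spot check on chosen samples, not a proof.

```python
def multibyte_contains_ascii(text: str) -> bool:
    """True if the UTF-8 encoding of any non-ASCII character contains a byte below 0x80."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return True
    return False

# Mixed sample: Japanese, an accented Latin letter, Cyrillic, and plain apostrophes.
sample = "私はガラス'é'ж"
print(multibyte_contains_ascii(sample))
```

So splitting a UTF-8 dump on ASCII bytes such as newlines or quotes should not cut a character in half. That guarantee applies to UTF-8 only, not necessarily to other multibyte encodings.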
Re: [SQL] Significance of Database Encoding
--- PFC <[EMAIL PROTECTED]> wrote:
> You could use a simple program like that (in Python):

This may not work, because the conversion to UTF-8 can succeed (no runtime error) even for an incorrect guess of the original encoding, and the result will then be incorrect UTF-8 text.

Regds
Rajesh Kumar Mallah
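This concern is easy to reproduce. In the sketch below, text encoded as WINDOWS-1251 is passed through the same kind of try-in-order loop. The function guess_decode and the sample word are made up for illustration.

```python
original = "Привет"
raw = original.encode("cp1251")   # bytes as a WINDOWS-1251 client would send them

def guess_decode(data: bytes, encodings=("utf-8", "iso-8859-15")):
    """Return (encoding, text) for the first encoding that decodes without error."""
    for enc in encodings:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    return None, None

used, decoded = guess_decode(raw)
print(used, decoded == original)   # iso-8859-15 False
```

The UTF-8 attempt fails as hoped, but ISO-8859-15 accepts the bytes and silently produces the wrong characters. A single-byte codec of this kind accepts essentially any input, so the absence of an error says nothing about whether the guess was right. The encoding most likely has to be decided per table, column, or data source from outside knowledge.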