Re: [Wikitech-l] correct way to import SQL dumps into MySQL database in terms of character encoding
On 1 April 2012 16:04, Piotr Jagielski piotr.jagiel...@op.pl wrote:
> mysql --user root --password=root wiki < C:\Path\plwiki-20111227-categorylinks.sql --default-character-set=utf8

It's -p, not --password=root and it will prompt you for the password.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] correct way to import SQL dumps into MySQL database in terms of character encoding
These options should be equivalent. It does load the data using the below command. It just incorrectly handles non-English characters.

Regards,
Piotr

On 2012-04-01 16:31, Svip wrote:
> On 1 April 2012 16:04, Piotr Jagielski piotr.jagiel...@op.pl wrote:
>> mysql --user root --password=root wiki < C:\Path\plwiki-20111227-categorylinks.sql --default-character-set=utf8
>
> It's -p, not --password=root and it will prompt you for the password.
Re: [Wikitech-l] correct way to import SQL dumps into MySQL database in terms of character encoding
On 01/04/12 17:05, Piotr Jagielski wrote:
> These options should be equivalent. It does load the data using the below command. It just incorrectly handles non-English characters.
>
> Regards,
> Piotr

Do you have $wgDBmysql5 set in your LocalSettings.php?
Re: [Wikitech-l] correct way to import SQL dumps into MySQL database in terms of character encoding
I don't have MediaWiki installed. I'm just trying to import the dump into a standalone database so I can do some batch processing on the data.

Regards,
Piotr

On 2012-04-01 17:30, Platonides wrote:
> On 01/04/12 17:05, Piotr Jagielski wrote:
>> These options should be equivalent. It does load the data using the below command. It just incorrectly handles non-English characters.
>
> Do you have $wgDBmysql5 set in your LocalSettings.php?
Re: [Wikitech-l] correct way to import SQL dumps into MySQL database in terms of character encoding
Piotr Jagielski piotr.jagiel...@op.pl wrote:
> Hello, set my data source URL to the following in my Java code:
> jdbc:mysql://localhost/plwiki?useUnicode=true&characterEncoding=UTF-8

Please note you have plwiki here and you imported into wiki.

Assuming your .my.cnf is not making things difficult, I ran a small Jython script to test:

$ jython
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[OpenJDK 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0
Type "help", "copyright", "credits" or "license" for more information.
>>> from com.ziclix.python.sql import zxJDBC
>>> d, u, p, v = "jdbc:mysql://localhost/wiki", "root", None, "org.gjt.mm.mysql.Driver"
>>> db = zxJDBC.connect(d, u, p, v, CHARSET="utf8")
>>> c=db.cursor()
>>> c.execute("select cl_from, cl_to from categorylinks where cl_from=61 limit 10")
>>> c.fetchone()
(61, array('b', [65, 110, 100, 111, 114, 97]))
>>> (a,b) = c.fetchone()
>>> print b
array('b', [67, 122, -59, -126, 111, 110, 107, 111, 119, 105, 101, 95, 79, 114, 103, 97, 110, 105, 122, 97, 99, 106, 105, 95, 78, 97, 114, 111, 100, -61, -77, 119, 95, 90, 106, 101, 100, 110, 111, 99, 122, 111, 110, 121, 99, 104])
>>> for x in b:
...     try:
...         print chr(x),
...     except ValueError:
...         print "%02x" % x,
...
C z -3b -7e o n k o w i e _ O r g a n i z a c j i _ N a r o d -3d -4d w _ Z j e d n o c z o n y c h

array('b', [ ... ]) in Jython means that the SQL driver returns an array of bytes. It seems to me that the array of bytes contains raw UTF-8, so you need to decode it into the proper Unicode that Java uses in strings.

I think this behaviour is described in http://bugs.mysql.com/bug.php?id=25528

Probably you need to play with getBytes() on a result object to get what you want.

//Saper
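To illustrate the decoding step described above, here is a minimal sketch in plain Python 3 (not the Jython 2.5 used in the session). It assumes the driver hands back Java-style signed bytes that hold raw UTF-8; the function name decode_signed_utf8 is made up for this example and is not part of zxJDBC or any MySQL driver.

```python
def decode_signed_utf8(signed_bytes):
    """Turn Java-style signed bytes (-128..127) into a Unicode string.

    Assumption: the values are raw UTF-8, as the session output suggests.
    """
    # Mask each value into the 0..255 range, then decode the result as UTF-8.
    raw = bytes(b & 0xFF for b in signed_bytes)
    return raw.decode("utf-8")


# Values copied from the two fetchone() results in the session.
first = [65, 110, 100, 111, 114, 97]
second = [67, 122, -59, -126, 111, 110, 107, 111, 119, 105, 101, 95,
          79, 114, 103, 97, 110, 105, 122, 97, 99, 106, 105, 95,
          78, 97, 114, 111, 100, -61, -77, 119, 95,
          90, 106, 101, 100, 110, 111, 99, 122, 111, 110, 121, 99, 104]

print(decode_signed_utf8(first))   # Andora
print(decode_signed_utf8(second))  # Członkowie_Organizacji_Narodów_Zjednoczonych
```

The pairs -59, -126 and -61, -77 are the two-byte UTF-8 sequences C5 82 and C3 B3, which decode to the Polish letters ł and ó.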
Re: [Wikitech-l] correct way to import SQL dumps into MySQL database in terms of character encoding
Sorry, I made a mistake in the e-mail. I had the database set to the same name in both places. My problem is actually the opposite: I don't get any results when I use a UTF-8 string as input in the query. But I verified that I don't get correct results with the query you provided either. The link to the MySQL bug report might be helpful in resolving the problem, so thanks for providing it.

Piotr

On 2012-04-01 19:50, Marcin Cieslak wrote:
> [...]
> array('b', [ ... ]) in Jython means that SQL driver returns an array of bytes. It seems to me that array of bytes contains raw UTF-8, so you need to decode it into proper Unicode that Java uses in strings.
>
> I think this behaviour is described in http://bugs.mysql.com/bug.php?id=25528
>
> Probably you need to play with getBytes() on a result object to get what you want.
>
> //Saper
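For the opposite direction (a UTF-8 string used as query input), one way to take the connection character set out of the picture is to compare against the exact bytes. The sketch below is plain Python 3 and rests on an assumption: that cl_to is stored as a binary column holding raw UTF-8, which the byte arrays returned by the driver suggest but the thread does not confirm. The helper names are hypothetical; the 0x... form is MySQL's hexadecimal literal syntax.

```python
def utf8_hex_literal(text):
    """Return a MySQL hex literal (0x...) for the UTF-8 bytes of text."""
    return "0x" + text.encode("utf-8").hex().upper()


def to_signed_bytes(text):
    """UTF-8 bytes of text as Java-style signed values (-128..127)."""
    return [b - 256 if b > 127 else b for b in text.encode("utf-8")]


title = "Członkowie"

print(utf8_hex_literal(title))  # 0x437AC5826F6E6B6F776965
print(to_signed_bytes(title))   # [67, 122, -59, -126, 111, 110, 107, 111, 119, 105, 101]

# Hypothetical query text; the literal contains only hex digits, so it is
# unaffected by whatever character set the client connection declares.
query = ("SELECT cl_from FROM categorylinks WHERE cl_to = %s LIMIT 10"
         % utf8_hex_literal(title))
print(query)
```

The signed values for "Członkowie" match the start of the array printed in the Jython session, which is a quick way to check that a search term and the stored value agree byte for byte.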
Re: [Wikitech-l] correct way to import SQL dumps into MySQL database in terms of character encoding
On 01/04/12 17:37, Piotr Jagielski wrote:
> I don't have MediaWiki installed. I'm just trying to import the dump into a standalone database so I can do some batch processing on the data.
>
> Regards,
> Piotr

It inserts the data fine for me. I suspect your Java code is failing to read it appropriately. Try reading the table with a different tool, such as phpMyAdmin.

mysql> select * from categorylinks limit 20;
+---------+-------------------------------------+-------------------------------------+---------------------+-------------------+--------------+---------+
| cl_from | cl_to                               | cl_sortkey                          | cl_timestamp        | cl_sortkey_prefix | cl_collation | cl_type |
+---------+-------------------------------------+-------------------------------------+---------------------+-------------------+--------------+---------+
|       0 | Ekspresowe_kasowanko                | Golembiovski Andzey                 | 2009-07-09 21:01:30 |                   |              | page    |
|       2 | Języki_skryptowe                    | AWK AWK                             | 2011-01-18 01:11:23 | Awk               | uppercase    | page    |
|       4 | Specjalności_lekarskie              | ALERGOLOGIA                         | 2008-04-25 10:31:22 |                   | uppercase    | page    |
|       6 | Formaty_plików_komputerowych        | ASCII                               | 2011-09-23 11:01:05 |                   | uppercase    | page    |
|       6 | Kodowania_znaków                    | ASCII                               | 2011-09-23 11:01:05 |                   | uppercase    | page    |
|       7 | Artykuły_na_medal                   | ATOM                                | 2010-12-01 16:40:37 |                   | uppercase    | page    |
|       7 | Artykuły_wymagające_dopracowania    | ATOM                                | 2011-08-16 15:53:43 |                   | uppercase    | page    |
|       7 | Atomy                               | ATOM                                | 2011-08-09 00:56:39 |                   | uppercase    | page    |
|       8 | Logika_matematyczna                 | AKSJOMAT                            | 2007-11-10 08:18:06 |                   | uppercase    | page    |
|      10 | Arytmetyka                          | ARYTMETYKA                          | 2011-10-17 02:36:39 |                   | uppercase    | page    |
|      11 | Artykuły_pod_opieką_Projektu_Chemia | AMINOKWASY                          | 2011-08-19 02:48:21 |                   | uppercase    | page    |
|      12 | Alkeny                              | * ALKENY                            | 2006-08-07 17:23:22 | *                 | uppercase    | page    |
|      13 | Multimedia                          | ACTIVEX                             | 2007-05-24 20:20:15 |                   | uppercase    | page    |
|      13 | Windows                             | ACTIVEX                             | 2007-05-24 20:20:15 |                   | uppercase    | page    |
|      14 | Interfejsy_programistyczne          | ! APPLICATION PROGRAMMING INTERFACE | 2011-04-27 11:33:17 | !                 | uppercase    | page    |
|      15 | Amiga                               | AMIGAOS                             | 2007-09-09 17:19:11 |                   | uppercase    | page    |
|      15 | Systemy_operacyjne                  | AMIGAOS                             | 2007-09-09 17:19:11 |                   | uppercase    | page    |
|      16 | Organizacje_międzynarodowe          | ASSOCIATION FOR COMPUTING MACHINERY | 2011-10-19 15:52:28 |                   | uppercase    | page    |
|      18 | Funkcje_boolowskie                  | ALTERNATYWA                         | 2007-03-23 17:43:05 |                   | uppercase    | page    |
|      19 | Logika_matematyczna                 | AKSJOMAT INDUKCJI                   | 2007-08-31 22:54:55 |                   | uppercase    | page    |
+---------+-------------------------------------+-------------------------------------+---------------------+-------------------+--------------+---------+
20 rows in set (0.00 sec)
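If the stored data is fine and the reading side is at fault, a common symptom is UTF-8 bytes being decoded with a single-byte code page. The following plain Python 3 sketch shows what that looks like and how to test a suspicious string for it. It assumes Windows-1252 as the wrong code page, which is only a guess about the reporter's Windows setup; garble and repair are hypothetical helper names.

```python
def garble(text):
    """Simulate the fault: UTF-8 bytes mis-decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text):
    """Undo the mis-decoding if possible; otherwise return text unchanged."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Not representable in cp1252, or not valid UTF-8 afterwards:
        # the string was probably not garbled in this particular way.
        return text


good = "Języki_skryptowe"   # value taken from the table above
bad = garble(good)

print(bad)                  # JÄ™zyki_skryptowe
print(repair(bad) == good)  # True
print(repair(good) == good) # True
```

If strings read by the application look like the garbled form and repair() restores them, the data in the table is most likely correct and the connection or decoding settings on the client are the place to look. A few byte values are undefined in Python's cp1252 codec, so garble() can raise for some inputs; the titles used here are not affected.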