Chris,

On the MySQL database, utf8 characters are 1-3 bytes wide (which restricts the column length to 255), so 4-byte characters throw exceptions like this one:

java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\x83' for column 'id' at row 1

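To make the byte arithmetic concrete, here is a minimal Python sketch. It assumes the 767-byte key limit mentioned elsewhere in this thread; the helper names `max_indexed_chars` and `fits_mysql_utf8` are hypothetical and only for illustration, not part of MySQL, JDBC, or Nutch.

```python
# Bytes copied from the exception message: '\xF0\x9F\x92\x83'
raw = b"\xF0\x9F\x92\x83"
ch = raw.decode("utf-8")  # a single code point encoded as 4 bytes in UTF-8

# Assumption: 767 bytes is the key limit quoted in this thread.
INDEX_KEY_LIMIT_BYTES = 767


def max_indexed_chars(max_bytes_per_char: int,
                      limit: int = INDEX_KEY_LIMIT_BYTES) -> int:
    """Largest character count whose worst-case byte size fits in the limit."""
    return limit // max_bytes_per_char


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8,
    i.e. it can be stored in a MySQL 'utf8' (3-byte) column."""
    return all(len(c.encode("utf-8")) <= 3 for c in text)


if __name__ == "__main__":
    print(len(raw), hex(ord(ch)))   # 4 0x1f483
    print(fits_mysql_utf8(ch))      # False
    print(max_indexed_chars(1))     # 767 (latin1)
    print(max_indexed_chars(3))     # 255 (utf8)
    print(max_indexed_chars(4))     # 191 (utf8mb4)
```

Integer division gives 191 characters for a 4-byte encoding, which is in line with the roughly 190 characters reported for utf8mb4 in this thread. A check like `fits_mysql_utf8` could be used to detect ids that a utf8 column would reject before the insert is attempted.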
More info on the MySQL encoding issue:
http://mathiasbynens.be/notes/mysql-utf8mb4
http://mzsanford.wordpress.com/2010/12/28/mysql-and-unicode/

Database settings used for test:

CREATE DATABASE nutch DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;

CREATE TABLE `webpage` (
  `id` varchar(255) CHARACTER SET utf8 NOT NULL,
  ...
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

[mysqld]
...
collation_server=utf8_unicode_ci
character_set_server=utf8
…

thank you for your time,

Arni Sumarlidason | Web Developer, Information Technology
MDA | 820 West Diamond Ave | Gaithersburg, MD | USA
[email protected] | http://www.mdaus.com

On Oct 24, 2012, at 11:37 PM, "Mattmann, Chris A (388J)" <[email protected]> wrote:

Hi Arni,

On Oct 24, 2012, at 7:51 PM, Arni Sumarlidason wrote:

Good Day,

Thank you for reading,

I'm working with nutch using the org.hsqldb.jdbc.JDBCDriver connector. I'm coming across urls with unicode characters, which is causing the jdbc connector to throw exceptions when inserting into non-utf formatted columns. With latin1 encoding the id column can have a length of 767 characters. Switching the encoding to utf8mb4 resolves the issue, but at great cost, now the max length is 190 characters, or ~767 bytes per primary key/unique key constraints on the MySQL database.

Is there any reason you can't use utf8 columns with MySQL?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

