Chris,

In MySQL, the utf8 character set stores only 1 to 3 bytes per character (which is 
what limits the indexed id column to 255 characters), so it rejects 4-byte 
characters with exceptions like this one:
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x92\x83' for column 'id' at row 1
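For reference, here is a small Java sketch of why that value is rejected: the bytes F0 9F 92 83 are the UTF-8 encoding of U+1F483, a code point outside the Basic Multilingual Plane that needs 4 bytes, one more than MySQL's utf8 allows. The class and method names (Utf8Check, hasFourByteChar, toMysqlHex) and the example URL are hypothetical, chosen for illustration only; a check like this could flag such URLs before they reach a utf8 column.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: shows why '\xF0\x9F\x92\x83' does not fit a
// MySQL utf8 column, whose characters are at most 3 bytes long.
public class Utf8Check {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if any code point is above U+FFFF, i.e. needs 4 UTF-8 bytes,
    // which the 3-byte utf8 character set cannot store.
    static boolean hasFourByteChar(String s) {
        return s.codePoints().anyMatch(cp -> cp > 0xFFFF);
    }

    // Hex dump in the same \xNN style MySQL uses in its error message.
    static String toMysqlHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dancer = new String(Character.toChars(0x1F483)); // U+1F483
        String url = "http://example.com/" + dancer;             // hypothetical URL
        System.out.println(toMysqlHex(dancer));   // \xF0\x9F\x92\x83
        System.out.println(utf8Length(dancer));   // 4
        System.out.println(hasFourByteChar(url)); // true
        System.out.println(hasFourByteChar("http://example.com/caf\u00e9")); // false
    }
}
```

Checking code points rather than Java chars matters here: a 4-byte character is stored in a Java String as a surrogate pair, so counting chars alone would not reveal it.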

More information on the MySQL encoding issue:
http://mathiasbynens.be/notes/mysql-utf8mb4
http://mzsanford.wordpress.com/2010/12/28/mysql-and-unicode/

Database settings used for the test:
CREATE DATABASE nutch DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;
CREATE TABLE `webpage` (
`id` varchar(255) CHARACTER SET utf8 NOT NULL,
...
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

[mysqld]
...
collation_server=utf8_unicode_ci
character_set_server=utf8
…

Thank you for your time,

Arni Sumarlidason | Web Developer, Information Technology
MDA | 820 West Diamond Ave | Gaithersburg, MD | USA
[email protected] | http://www.mdaus.com

On Oct 24, 2012, at 11:37 PM, "Mattmann, Chris A (388J)" <[email protected]> wrote:

Hi Arni,

On Oct 24, 2012, at 7:51 PM, Arni Sumarlidason wrote:

Good day, and thank you for reading.

I'm working with Nutch using the org.hsqldb.jdbc.JDBCDriver connector. I'm 
coming across URLs with Unicode characters, which cause the JDBC connector to 
throw exceptions when they are inserted into columns that are not UTF-8 encoded. 
With latin1 encoding, the id column can be up to 767 characters long. Switching 
the encoding to utf8mb4 resolves the issue, but at a high cost: the maximum 
length drops to 191 characters, because the MySQL database limits primary 
key/unique key entries to 767 bytes and utf8mb4 reserves up to 4 bytes per 
character.
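The length limits in this thread follow from simple integer division of the 767-byte key limit by each character set's maximum bytes per character. The Java sketch below just reproduces that arithmetic; it assumes the 767-byte limit exactly as stated in the thread, and the class and method names are hypothetical.

```java
// Hypothetical sketch: longest indexable VARCHAR per character set,
// assuming the 767-byte primary/unique key limit described above.
public class KeyLength {

    static final int MAX_KEY_BYTES = 767;

    // Integer division rounds down: a VARCHAR(n) key fits only if
    // n * maxBytesPerChar <= MAX_KEY_BYTES.
    static int maxIndexedChars(int maxBytesPerChar) {
        return MAX_KEY_BYTES / maxBytesPerChar;
    }

    public static void main(String[] args) {
        System.out.println("latin1  (1 byte/char):  " + maxIndexedChars(1)); // 767
        System.out.println("utf8    (3 bytes/char): " + maxIndexedChars(3)); // 255
        System.out.println("utf8mb4 (4 bytes/char): " + maxIndexedChars(4)); // 191
    }
}
```

This is also why the `id` varchar(255) column in the utf8 schema fits (255 * 3 = 765 bytes), while the same column in utf8mb4 would need 1020 bytes.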

Is there any reason you can't use utf8 columns with MySQL?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

