Good day, and thank you for reading.

I'm working with Nutch using the org.hsqldb.jdbc.JDBCDriver connector. I'm coming across URLs with Unicode characters, which cause the JDBC connector to throw exceptions when inserting into columns that are not UTF-8 encoded. With latin1 encoding, the id column can be up to 767 characters long. Switching the encoding to utf8mb4 resolves the issue, but at great cost: the maximum length drops to 191 characters, because primary key/unique key constraints on the MySQL database are limited to 767 bytes and utf8mb4 reserves 4 bytes per character.
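To make the byte budget concrete, here is a minimal Java sketch. It assumes the 767-byte InnoDB key limit described above and the worst-case bytes per character for each charset. It also illustrates the hash-key idea: a SHA-256 hex digest is always 64 ASCII characters, no matter how long the URL is or which Unicode characters it contains. The class `UrlKeySketch` and its methods are hypothetical illustrations, not part of Nutch or Gora.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class UrlKeySketch {
    // Assumption: the 767-byte InnoDB index key limit cited above
    // (applies to the COMPACT/REDUNDANT row formats).
    static final int MAX_KEY_BYTES = 767;

    /** Longest VARCHAR key that fits when every char may take bytesPerChar bytes. */
    static int maxKeyChars(int bytesPerChar) {
        return MAX_KEY_BYTES / bytesPerChar; // integer division rounds down
    }

    /** Hypothetical helper: 64-char lowercase hex SHA-256 of the key's UTF-8 bytes. */
    static String urlKey(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // negative bytes print as unsigned
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("latin1  (1 byte/char):  " + maxKeyChars(1)); // 767
        System.out.println("utf8    (3 bytes/char): " + maxKeyChars(3)); // 255
        System.out.println("utf8mb4 (4 bytes/char): " + maxKeyChars(4)); // 191
        String url = "http://example.com/caf\u00e9"; // contains a non-ASCII char
        System.out.println(urlKey(url).length()); // 64
    }
}
```

With a key like this, a table could hold a short ASCII primary key (for example `id CHAR(64) CHARACTER SET ascii`) next to a `TEXT` column for the full URL, so uniqueness is still enforced on the digest. One trade-off: a digest does not preserve the lexical order of the original keys, so any key-range scans would no longer visit related URLs together.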
That being said, my question is this: what are the repercussions of removing the primary key constraint? Does Nutch/Gora rely on the constraint to prevent duplicates from being inserted? That seems like the obvious strategy. If so, should we redesign the schema to use a hash of the URL as the key and store the URL itself in a larger data type?

Please assist/advise. Thank you for your time,

Arni Sumarlidason | Web Developer, Information Technology
MDA | 820 West Diamond Ave | Gaithersburg, MD | USA
http://www.mdaus.com

