I've found a way to raise the 190-character restriction to 767 or 768 characters, which should be good enough for most URLs. Use the following options with a recent version of MySQL:
innodb_file_format=barracuda
innodb_file_per_table=true
innodb_large_prefix=true
ROW_FORMAT=COMPRESSED

For step-by-step instructions, I've updated http://nlp.solutions.asia/?p=180. I have not had a chance to test extensively, so let me know if you see issues.

Regards,
James

-----Original Message-----
From: sumarlidason [mailto:[email protected]]
Sent: Thursday, October 25, 2012 8:39 AM
To: [email protected]
Subject: RE: nutch/hadoop/solr

I sent an email to Lewis with the following:

When using MySQL 5.5 w/ utf8mb4, the id column, a primary key, is restricted to 190 characters, which is insufficient.

- Does the ID need to be the primary key? Can it just be unique? I don't think unique has a restriction on character length, but I haven't confirmed this yet.
- Could we use a hash of the URL as the id instead of the URL itself?

If no one else is having this issue, I must be doing something wrong.

--
View this message in context: http://lucene.472066.n3.nabble.com/nutch-hadoop-solr-tp4014761p4015730.html
Sent from the Nutch - User mailing list archive at Nabble.com.
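
The character limits discussed in this thread can be checked with a little arithmetic, and the hash-of-URL idea from the quoted message is easy to sketch. The following is a minimal Python illustration, not anything taken from Nutch or Gora: it assumes the documented InnoDB index prefix limits of 767 bytes (default) and 3072 bytes (with innodb_large_prefix), assumes utf8mb4 reserves up to 4 bytes per character, and uses SHA-256 as one arbitrary choice of hash. The helper names max_indexable_chars and url_to_id are hypothetical.

```python
import hashlib

# Assumptions (not from the original thread): utf8mb4 reserves up to
# 4 bytes per character, and InnoDB limits an index key prefix to
# 767 bytes by default or 3072 bytes with innodb_large_prefix enabled.
UTF8MB4_MAX_BYTES_PER_CHAR = 4
DEFAULT_PREFIX_BYTES = 767
LARGE_PREFIX_BYTES = 3072


def max_indexable_chars(prefix_bytes, bytes_per_char=UTF8MB4_MAX_BYTES_PER_CHAR):
    """Return how many whole characters fit in an index prefix of this size."""
    return prefix_bytes // bytes_per_char


def url_to_id(url):
    """Hypothetical helper: derive a fixed-length id from a URL.

    SHA-256 is an illustrative choice; the hex digest is always
    64 ASCII characters regardless of how long the URL is.
    """
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(max_indexable_chars(DEFAULT_PREFIX_BYTES))
    print(max_indexable_chars(LARGE_PREFIX_BYTES))
    print(url_to_id("http://nlp.solutions.asia/?p=180"))
```

By this arithmetic the default limit works out to 191 characters, close to the 190 reported above, and the large-prefix limit to 768. A hashed id stays well under either limit, at the cost of no longer being able to read the URL back from the key, so the URL itself would still need to be stored in a separate column.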

