If it is the ID field in MySQL, let me share my experience. I left it unchanged for a reason: the ID field is a primary key, and MySQL limits key size to a fairly small value (I think it is 1000 bytes). Even on Japanese-language sites, where I do most of my crawling, all of the URLs I am interested in are ASCII, so in my case there was no need to change it.
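To see whether a given set of URLs would fit under that kind of limit, something like the following rough sketch may help. It only measures UTF-8 byte lengths in Python and does not touch MySQL. The 1000-byte figure is the one I quoted from memory above and should be checked against your engine and version; the function names and example URLs are made up for illustration.

```python
# Rough sketch: compare the UTF-8 byte length of a URL with an assumed key limit.
# KEY_LIMIT_BYTES is the figure quoted from memory in this message, not a verified value.
KEY_LIMIT_BYTES = 1000


def utf8_length(url: str) -> int:
    """Return the number of bytes the URL occupies when encoded as UTF-8."""
    return len(url.encode("utf-8"))


def fits_in_key(url: str, limit: int = KEY_LIMIT_BYTES) -> bool:
    """Return True if the UTF-8 encoded URL is no longer than the assumed limit."""
    return utf8_length(url) <= limit


# Hypothetical example URLs.
ascii_url = "http://example.com/index.html"
japanese_url = "http://example.com/日本語"

if __name__ == "__main__":
    for url in (ascii_url, japanese_url):
        print(url, len(url), "characters,", utf8_length(url), "bytes,",
              "fits" if fits_in_key(url) else "too long")
```

An ASCII URL takes one byte per character, while each of the three Japanese characters in the second URL takes three bytes in UTF-8, so the byte count is what matters against the limit, not the character count.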
If you do need to change it (for non-English URLs), use varchar, not char. With char and utf8 or utf8mb4 you allocate 3 or 4 bytes per character respectively, regardless of what the character actually needs, and long URLs will cause errors when they exceed either the limit you set or the 1000-byte absolute limit. With varchar, only the bytes actually needed are allocated (1 per English character, more for characters in other languages). That doesn't completely rule out running into a URL that takes more than 1000 bytes, but it makes it a whole lot less likely, as the majority of URLs use English characters.

I will look at changing it in my example, but it may be a bit later, as I think it needs some testing to make sure I don't cause problems.

If you are crawling websites with UR

-----Original Message-----
From: sumarlidason [mailto:[email protected]]
Sent: Thursday, October 25, 2012 12:48 AM
To: [email protected]
Subject: RE: nutch/hadoop/solr

err, manager pointed out that its the ID field complaining now.. so attempting to change the collation there as well.

--
View this message in context: http://lucene.472066.n3.nabble.com/nutch-hadoop-solr-tp4014761p4015626.html
Sent from the Nutch - User mailing list archive at Nabble.com.

