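I've found a way to get beyond the 190 character restriction, up to 767 or 768 
characters, which should be good enough for most URLs. Use the following options 
with a recent version of MySQL. The first three are InnoDB server settings; 
ROW_FORMAT is a table option that goes on the CREATE TABLE statement. 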

innodb_file_format=barracuda
innodb_file_per_table=true
innodb_large_prefix=true
ROW_FORMAT=COMPRESSED
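
As a rough sketch of where each option goes, assuming MySQL 5.5 or 5.6 with 
InnoDB: the table name `webpage` and its single-column layout are placeholders 
for illustration, not the actual schema. The three server settings can equally 
be placed in the [mysqld] section of my.cnf, as listed above.

```sql
-- Server settings; these are the same three options listed above,
-- set at runtime here instead of in my.cnf.
SET GLOBAL innodb_file_format = 'Barracuda';
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_large_prefix = ON;

-- Placeholder table (assumed for illustration): only the id column
-- and the table options matter for the key-length limit.
CREATE TABLE webpage (
  id VARCHAR(767) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB
  ROW_FORMAT=COMPRESSED
  DEFAULT CHARSET=utf8mb4;
```

With utf8mb4 each character can take up to 4 bytes, so a 767 character key 
needs up to 3068 bytes, which fits under the 3072 byte index prefix limit that 
innodb_large_prefix allows for COMPRESSED (or DYNAMIC) row formats.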

For step-by-step instructions, see http://nlp.solutions.asia/?p=180, which I've updated. 

I have not had a chance to test this extensively, so let me know if you see issues.

Regards

James


-----Original Message-----
From: sumarlidason [mailto:[email protected]] 
Sent: Thursday, October 25, 2012 8:39 AM
To: [email protected]
Subject: RE: nutch/hadoop/solr

I sent an email to Lewis with the following:

When using MySQL 5.5 with utf8mb4, the id column, a primary key, is restricted to 
190 characters, which is insufficient.

- Does the ID need to be the primary key? Can it just be unique? I don't think 
unique has a restriction on character length, but I haven't confirmed this yet.

- Could we use a hash of the url as the id instead of the url itself?

If no one else is having this issue, I must be doing something wrong. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/nutch-hadoop-solr-tp4014761p4015730.html
Sent from the Nutch - User mailing list archive at Nabble.com.
