Okay, it sounds like you may actually need it. I've updated the instructions at 
http://nlp.solutions.asia/?p=180 to use utf8mb4. Could you try that with 
MySQL 5.5 or above and see if it helps? The change is in three places: the db 
server config, the db creation and the table creation. 
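
As a quick check of why utf8mb4 matters here, the following is a minimal 
Python 3 sketch (not part of the tutorial; the helper name needs_utf8mb4 is 
made up for illustration). It decodes the byte sequence from the error and 
tests whether a string contains any character that takes four bytes in UTF-8, 
which MySQL's 3-byte utf8 charset cannot store.

```python
# Byte sequence reported in the error message in this thread.
raw = b"\xF0\x9F\x92\x83"
ch = raw.decode("utf-8")


def needs_utf8mb4(text):
    """Hypothetical helper: True if any character lies outside the Basic
    Multilingual Plane, i.e. needs 4 bytes in UTF-8 and therefore does not
    fit in MySQL's 3-byte utf8 charset."""
    return any(ord(c) > 0xFFFF for c in text)


# Show the code point, the encoded length in bytes, and the check result.
print("U+%04X" % ord(ch), len(raw), needs_utf8mb4(ch))
```

Running something like this over the crawled text before the insert should 
show which pages actually contain such characters.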

-----Original Message-----
From: sumarlidason [mailto:[email protected]] 
Sent: Wednesday, October 24, 2012 9:31 AM
To: [email protected]
Subject: RE: nutch/hadoop/solr

Actually, that is the tutorial I followed.

I'm still getting these errors. This string, \xF0\x9F\x92\x83, is actually 
this character: 💃 (U+1F483).
I assume that's where the issue is. However, I am unable to reproduce the error 
when inserting manually via /usr/bin/mysql.

I read this article,
http://mzsanford.wordpress.com/2010/12/28/mysql-and-unicode/, which suggests that 
utf8_bin might resolve the issue. Other forums suggest that even though the 
default charset is set, the column charset has to be set explicitly as well.

I can't get past the fact that MySQL pre-5.5 only stores 1-3 bytes per UTF-8 
character instead of 1-4 bytes.


j.sullivan wrote
> Sumarlidason
> 
> Hi
> 
> The need to use utf8mb4 for web crawling should be fairly rare. If you 
> are using MySQL 5.5 or later and have a setup like the one at
> http://nlp.solutions.asia/?p=180, you should be fine. 
> 
> James





--
View this message in context: 
http://lucene.472066.n3.nabble.com/nutch-hadoop-solr-tp4014761p4015480.html
Sent from the Nutch - User mailing list archive at Nabble.com.
