Hi,

How do I set up Nutch to crawl correctly using the UTF-8 character set?

The approach described here does not work for me: http://nlp.solutions.asia/?p=180

I am using Nutch 2.1, Solr 4.0 and MySQL 5.5.30. This is the error I get during
the parser job:

Caused by: java.sql.SQLException: Incorrect string value: '\xEF\xBB\xBF Ir...' 
for column 'text' at row 1
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
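
As far as I can tell, the bytes \xEF\xBB\xBF at the start of the rejected value
are the UTF-8 byte order mark (U+FEFF), so the page text itself appears to be
valid UTF-8. Below is a minimal Java sketch to check this. The class BomCheck
and its method names are hypothetical, made up for illustration, and are not
part of Nutch or the MySQL driver:

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // The three bytes quoted in the MySQL error message: EF BB BF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the data begins with the UTF-8 byte order mark. */
    public static boolean startsWithUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    /** Decodes the bytes as UTF-8 and drops a leading U+FEFF if present. */
    public static String decodeWithoutBom(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // Same leading bytes as in the error: BOM, a space, then "Ir".
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, ' ', 'I', 'r'};
        System.out.println(startsWithUtf8Bom(sample));        // prints "true"
        System.out.println("[" + decodeWithoutBom(sample) + "]"); // prints "[ Ir]"
    }
}
```

This only confirms what the bytes are. It does not show where in the
Nutch/MySQL chain the encoding gets lost.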

The problem seems to be that the JDBC connection is not using UTF-8. How
do I change that in Nutch? I have set the following property, but it does not
seem to affect the JDBC connection:

<property>
        <name>parser.character.encoding.default</name>
        <value>utf-8</value>
</property>


Thanks for your help,

Bart
