Hi,

How do I set up Nutch to crawl correctly using the UTF-8 character set?
This does not work: http://nlp.solutions.asia/?p=180

I am using Nutch 2.1, Solr 4.0 and MySQL 5.5.30. This is the error during the parser job:

Caused by: java.sql.SQLException: Incorrect string value: '\xEF\xBB\xBF Ir...' for column 'text' at row 1
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)

The problem seems to be that the JDBC connection is not using UTF-8. How do I change that in Nutch?

I have set the following property, but it does not seem to affect the JDBC connection:

<property>
  <name>parser.character.encoding.default</name>
  <value>utf-8</value>
</property>

Thanks for your help,
Bart
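
P.S. For what it is worth, the bytes \xEF\xBB\xBF in the error message are the UTF-8 encoding of the byte order mark (U+FEFF), so the rejected text apparently starts with a BOM. Below is a minimal Java sketch, not Nutch code, that shows this and how the BOM could be stripped before storing. The class name BomCheck and the helper stripBom are hypothetical names I made up for illustration, and the sample bytes only mimic the start of the value in the error.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // Hypothetical helper (not part of Nutch): removes a leading U+FEFF
    // from text that has already been decoded.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // EF BB BF followed by "Ir", similar to the start of the rejected value.
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'I', 'r'};

        // Java's UTF-8 decoder keeps the BOM as the character U+FEFF.
        String decoded = new String(raw, StandardCharsets.UTF_8);

        System.out.println("starts with BOM: " + (decoded.charAt(0) == '\uFEFF'));
        System.out.println("after stripping: " + stripBom(decoded));
    }
}
```

This only confirms what the bytes are. It does not explain why MySQL rejects them, which I assume still comes down to the character set of the connection or of the 'text' column.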