On 25 April 2012 14:59, Robert Newson <[email protected]> wrote:
> It sounds like SQLToNoSQLImporter is not converting your data
> correctly. As it's Java, I would take a wild guess and assume the
> characters to bytes translation is being done with the platform
> default rather than "UTF-8". Since UTF-8 is the default encoding for
> JSON strings, that would be a pretty big oversight.
>
> B.
>
> On 25 April 2012 11:59, Paulo Carvalho <[email protected]> wrote:
>> Hello,
>>
>> I am trying SQLToNoSQLImporter to import data to a couchDB database
>> from a Postgresql database.
>>
>> I configured correctly the import.properties and db-data-config files.
>>
>> When I execute run.bat command (I am using windows), I get the
>> following result:
>>
>> 07:50:14,568 INFO DataImporter:134 - Data Configuration loaded successfully
>> 07:50:18,477 ERROR DataImporter:178 - ***** Data import failed. **********
>> Reason is :
>> org.apache.http.HttpException: HTTP/1.1 400 Bad Request
>>     at net.sathis.export.sql.couch.CouchWriter.post(CouchWriter.java:68)
>>     at net.sathis.export.sql.couch.CouchWriter.writeToNoSQL(CouchWriter.java:52)
>>     at net.sathis.export.sql.DocBuilder.execute(DocBuilder.java:142)
>>     at net.sathis.export.sql.DataImporter.doFullImport(DataImporter.java:174)
>>     at net.sathis.export.sql.DataImporter.doDataImport(DataImporter.java:93)
>>     at net.sathis.export.sql.SQLToNoSQLImporter.main(SQLToNoSQLImporter.java:19)
>>
>> As you can see, the configuration file is loaded correctly. In the
>> couchDB database log file, I get the following error:
>>
>> [debug] [<0.147.0>] Invalid JSON: {{error,
>>     {126,
>>      "lexical error: invalid bytes in UTF8 string.\n"}},
>>     <<"{\"docs\":[{\"_id\":\"0\",\"label\":\"Pas de taches\"},{\"_id\":\"1\",\"description\":\"Le pourcentage de recouvrement est < 2 %\",\"label\":\"Très peu nombreuses\"},{\"_id\":\"2\",\"description\":\"Le p.......
>>
>> I think the problem happens because the text contained in the table
>> has special characters ("è", etc.).
>>
>> The postgresql database is coded in UTF-8.
>>
>>
>> Trying to solve the problem, I have written a little JSON file and i tried
>> to insert it on my database. My JSON file content was the following:
>> {"docs":[{"_id":"0","label ":"Pas de taches"}]}
>>
>> The result of inserting it on my database was:
>> {"ok":true,"id":"doc_id","rev":"1-ffaec7bc2aa548ca8e5a9c697ea3eb64"}
>>
>> Next, I changed just a little my JSON file: I've put a special character
>> (â):
>> {"docs":[{"_id":"0","label ":"Pas de tâches"}]}
>>
>> The result of inserting this JSON file on the database was:
>> {"error":"bad_request","reason":"invalid_json"}
>>
>>
>>
>> Anyone can help me with this issue?
>>
>> Thank you
>>
>> Best regards.
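To illustrate the diagnosis quoted above, here is a minimal Python sketch of what probably happens to the "â". It assumes the file (or the importer's output) was written in a Windows code page; cp1252 is my guess at the platform default, not something confirmed in the thread, and the helper name is_valid_utf8 is made up for the example.

```python
import json

# "\u00e2" is the character "â"; the escape keeps this script itself pure ASCII.
doc = {"docs": [{"_id": "0", "label": "Pas de t\u00e2ches"}]}

# ensure_ascii=False keeps the literal character in the JSON text.
text = json.dumps(doc, ensure_ascii=False)

# Assumption: the platform default is a Windows code page such as cp1252.
legacy_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(legacy_bytes))  # False: 0xE2 is not followed by continuation bytes
print(is_valid_utf8(utf8_bytes))    # True
```

In cp1252 the "â" becomes the single byte 0xE2, which a UTF-8 parser reads as the start of a three-byte sequence, so the next plain ASCII letter makes the string invalid. That would be consistent with the "invalid bytes in UTF8 string" message in the log.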
A quick suggestion: download an editor that explicitly supports encodings, such as TextPad or Komodo, create your JSON file in that, and save it as UTF-8. You'll find that works just fine.

Sample files, one created on a Mac and transferred, the other created on Windows, are in
https://www.dropbox.com/sh/jeifcxpbtpo78ak/--8BGo8bb3/tmp/utf8wtf.zip

C:\tmp>curl -HContent-Type:application/json http://localhost:5984/testy/utf8mac -XPUT [email protected]
{"ok":true,"id":"utf8mac","rev":"1-b46df9f1f811323a133af7faf36d1a89"}

C:\tmp>curl -HContent-Type:application/json http://localhost:5984/testy/utf8windows -XPUT [email protected]
{"ok":true,"id":"utf8windows","rev":"1-b46df9f1f811323a133af7faf36d1a89"}

Without having tested it, something like

recode latin1..UTF-8 *.json

would probably do the trick. I assume the version at http://unxutils.sourceforge.net/ is suitable.

A+
Dave
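P.S. If recode is not to hand, the same conversion can be sketched in a few lines of Python. This is only a rough equivalent of the recode idea, assuming the files really are Latin-1; the function name recode_to_utf8 and the sample file are made up for the illustration.

```python
import tempfile
from pathlib import Path


def recode_to_utf8(path: Path, source_encoding: str = "latin-1") -> bool:
    """Rewrite a file as UTF-8 in place; return True if it was converted."""
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (or plain ASCII), leave it alone
    except UnicodeDecodeError:
        pass
    # Assumption: anything that is not valid UTF-8 is in source_encoding.
    path.write_bytes(raw.decode(source_encoding).encode("utf-8"))
    return True


# Small self-contained demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_bytes('{"label": "Pas de t\u00e2ches"}'.encode("latin-1"))
    converted = recode_to_utf8(sample)        # converts the Latin-1 file
    result = sample.read_bytes().decode("utf-8")
    converted_again = recode_to_utf8(sample)  # second run is a no-op
```

To convert a whole directory, loop over Path(".").glob("*.json") and call the function on each file. The validity check is there so that running it twice does not double-encode files that are already UTF-8; it cannot tell Latin-1 apart from other single-byte code pages, so the source encoding remains a guess.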
