I have a Hadoop cluster and I need to query the data stored on HDFS using the Spark SQL Thrift Server.
The Spark SQL Thrift Server is up and running, and it is configured to read from a Hive table. The Hive table is an external table corresponding to a set of files stored on HDFS; these files contain JSON data. I connect to the Spark SQL Thrift Server using beeline.

When I execute a simple query like *select * from mytable limit 3*, everything works fine. But when I execute other queries, like *select count(*) from mytable*, the following exception is thrown:

    org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unrecognized character escape ' ' (code 32) at [Source: java.io.StringReader@34ef429a; line: 1, column: 351]

What I understand from the exception is that some of the files contain corrupted JSON. (I assume the limit query succeeds because it only has to read the first few records, while count(*) has to scan every file.)

Question 1: Am I understanding this correctly?

Question 2: How can I find the file(s) causing this problem, given that I have about 3 thousand files and each file contains about 700 lines of JSON data?

Question 3: If I am sure that the JSON in the files on HDFS is valid, what should I do?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/org-apache-hadoop-hive-serde2-SerDeException-org-codehaus-jackson-JsonParseException-tp22103.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
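For question 2, one approach I am considering is to scan each file line by line and report any line that fails to parse as JSON. A minimal sketch (the function name is my own; it just wraps the standard json module):

```python
import json

def find_bad_json_lines(lines):
    """Return (line_number, error_message) for each line that is not valid JSON."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:  # ignore blank lines
            continue
        try:
            json.loads(line)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            bad.append((lineno, str(err)))
    return bad
```

I would then run this over every file, for example by piping `hdfs dfs -cat <file>` into a small wrapper script that feeds stdin to the function and prints the file name whenever the result is non-empty (the paths and wrapper script are placeholders for whatever fits the cluster layout).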