Hi,
I'm loading a bunch of JSON files, and there seem to be problems with
specific files (either schema changes or incomplete files).
I'd like to catch the inconsistent files, but I'm not sure how to do it.

This is the exception I get:
14/11/20 00:13:49 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3027 in stage 0.0 failed 4 times, most recent failure: Lost task 3027.3 in stage 0.0 (TID 3100, HDdata2): com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
 at [Source: java.io.StringReader@39a8eab6; line: 1, column: 1805]
        com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1524)

And this is the code causing it:
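// sc is an existing SparkContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._

// Load the JSON files; this is the call that fails with the exception above
val jsonFiles = sqlContext.jsonFile("/requests.loading")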


How can I do this?

Thanks,
Daniel
