On 18 Oct 2016, at 08:43, Chetan Khatri <ckhatriman...@gmail.com> wrote:

Hello Community members,

I am getting an error while reading a large JSON file in Spark.

The underlying read code can't handle more than 2^31 bytes in a single line:

    if (bytesConsumed > Integer.MAX_VALUE) {
      throw new IOException("Too many bytes before newline: " + bytesConsumed);
    }

That's because it's trying to split the work up by line, and of course, there aren't
any newlines to split on when the whole file is one giant JSON record.

You need to move over to reading the JSON by other means, I'm afraid. At a
guess, something involving SparkContext.binaryFiles() streaming the data
straight into a JSON parser.


val landingVisitor = 
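
A minimal sketch of that binaryFiles() approach, assuming the data lives at a
hypothetical path and that parseJson stands in for whatever streaming JSON
library you use (Jackson, json4s, etc.) — not a definitive implementation:

import org.apache.spark.SparkContext
import org.apache.spark.input.PortableDataStream

// Read each file whole as a (path, stream) pair instead of splitting by line.
def readBigJson(sc: SparkContext, path: String) =
  sc.binaryFiles(path).map { case (file, stream: PortableDataStream) =>
    // Open the stream and feed it directly to a streaming JSON parser,
    // so no single "line" ever has to be materialised in memory.
    val in = stream.open()
    try parseJson(in)        // parseJson: your JSON library of choice
    finally in.close()
  }

Because binaryFiles() hands each task one whole file, parallelism is per file
rather than per line, so this works best when the input is many files rather
than one.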

Unrelated, but use s3a if you can. It's better, you know.


16/10/18 07:30:30 ERROR Executor: Exception in task 8.0 in stage 0.0 (TID 8)
java.io.IOException: Too many bytes before newline: 2147483648
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:249)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:135)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)

What would be the resolution for this?

Thanks in advance!

Yours Aye,
Chetan Khatri.
