Hi all,

I am running a test with Spark 2.0. My test file uses \t as the field delimiter:

    kevin 30 2016
    shen 30 2016
    kai 33 2016
    wei 30 2016

I read it with:

    var datas: RDD[(LongWritable, String)] =
      sc.newAPIHadoopFile(inputPath + filename, classOf[TextInputFormat],
        classOf[LongWritable], classOf[Text], hadoopConf)
        .map { case (key, value) => (key, new String(value.getBytes, decode)) }

After saving the RDD to HDFS, I got this:

    (0,kevin 30 2016)
    (14,shen 30 20166)
    (27,kai 33 201666)
    (39,wei 30 201666)

Every line after the first ends with extra "6" characters. It looks as if the reader does not clear its buffer after reading a line, or something like that. Is that what is happening?

2017-01-17 lk_spark
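
The extra trailing characters are consistent with a reused record buffer whose backing array is longer than the current record: Hadoop's Text documents that the array returned by getBytes is only valid up to getLength. The sketch below reproduces the effect without Hadoop or Spark. ReusedBuffer, readAll, naive and bounded are hypothetical names made up for this illustration, and the growth policy of the stand-in buffer is an assumption, not the actual Text implementation.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for a reusable record buffer such as Hadoop's Text:
// the backing array only grows, and `length` marks how many bytes are valid.
final class ReusedBuffer {
  private var bytes: Array[Byte] = Array.emptyByteArray
  private var length: Int = 0

  def set(s: String): Unit = {
    val src = s.getBytes(UTF_8)
    // Assumed policy: reallocate only when the new record does not fit.
    if (bytes.length < src.length) bytes = new Array[Byte](src.length)
    System.arraycopy(src, 0, bytes, 0, src.length)
    length = src.length
  }

  def getBytes: Array[Byte] = bytes // whole backing array, may hold stale bytes
  def getLength: Int = length       // number of valid bytes for this record
}

object BufferDemo {
  val lines: List[String] =
    List("kevin\t30\t2016", "shen\t30\t2016", "kai\t33\t2016", "wei\t30\t2016")

  // Feed every line through ONE shared buffer, as a record reader would.
  def readAll(in: List[String], decode: ReusedBuffer => String): List[String] = {
    val buf = new ReusedBuffer
    in.map { line => buf.set(line); decode(buf) }
  }

  // Decodes the whole backing array: picks up leftovers from longer records.
  def naive(in: List[String]): List[String] =
    readAll(in, b => new String(b.getBytes, UTF_8))

  // Decodes only the valid prefix of the backing array.
  def bounded(in: List[String]): List[String] =
    readAll(in, b => new String(b.getBytes, 0, b.getLength, UTF_8))

  def main(args: Array[String]): Unit = {
    naive(lines).foreach(println)   // shorter lines keep trailing '6' bytes
    bounded(lines).foreach(println) // lines come back unchanged
  }
}
```

If the same cause applies to the snippet in the post, the corresponding change would be to bound the decode by the valid length, i.e. new String(value.getBytes, 0, value.getLength, decode).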