how to use newAPIHadoopFile

lk_spark Mon, 16 Jan 2017 17:43:35 -0800

Hi all,

I ran a test with Spark 2.0.

My test file uses \t as the field delimiter:
kevin 30 2016
shen 30 2016
kai 33 2016
wei 30 2016
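After running:
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.rdd.RDD

val datas: RDD[(LongWritable, String)] =
  sc.newAPIHadoopFile(inputPath + filename, classOf[TextInputFormat],
      classOf[LongWritable], classOf[Text], hadoopConf).map { case (key, value) =>
    (key, new String(value.getBytes, decode))
  }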
Then I saved the RDD to HDFS and got this:
(0,kevin 30 2016)
(14,shen 30 20166)
(27,kai 33 201666)
(39,wei 30 201666)
It looks as if the reader does not clear its buffer after reading each line,
or something like that. Is that what is happening?
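The buffer hypothesis can be checked without Spark. The sketch below assumes, as the output suggests, that the record reader reuses one buffer whose backing array only grows and is never cleared, and that a separate length field marks the valid bytes. `ReusedTextBuffer` is a hypothetical stand-in for Hadoop's `Text`, not the real class. Decoding the whole backing array reproduces the trailing "6" / "66"; decoding only the first `getLength` bytes does not.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for a reused record buffer (loosely modelled on
// Hadoop's Text): the backing array only grows, and `len` records how many
// leading bytes belong to the current record.
final class ReusedTextBuffer {
  private var bytes: Array[Byte] = Array.emptyByteArray
  private var len: Int = 0

  def set(line: String): Unit = {
    val data = line.getBytes(StandardCharsets.UTF_8)
    // Grow only when needed; old bytes past data.length are left in place.
    if (data.length > bytes.length) bytes = new Array[Byte](data.length)
    System.arraycopy(data, 0, bytes, 0, data.length)
    len = data.length
  }

  def getBytes: Array[Byte] = bytes // whole backing array, may hold stale bytes
  def getLength: Int = len          // number of valid bytes for this record
}

object StaleBufferDemo {
  val lines: Seq[String] =
    Seq("kevin\t30\t2016", "shen\t30\t2016", "kai\t33\t2016", "wei\t30\t2016")

  // Mirrors new String(value.getBytes, decode): decodes the entire array.
  def decodeWhole(buf: ReusedTextBuffer): String =
    new String(buf.getBytes, StandardCharsets.UTF_8)

  // Decodes only the valid prefix of the array.
  def decodeValid(buf: ReusedTextBuffer): String =
    new String(buf.getBytes, 0, buf.getLength, StandardCharsets.UTF_8)

  // Feeds every line through ONE shared buffer, decoding each record
  // immediately, the way a record reader hands out a reused object.
  def read(decode: ReusedTextBuffer => String): Seq[String] = {
    val buf = new ReusedTextBuffer
    lines.map { line => buf.set(line); decode(buf) }
  }

  def main(args: Array[String]): Unit = {
    println("whole array:")
    read(decodeWhole).foreach(println) // shows the stale trailing digits
    println("valid prefix:")
    read(decodeValid).foreach(println) // matches the input file
  }
}
```

If this model is right, the corresponding change in the Spark job would be `new String(value.getBytes, 0, value.getLength, decode)`. Hadoop's `Text` documentation states that only the bytes up to `getLength()` returned by `getBytes()` are valid. `value.toString` also respects the length, but it always decodes as UTF-8.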


2017-01-17


lk_spark
