Exactly: the fields between the "!!" markers form a custom (key, value)
data structure.

So newAPIHadoopFile is probably the best approach for now. For this
specific format, changing the record delimiter from the default "\n" to
"!!\n" is the cheapest option, but that is only possible in Hadoop 2.x.
In Hadoop 1.x, it has to be done by implementing a custom InputFormat,
even though most of the code is the same as TextInputFormat apart from
the delimiter.
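
For anyone finding this thread later, here is a rough sketch of the
delimiter approach from PySpark. It assumes Hadoop 2.x, where
TextInputFormat reads the "textinputformat.record.delimiter" setting, and
an already-created SparkContext passed in as `sc`. The helper names
(split_records, load_records) and the "k1=v1" sample layout are made up
for illustration; the real field layout is whatever the custom format
defines. split_records only imitates the splitting on an in-memory string
so the effect can be checked without a cluster.

```python
def split_records(text, delimiter="!!\n"):
    """Imitate delimiter-based record splitting on an in-memory string."""
    records = text.split(delimiter)
    # A trailing delimiter leaves one empty piece at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    return records


def load_records(sc, path, delimiter="!!\n"):
    """Read `path` through TextInputFormat with a custom record delimiter.

    `sc` is an existing pyspark SparkContext (assumption: Hadoop 2.x).
    """
    rdd = sc.newAPIHadoopFile(
        path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
        conf={"textinputformat.record.delimiter": delimiter},
    )
    # Each element is a (byte offset, record text) pair; keep only the text.
    return rdd.map(lambda kv: kv[1])


# Hypothetical sample: two records, each terminated by "!!\n".
sample = "k1=v1\nk2=v2\n!!\nk3=v3\n!!\n"
print(split_records(sample))
```

Each resulting record still has to be parsed into its (key, value) fields
afterwards; that step depends on the actual format and is left out here.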

This is my first time posting to this mailing list, and I find you all
really nice! Thanks for discussing this with me!



-----
Senior in Tsinghua Univ.
github: http://www.github.com/uronce-cc