Exactly, the fields between the "!!" markers form a custom (key, value) data structure.
So newAPIHadoopFile is probably the best approach for now. For this specific format, changing the record delimiter from the default "\n" to "!!\n" is the cheapest solution. That is only possible in Hadoop 2.x. In Hadoop 1.x, you would have to implement a custom InputFormat, even though most of its code would be the same as TextInputFormat apart from the delimiter.

This is my first time posting to this mailing list, and I find you guys are really nice! Thanks for discussing this with me!

-----
Senior in Tsinghua Univ.
github: http://www.github.com/uronce-cc
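For anyone finding this thread later, here is a rough sketch of the delimiter approach described above, not a tested solution. It assumes PySpark on Hadoop 2.x and the Hadoop property textinputformat.record.delimiter, which the new-API TextInputFormat reads. The helper names (split_records, parse_record, read_records), the sample data, and the tab-separated key/value layout inside each record are all hypothetical, since the real record layout was not posted. split_records only imitates the splitting in plain Python so the idea can be checked without a cluster.

```python
DELIMITER = "!!\n"  # record separator discussed in this thread


def split_records(text, delimiter=DELIMITER):
    """Plain-Python stand-in for a reader with a custom record delimiter.

    Cuts the text on the delimiter and drops empty chunks, such as the one
    left after a trailing delimiter.
    """
    return [chunk for chunk in text.split(delimiter) if chunk]


def parse_record(record):
    """Hypothetical parser: assumes one tab-separated key/value pair per line."""
    pairs = {}
    for line in record.splitlines():
        if line:
            key, value = line.split("\t", 1)
            pairs[key] = value
    return pairs


def read_records(sc, path):
    """Sketch of the Spark side; needs a live SparkContext `sc`.

    Not exercised below. The delimiter is passed through the Hadoop
    configuration instead of writing a custom InputFormat.
    """
    conf = {"textinputformat.record.delimiter": DELIMITER}
    rdd = sc.newAPIHadoopFile(
        path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
        conf=conf,
    )
    # Each element is (byte offset, record text); keep only the parsed text.
    return rdd.map(lambda kv: parse_record(kv[1]))


if __name__ == "__main__":
    # Made-up sample data: two records, each terminated by "!!\n".
    sample = "name\talice\nage\t30\n!!\nname\tbob\n!!\n"
    for record in split_records(sample):
        print(parse_record(record))
```

With a running SparkContext, read_records(sc, path) should give an RDD with one dict per "!!"-terminated record, provided the records really follow the assumed tab-separated layout.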