Say I have a text file on HDFS in "OPENFORWRITE, HEALTHY" status. Some process 
is still appending to it. 


It has 4 lines in it.


hadoop fs -cat /file | wc -l 
4


However, when I run wordcount on this file, only the first line is visible to 
MapReduce. The same happens in Hive: "select count(*) from filetable" returns 1.


If I do "hadoop fs -cp /file /file2", then the copy works as expected (file2 is 
closed, file is still open).


Wordcount then sees 5 lines in the input directory (1 from the open file, 4 from 
the copied file), and Hive returns 5.


Is this behavior related to how TextInputFormat handles a file that is still open for write?
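
To make my guess concrete, here is a toy model in Python (no Hadoop involved). It rests on two assumptions I have not verified against the CDH 4.4.0 source: that the job sizes its single split from a file length that is still reported as 0 while the file is open, and that the line reader keeps emitting whole lines as long as its current position is <= the end of the split. The function name read_split and the sample data are made up for illustration.

```python
# Toy model only: this does not call Hadoop. All names are hypothetical.

def read_split(data: bytes, start: int, length: int) -> list:
    """Return the lines a line-oriented reader would emit for one split.

    Assumptions (not verified against any Hadoop release):
      * the split begins at offset 0, so no leading partial line is skipped;
      * the reader reads another whole line while pos <= end of split;
      * the underlying stream can see every byte in `data`.
    """
    end = start + length
    pos = start
    lines = []
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline + 1
        lines.append(data[pos:stop])
        pos = stop
    return lines


data = b"line1\nline2\nline3\nline4\n"

# Closed copy: the reported length matches the real length.
closed = read_split(data, 0, len(data))

# Open file: assume the reported length is still 0.
open_for_write = read_split(data, 0, 0)

print(len(closed), len(open_for_write))
```

Under those assumptions the model gives 4 lines for the closed copy and 1 line for the open file, which matches the counts I see, but that does not prove this is what actually happens on the cluster.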


I am using CDH 4.4.0


Thanks.


Xiao Li

