Using the distributed cache, I put a common file in HDFS. It contains a list of frequent words to remove. In my code, I load the words from this file into a hashtable and remove them from the other documents wherever they occur.
The problem is that the words are removed correctly for smaller files, but when the input file size increases, they are no longer removed. What could be causing this?
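
For reference, the pattern described above looks roughly like the sketch below. This is not the actual job code: it is plain Java with no Hadoop dependencies, and the class name `StopWordFilter` and its methods `load` and `filter` are hypothetical names chosen for illustration. In a real job, `load` would presumably be called once per task from `Mapper.setup()` with a reader over the cached file, and `filter` from `map()`. The trimming and lower-casing are assumptions about how the words should be matched.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper illustrating the stop-word pattern; not Hadoop API.
class StopWordFilter {
    // Words to remove, stored trimmed and lower-cased (assumed matching rule).
    private final Set<String> stopWords = new HashSet<>();

    // Reads one word per line from the source and adds it to the set.
    // In a Hadoop job this would run once per task, e.g. from Mapper.setup().
    void load(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty()) {
                    stopWords.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the text with every whitespace-separated token that matches
    // a stop word removed; remaining tokens are joined by single spaces.
    String filter(String text) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // splitting an empty string yields one empty token
            }
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept.toString();
    }
}
```

Because the set is an instance field filled in `load`, each task that processes a split of a large input needs to have called `load` on its own instance before `filter` is used.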
