Using the distributed cache, I put a common file in HDFS. It contains frequent
words to remove. In the code, I load the words from this file into a hashtable
and remove them from the other documents wherever they occur.
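The filtering step described above can be sketched in plain Java. This sketch leaves out the Hadoop classes so it runs on its own. The class and method names (`StopWordFilter`, `parseStopWords`, `removeStopWords`) are hypothetical. It also assumes the shared file holds one word per line and that matching should ignore case. A `HashSet` stands in for the hashtable. In a MapReduce job the set would typically be built once per task, for example in the Mapper's `setup()` method from the task-local copy of the cached file, and then used in every `map()` call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not the poster's actual code.
public class StopWordFilter {

    // Builds the lookup set from the lines of the word list.
    // Assumption: one word per line; blank lines are skipped and words are lower-cased.
    public static Set<String> parseStopWords(List<String> lines) {
        Set<String> stopWords = new HashSet<>();
        for (String line : lines) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                stopWords.add(word);
            }
        }
        return stopWords;
    }

    // Reads the word list from a local file, e.g. the locally cached copy on a task node.
    public static Set<String> loadStopWords(Path file) throws IOException {
        return parseStopWords(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // Splits a line of a document on whitespace and keeps only tokens not in the set.
    public static List<String> removeStopWords(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // a leading space produces one empty token
            }
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = parseStopWords(List.of("the", "a", "of"));
        System.out.println(removeStopWords("The size of a file", stopWords)); // prints [size, file]
    }
}
```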

The problem is that these words are removed from smaller files, but as the file
size increases, they are no longer removed.

Does anyone know what could be causing this?
