Hi,
We can use third-party classes from NLP, text mining, and other libraries
in Java MapReduce, or we can use Python with Hadoop streaming to write
more complex parallel code.
This link has code for computing Pearson correlation:
https://github.com/malli3131/HadoopTutorial/tree/mast
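To give a rough idea of the Python plus Hadoop streaming route, here is a
minimal sketch of my own (it is not the code from the link above). It assumes
each input line holds one "x y" pair of numbers separated by whitespace; that
input format, the constant key "r", and the "map"/"reduce" command-line
argument are illustrative choices, not anything Hadoop requires. The mapper
emits per-record partial sums and the reducer combines them into a Pearson
correlation coefficient.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop streaming job that computes a Pearson correlation.

Assumed (hypothetical) input format: one "x y" numeric pair per line.
"""
import math
import sys


def mapper(lines):
    """Yield one tab-separated record of partial sums per valid input line."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed records
        x, y = float(parts[0]), float(parts[1])
        # A constant key "r" sends every record to the same reducer.
        # Fields after the key: count, x, y, x*x, y*y, x*y
        yield "r\t1\t{}\t{}\t{}\t{}\t{}".format(x, y, x * x, y * y, x * y)


def reducer(lines):
    """Add up the partial sums and return the Pearson correlation."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        c, x, y, xx, yy, xy = (float(v) for v in fields[1:7])
        n += c
        sx += x
        sy += y
        sxx += xx
        syy += yy
        sxy += xy
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    # The correlation is undefined when either variable has zero variance.
    return numerator / denominator if denominator else float("nan")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hadoop streaming feeds records on stdin and reads results from stdout.
    if sys.argv[1] == "map":
        for record in mapper(sys.stdin):
            print(record)
    elif sys.argv[1] == "reduce":
        print(reducer(sys.stdin))
```

A script like this would typically be handed to the Hadoop streaming jar
through its -mapper and -reducer options (the jar location depends on the
installation). Because streaming only needs an executable that reads stdin
and writes stdout, the same mechanism can run other external programs too.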
So far, I only know that Hadoop can do counting. I am wondering if there's
any way to make calls to an external program for more complex processing
than counting in Hadoop. Is there any example? Thanks
tony