Hi,
We can use third-party classes from NLP and text-mining libraries (among others) in Java MapReduce, or we can use Python with Hadoop Streaming to write more complex parallel code.
This link has code for computing Pearson correlation:
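Since the link itself is not included in this message, here is a minimal single-machine sketch of the same idea (my own illustration, not the linked code), written in Hadoop Streaming style: each mapper emits the sufficient statistics for its chunk of "x<TAB>y" pairs, and the reducer combines them into Pearson's r.

```python
# Sketch of Pearson correlation in Hadoop Streaming style. Assumed input:
# one "x<TAB>y" pair per line. Mappers emit partial sums; the reducer
# combines them, so the data never has to fit on one machine.
import math

def mapper(lines):
    # Emit one record of partial sums (n, sx, sy, sxx, syy, sxy) per chunk.
    n = sx = sy = sxx = syy = sxy = 0.0
    for line in lines:
        x, y = map(float, line.strip().split("\t"))
        n += 1
        sx += x; sy += y
        sxx += x * x; syy += y * y; sxy += x * y
    yield "stats\t%r\t%r\t%r\t%r\t%r\t%r" % (n, sx, sy, sxx, syy, sxy)

def reducer(lines):
    # Sum the partial statistics from all mappers, then compute Pearson's r.
    n = sx = sy = sxx = syy = sxy = 0.0
    for line in lines:
        dn, dsx, dsy, dsxx, dsyy, dsxy = map(float, line.split("\t")[1:])
        n += dn; sx += dsx; sy += dsy
        sxx += dsxx; syy += dsyy; sxy += dsxy
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return num / den
```

In a real job, mapper and reducer would be wrapped as stdin/stdout scripts and submitted with the hadoop-streaming jar; here they are plain functions so the logic can be tested locally.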
Hi,
Check the usage of the individual DataNodes:
hadoop dfsadmin -report
Also, override the config parameter mapred.local.dir so that intermediate
data is stored somewhere other than the /tmp directory. And don't use a
single reducer: increase the number of reducers and use TotalOrderPartitioner.
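One way to set this (the paths and reducer count below are illustrative placeholders, not values from this thread) is to configure mapred.local.dir in mapred-site.xml on each node, and to raise the reducer count per job with -D mapred.reduce.tasks=N on the job command line:

```xml
<!-- mapred-site.xml: keep intermediate map output off /tmp.
     The paths below are example mount points, not real cluster values. -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local</value>
</property>
```

Listing one directory per physical disk also lets the TaskTracker spread intermediate data across spindles.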
Thanks
Hi,
In addition to Sandy's reply, there is one more scheduler called HOD (Hadoop on
Demand). Please go through the following links for more details on the
schedulers.
HOD - http://hadoop.apache.org/docs/r1.1.2/hod_scheduler.html
Fair - http://hadoop.apache.org/docs/r1.1.2/fair_scheduler.html
Capacity -