What version of Hadoop are you using?

Have you considered hooking up a profiler to the DataNode on GCE to see where the time is being spent? That might help shed some light on the situation.
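
If attaching a full profiler is inconvenient, a rough first pass is to note the busiest thread IDs from `top -H -p <pid>` and look those threads up in a `jstack <pid>` dump, since HotSpot prints each thread's native ID in the thread header. Below is a minimal Python sketch of that lookup. The function name `stacks_for_tids` is made up for illustration, and it assumes the older HotSpot header format where the native ID appears as hex (`nid=0x...`); newer JDKs may print it differently.

```python
import re

# Thread header line as printed by older HotSpot jstack versions (assumed format), e.g.
#   "some thread name" #42 daemon prio=5 os_prio=0 tid=0x... nid=0x1a2b runnable [...]
NID_RE = re.compile(r'^"(?P<name>.*)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)\b')


def stacks_for_tids(jstack_text, tids):
    """Map each requested native thread ID to (thread name, stack lines).

    jstack_text: the full text of a jstack dump.
    tids: decimal thread IDs, as shown in the PID column of `top -H`.
    """
    # jstack prints the native ID in hex, top prints it in decimal.
    wanted = {format(t, "x"): t for t in tids}
    result = {}
    current = None  # tid whose stack we are currently collecting, if any
    for line in jstack_text.splitlines():
        m = NID_RE.match(line)
        if m:
            current = wanted.get(m.group("nid").lower())
            if current is not None:
                result[current] = (m.group("name"), [])
        elif not line.strip():
            current = None  # a blank line ends a thread's stack
        elif current is not None:
            result[current][1].append(line.strip())
    return result
```

Taking a few dumps several seconds apart and checking whether the same frames keep showing up for the hot threads should give a hint about where the CPU is going. It is only a sampling trick, not a substitute for a real profiler.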

Ara Ebrahimi wrote:
Hi,

We’re seeing some weird behavior from the HDFS daemon in a Google Cloud 
environment when we use an Accumulo Scanner to sequentially scan a table. top 
reports 200-300% CPU usage for the HDFS daemon, and Accumulo is around 500%. 
iostat shows low %util, low avgrq-sz, and low rMB/s, and there’s lots of free 
memory. Something seems to cause the HDFS daemon to consume a lot of CPU 
without sending enough read requests to the disk (an SSD, actually, so the disk 
is very fast and vastly under-utilized). The process that sends scan requests to 
Accumulo is at 500% CPU (using 3 query batch threads and aggressive 
scan-batch-size/read-ahead-threshold values). So it seems like HDFS is somehow 
the bottleneck. On another cluster we rarely see the HDFS daemon go over 10% CPU 
usage. Any idea what the issue could be?

Thanks,
Ara.


