Hi all,

We recently set up a new cluster and, in the process, upgraded from Hadoop 3.3.1 to 3.3.4. However, we are facing an issue where multi-threaded reads and writes of independent files appear to get serialized under the covers of the HDFS implementation, despite each node having 8+2 disks.

While debugging this issue, we also tested writing local files through the HDFS API: that works perfectly fine, top shows very good core utilization, and it is much faster than writing to HDFS. In contrast, multi-threaded reads of multiple files from HDFS show only about 100% (single-core) CPU utilization, and writes stay even below that; a simplified sketch of the read test follows below. Do you have any pointers to configuration knobs (maybe related to [1]) that could fix this, or is this a known bug? I cannot imagine this is intended behavior, because it would mean that all I/O requests from the threads of a Spark executor would likewise get serialized.
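For reference, the read side of our test boils down to roughly the following sketch (paths, thread count, and buffer size are placeholders for what we actually use, and the real harness also times each thread):

    // Simplified sketch of the multi-threaded read test.
    // Each thread reads one independent HDFS file end to end.
    import java.io.IOException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ParallelReadTest {
      public static void main(String[] args) throws Exception {
        // assumes fs.defaultFS in the loaded config points at the cluster
        final Configuration conf = new Configuration();
        final int threads = 8; // one file per thread
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
          final Path path = new Path("/bench/file-" + i); // placeholder path
          pool.submit(() -> {
            byte[] buf = new byte[1 << 20]; // 1 MiB read buffer
            // newInstance() bypasses the process-wide FileSystem cache,
            // so every thread gets its own client.
            try (FileSystem fs = FileSystem.newInstance(conf);
                 FSDataInputStream in = fs.open(path)) {
              while (in.read(buf) >= 0) {
                // drain the stream; we only look at throughput and CPU
              }
            } catch (IOException e) {
              e.printStackTrace();
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
      }
    }

We deliberately use FileSystem.newInstance() rather than the cached FileSystem.get(), so that contention on a single shared client instance should not be a factor in the test.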

[1] https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FairCallQueue.html
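For completeness, the knobs from [1] we were looking at are the per-port RPC call queue settings on the NameNode, along the lines of the following core-site.xml snippet (8020 is a placeholder for the NameNode RPC port; this is the documented example from the linked page, not something we have tried yet):

    <property>
      <name>ipc.8020.callqueue.impl</name>
      <value>org.apache.hadoop.ipc.FairCallQueue</value>
    </property>
    <property>
      <name>ipc.8020.scheduler.impl</name>
      <value>org.apache.hadoop.ipc.DecayRpcScheduler</value>
    </property>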

Regards,
Matthias

PS: sorry for the previous corrupted email.

