Hi all,

We recently set up a new cluster and, in the process, upgraded from Hadoop 3.3.1 to 3.3.4. However, we are facing an issue where multi-threaded reads and writes of independent files appear to get serialized under the covers of the HDFS implementation, despite each node having 8+2 disks.

While debugging this issue, we also tested writing local files through the HDFS API: that works perfectly fine, top shows very good core utilization, and it is much faster than writing to HDFS. In contrast, multi-threaded reads of multiple files from HDFS show only about 100% (single-core) CPU utilization, and writes stay even below that; a simplified sketch of the read test follows below. Do you have any pointers to configuration knobs (maybe related to [1]) that could fix this, or is this a known bug? I cannot imagine this is intended behavior, because it would mean that all I/O requests from the threads of a Spark executor would likewise get serialized.
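For reference, the read side of our test boils down to roughly the following sketch (paths, thread count, and buffer size are placeholders for what we actually use, and the real harness also times each thread):

    // Simplified sketch of the multi-threaded read test.
    // Each thread reads one independent HDFS file end to end.
    import java.io.IOException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ParallelReadTest {
      public static void main(String[] args) throws Exception {
        // assumes fs.defaultFS in the loaded config points at the cluster
        final Configuration conf = new Configuration();
        final int threads = 8; // one file per thread
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
          final Path path = new Path("/bench/file-" + i); // placeholder path
          pool.submit(() -> {
            byte[] buf = new byte[1 << 20]; // 1 MiB read buffer
            // newInstance() bypasses the process-wide FileSystem cache,
            // so every thread gets its own client.
            try (FileSystem fs = FileSystem.newInstance(conf);
                 FSDataInputStream in = fs.open(path)) {
              while (in.read(buf) >= 0) {
                // drain the stream; we only look at throughput and CPU
              }
            } catch (IOException e) {
              e.printStackTrace();
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
      }
    }

We deliberately use FileSystem.newInstance() rather than the cached FileSystem.get(), so that contention on a single shared client instance should not be a factor in the test.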

[1] https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FairCallQueue.html
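For completeness, the knobs from [1] we were looking at are the per-port RPC call queue settings on the NameNode, along the lines of the following core-site.xml snippet (8020 is a placeholder for the NameNode RPC port; this is the documented example from the linked page, not something we have tried yet):

    <property>
      <name>ipc.8020.callqueue.impl</name>
      <value>org.apache.hadoop.ipc.FairCallQueue</value>
    </property>
    <property>
      <name>ipc.8020.scheduler.impl</name>
      <value>org.apache.hadoop.ipc.DecayRpcScheduler</value>
    </property>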

Regards,
Matthias

PS: sorry for the previous corrupted email.

