Hi all,
I need to create a block map for all the files in a specific HDFS
directory (and its subdirectories).

I'm using the fs.listFiles API: I loop over the
RemoteIterator[LocatedFileStatus] it returns and, for each
LocatedFileStatus, call the getFileBlockLocations API to get all the block
locations of that file. This takes a long time because there are millions
of files in the HDFS directory.
I also tried to use Spark to parallelize the work, but Hadoop's FileSystem
API objects are not serializable, so the tasks can't be shipped to the
executors.
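Roughly, the failing pattern was this (a simplified sketch; paths and app
name are placeholders): the FileSystem handle is created on the driver,
gets captured in the closure, and Spark aborts with "Task not serializable".

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}
  import org.apache.spark.sql.SparkSession

  object BlockMapSpark {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("block-map").getOrCreate()
      val fs = FileSystem.get(new Configuration()) // lives on the driver only
      val paths = Seq("/user/me/data/a", "/user/me/data/b") // placeholder paths
      spark.sparkContext.parallelize(paths).map { p =>
        // fs is captured by this closure; FileSystem is not Serializable,
        // so the job fails with "Task not serializable"
        val st = fs.getFileStatus(new Path(p))
        fs.getFileBlockLocations(st, 0, st.getLen).length
      }.collect()
    }
  }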

Is there a better way? I know there is the "hdfs oiv" command, but I can't
access the NameNode metadata directory directly; besides, the fsimage file
could be outdated, and I can't put the cluster into safe mode to run the
saveNamespace command.
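For reference, this is the route I mean but can't take (the fsimage path
below is illustrative, not a real path on my cluster):

  # force a fresh checkpoint -- needs admin rights and safe mode, which I don't have
  hdfs dfsadmin -safemode enter
  hdfs dfsadmin -saveNamespace
  hdfs dfsadmin -safemode leave

  # dump the image -- needs read access to the NameNode metadata directory
  hdfs oiv -p XML -i /path/to/current/fsimage -o fsimage.xml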

I'm using Scala 2.11 with Hadoop 2.7.1 (HDP 2.6.3).

Thank you
