Thanks for the info, Nauroth. I will try DistCh.
Sorry for the late response.

For a chmod -R call on a single directory, I see many calls to the
NameNode, so I assume the recursion is done by the client.

Wouldn't it be better for the NameNode to do the recursion under a
re-entrant lock, instead of recursing over the network and taking the
lock on every call?
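
To make the comparison concrete, here is a toy model of what I mean. It is
only a sketch: ToyNameNode and its methods are hypothetical names, not the
Hadoop API. It assumes one RPC per directory listing and one RPC plus one
lock acquisition per permission change, and it only counts calls; it does
not talk to HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A file or directory; a node with children is treated as a directory."""
    name: str
    children: list = field(default_factory=list)


def count_objects(node):
    """Count the node itself plus everything below it."""
    return 1 + sum(count_objects(child) for child in node.children)


class ToyNameNode:
    """Hypothetical stand-in for the NameNode that only counts calls."""

    def __init__(self):
        self.rpcs = 0
        self.lock_acquisitions = 0

    def list_children(self, node):
        # Assumption: one RPC per directory listing.
        self.rpcs += 1
        return node.children

    def set_permission(self, node):
        # Assumption: one RPC and one lock acquisition per object.
        self.rpcs += 1
        self.lock_acquisitions += 1

    def set_permission_recursive(self, node):
        # Hypothetical server-side recursion: one RPC, and the lock is
        # taken once and held while the whole subtree is processed.
        self.rpcs += 1
        self.lock_acquisitions += 1
        return count_objects(node)


def client_side_chmod(namenode, root):
    """Walk the tree from the client, one call per object."""
    stack = [root]
    while stack:
        node = stack.pop()
        namenode.set_permission(node)
        if node.children:
            stack.extend(namenode.list_children(node))


def build_tree(dirs, files_per_dir):
    """Build a root directory with `dirs` subdirectories of files."""
    root = Node("/")
    for d in range(dirs):
        subdir = Node(f"dir{d}")
        subdir.children = [Node(f"file{f}") for f in range(files_per_dir)]
        root.children.append(subdir)
    return root


if __name__ == "__main__":
    tree = build_tree(3, 4)

    client = ToyNameNode()
    client_side_chmod(client, tree)
    print("client-side:", client.rpcs, "RPCs,",
          client.lock_acquisitions, "lock acquisitions")

    server = ToyNameNode()
    server.set_permission_recursive(tree)
    print("server-side:", server.rpcs, "RPCs,",
          server.lock_acquisitions, "lock acquisitions")
```

The model ignores how long the lock is held in the server-side case, which
would grow with the size of the subtree and could block other clients for
the duration.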

On Thu, Jun 16, 2016 at 11:24 AM, Chris Nauroth <[email protected]>
wrote:

> Hello Ravi,
>
> You might consider using DistCh.  In the same way that DistCp is a
> distributed copy implemented as a MapReduce job, DistCh is a MapReduce job
> that distributes the work of chmod/chown.
>
> DistCh will become easier to access through convenient shell commands in
> Apache Hadoop 3.  In version 2.6.0, it's undocumented and hard to find, but
> it's still there.  It's inside the hadoop-extras.jar.  Here is an example
> invocation:
>
> hadoop jar share/hadoop/tools/lib/hadoop-extras-*.jar \
>     org.apache.hadoop.tools.DistCh
>
> It might take some fiddling with the classpath to get this right.  If so,
> then I recommend looking at how the shell scripts in trunk set up the
> classpath.
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-extras/src/main/shellprofile.d/hadoop-extras.sh
>
> As you pointed out, this would generate higher NameNode traffic compared
> to your typical baseline load.  To mitigate this, I recommend that you
> start with a test run in a non-production environment to see how it reacts.
>
> --Chris Nauroth
>
> From: ravi teja <[email protected]>
> Date: Wednesday, June 15, 2016 at 8:33 PM
> To: "[email protected]" <[email protected]>
> Subject: Bulk chmod,chown operations on HDFS
>
> Hi Community,
>
> As part of the new authorisation changes, we need to change the
> permissions and owners of many files in HDFS (2.6.0) with chmod and chown.
>
> To do this, we need to stop processing on the affected directories to
> avoid inconsistent permissions, so we need to take downtime for the
> specific pipelines operating on these folders.
>
>
> The total number of files/directories to be operated on is around 10
> million.
> A recursive chmod (chmod -R) on 160K objects took around 15 minutes.
>
> At this rate, it will take a long time to complete the operation, and the
> downtime would be a couple of hours.
>
> A MapReduce program is one option, but chmod and chown are heavy
> operations and, at this scale, would slow down the cluster for other
> users.
>
> Are there any options for bulk permission changes (chmod, chown) that
> avoid these issues?
> If not, are there any alternative approaches for carrying out the same
> operation at this scale, something like an admin backdoor to the fsimage?
>
>
>
> Thanks,
> Ravi Teja
>
