group assignment on HDFS from Hadoop and Hive

Chen Song Mon, 13 Aug 2012 09:21:29 -0700

I am wondering how Hadoop assigns groups when directories and files are
created by a user. Below are some tests I ran. In my cluster, the group
hadoop is configured as the supergroup.


> hadoop fs -ls /tmp
drwxrwxrwx   - abc hadoop          0 2012-08-10 23:02 /tmp/abc
drwxrwxrwx   - def other_group          0 2012-08-10 23:02 /tmp/def

> groups apache
apache: apache wheel

> sudo -u apache hadoop fs -put somefile /tmp/abc
> hadoop fs -ls /tmp/abc
-rw-rw-r--   3 apache hadoop     120962 2012-08-13 16:03 /tmp/abc/somefile

> sudo -u apache hadoop fs -put somefile /tmp/def
> hadoop fs -lsr /tmp/def
-rw-rw-r--   3 apache other_group     120962 2012-08-13 16:03 /tmp/def/somefile

*Based on the experiments above, it looks like a file pushed to HDFS
always inherits its group from its parent directory. Is that
always the case?*
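
To repeat this check on more directories, the comparison can be scripted. This is a minimal sketch, assuming the `hadoop fs -ls` output format shown in the listings above (group in the fourth column, eight fields per entry); the helper names `ls_group` and `same_group` are hypothetical and not part of Hadoop.

```shell
# Print the group column (4th field) of `hadoop fs -ls` output lines.
# Assumed line format, as in the listings above:
#   permissions replication owner group size date time path
# Lines with fewer than 8 fields (e.g. "Found 2 items") are skipped.
ls_group() {
  awk 'NF >= 8 { print $4 }'
}

# Succeed (exit 0) when two raw `hadoop fs -ls` lines carry the same group.
#   $1: listing line for the parent directory
#   $2: listing line for the file created inside it
same_group() {
  parent_group=$(printf '%s\n' "$1" | ls_group)
  child_group=$(printf '%s\n' "$2" | ls_group)
  [ -n "$parent_group" ] && [ "$parent_group" = "$child_group" ]
}
```

For example, passing the /tmp/abc line from the first listing and the /tmp/abc/somefile line from the second to `same_group` should succeed if the group was inherited.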

A follow-up question concerns a finding in Hive: when I execute a query to
overwrite a table (or a partition within a table), the newly written
directory always ends up belonging to HDFS's supergroup, regardless of
any of the following:
1. The user who is executing the Hive query
2. The groups that user belongs to
3. The group that owns the parent table directory
*Is this always expected in Hive?*

For example, table A is stored at /path/A and is partitioned on column
dh. /path/A belongs to group *other_group*.
After running *insert overwrite A partition (dh = "12") select column list
from ... where ...*

/path/A/12 always ends up with group *hadoop*. This contradicts the
inheritance assumption I drew above. Any thoughts would be
appreciated.

Thanks
Chen
