I am wondering how Hadoop assigns groups when directories and files are created by a user. Below are some tests I have done. In my cluster, the group hadoop is configured as the supergroup.
> hadoop fs -ls /tmp
drwxrwxrwx   - abc    hadoop       0 2012-08-10 23:02 /tmp/abc
drwxrwxrwx   - def    other_group  0 2012-08-10 23:02 /tmp/def

> groups apache
apache: apache wheel

> sudo -u apache hadoop fs -put somefile /tmp/abc
> hadoop fs -ls /tmp/abc
-rw-rw-r--   3 apache hadoop       120962 2012-08-13 16:03 /tmp/abc/somefile

> sudo -u apache hadoop fs -put somefile /tmp/def
> hadoop fs -lsr /tmp/def
-rw-rw-r--   3 apache other_group  120962 2012-08-13 16:03 /tmp/def/somefile

*Based on the experiments above, it looks like a file pushed to HDFS always inherits its group from the enclosing parent directory, not from the groups of the user who wrote it. Is that always the case?*

A follow-up question about something I found in Hive: when executing a query that overwrites a table (or a partition within a table), the newly written directory always ends up belonging to HDFS's supergroup, regardless of:

1. The user who is executing the Hive query
2. The groups that user belongs to
3. The group that the parent table directory belongs to

*Is this always expected in Hive?*

For example, table A is stored at /path/A and is partitioned on column dh. /path/A has group *other_group*. After running

*insert overwrite A partition (dh = "12") select column list from ... where ...*

/path/A/12 always ends up with group *hadoop*. This contradicts the inheritance assumption I drew above.

Any thoughts would be appreciated.

Thanks,
Chen
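
P.S. To make the inheritance assumption concrete, here is a toy Python model of the rule I think I am seeing in the -put tests: the owner comes from the user creating the file, and the group is copied from the parent directory, ignoring the user's own groups (apache, wheel). The names `Entry`, `ToyNamespace`, `add_existing` and `create` are made up for illustration and are not Hadoop APIs. This is only a sketch of my assumption, not a description of how HDFS is implemented.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    owner: str
    group: str


class ToyNamespace:
    """Toy model (hypothetical, not Hadoop code) of the assumed rule:
    owner = creating user, group = group of the parent directory."""

    def __init__(self):
        self.entries = {}

    def add_existing(self, path, owner, group):
        # Register a directory that already exists, as in the first listing.
        self.entries[path] = Entry(path, owner, group)

    def create(self, path, user):
        # The parent is everything before the last "/" (or "/" for top level).
        parent = path.rsplit("/", 1)[0] or "/"
        parent_entry = self.entries[parent]
        # Assumed rule: the user's own groups are never consulted.
        entry = Entry(path, owner=user, group=parent_entry.group)
        self.entries[path] = entry
        return entry


ns = ToyNamespace()
ns.add_existing("/tmp/abc", "abc", "hadoop")
ns.add_existing("/tmp/def", "def", "other_group")

a = ns.create("/tmp/abc/somefile", "apache")
b = ns.create("/tmp/def/somefile", "apache")
print(a.group, b.group)  # prints: hadoop other_group
```

Under this model, /path/A/12 would be expected to get *other_group* from /path/A, which is exactly what I do not observe after the Hive insert overwrite.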
