Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/#review23858 --- Ship it! Ship It! - Ashutosh Chauhan On July 19, 2013, 6:55 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated July 19, 2013, 6:55 p.m.) Review request for hive, Ashutosh Chauhan and Sushanth Sowmyan. Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under database/table directory could not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Beside the group issue, the read/write permission mode is also not inherited 3. Same problem affects the partition files (see HIVE-3094) Cause: The task results (from load data or insert overwrite) are initially stored in scratchdir and then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including group and mode) which are determined by scratchdir permission or umask. If the scratchdir has different permissions from those of warehouse table directories, the problem occurs. Solution: After the FileSystem.rename is called, changing all renamed (moved) files/dirs to their destination parents' permissions if needed (say if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both rename and permission. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename used to move files/dirs under same scratchdir in the middle of task processing. It looks to me not necessary since they are temp files and also probably access protected by top scratchdir mode 700 (HIVE-4487). Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 87a584d Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in : 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. {code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1 hive load data local inpath '/home/tester1/tst1.input' into table tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input hive create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 /user/tester1/hive/tester1.db/tst2 hive insert overwrite table tst2 select * from tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 /user/tester1/hive/tester1.db
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated July 19, 2013, 6:54 p.m.) Review request for hive. Changes --- Further change from previous version is to preserve the permission/group combination for the insert overwrite case etc. The added change is to get the permission/group information of a dir/file before it gets deleted in the replace, then apply them back to replaced dir/file. Further tests: 1. create 2 tables under db tester1 with the permission/group as following, the tables' dir inherit their parent db tester1.db permission/group combination: drwxrwx--- - ctang staff170 2013-07-19 14:22 /user/tester1/hive/tester1.db drwxrwx--- - ctang staff 68 2013-07-19 14:21 /user/tester1/hive/tester1.db/tst1 drwxrwx--- - ctang staff 68 2013-07-19 14:21 /user/tester1/hive/tester1.db/tst2 2. change tst2: dfs -chmod 700 /user/tester1/hive/tester1.db/tst2; dfs -ls -R /user/tester1/hive; drwxrwx--- - ctang staff170 2013-07-19 14:22 /user/tester1/hive/tester1.db drwxrwx--- - ctang staff136 2013-07-19 14:24 /user/tester1/hive/tester1.db/tst1 -rwxrwx--- 1 ctang staff168 2013-07-19 14:24 /user/tester1/hive/tester1.db/tst1/tst1.input drwx-- - ctang staff 68 2013-07-19 14:21 /user/tester1/hive/tester1.db/tst2 3. insert overwrite table tst2 select * from tst1; 4. dfs -ls -R /user/tester1/hive; drwxrwx--- - ctang staff170 2013-07-19 14:25 /user/tester1/hive/tester1.db drwxrwx--- - ctang staff136 2013-07-19 14:24 /user/tester1/hive/tester1.db/tst1 -rwxrwx--- 1 ctang staff168 2013-07-19 14:24 /user/tester1/hive/tester1.db/tst1/tst1.input drwx-- - ctang staff136 2013-07-19 14:25 /user/tester1/hive/tester1.db/tst2 -rwx-- 1 ctang staff291 2013-07-19 14:25 /user/tester1/hive/tester1.db/tst2/00_0 == The permissions of /user/tester1/hive/tester1.db/tst2 and /user/tester1/hive/tester1.db/tst2/00_0 are same as those changed in step 2, instead from /user/tester1/hive/tester1.db Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under database/table directory could not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Beside the group issue, the read/write permission mode is also not inherited 3. Same problem affects the partition files (see HIVE-3094) Cause: The task results (from load data or insert overwrite) are initially stored in scratchdir and then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including group and mode) which are determined by scratchdir permission or umask. If the scratchdir has different permissions from those of warehouse table directories, the problem occurs. Solution: After the FileSystem.rename is called, changing all renamed (moved) files/dirs to their destination parents' permissions if needed (say if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both rename and permission. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename used to move files/dirs under same scratchdir in the middle of task processing. It looks to me not necessary since they are temp files and also probably access protected by top scratchdir mode 700 (HIVE-4487). Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 87a584d Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in : 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. {code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated July 19, 2013, 6:55 p.m.) Review request for hive, Ashutosh Chauhan and Sushanth Sowmyan. Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under database/table directory could not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Beside the group issue, the read/write permission mode is also not inherited 3. Same problem affects the partition files (see HIVE-3094) Cause: The task results (from load data or insert overwrite) are initially stored in scratchdir and then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including group and mode) which are determined by scratchdir permission or umask. If the scratchdir has different permissions from those of warehouse table directories, the problem occurs. Solution: After the FileSystem.rename is called, changing all renamed (moved) files/dirs to their destination parents' permissions if needed (say if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both rename and permission. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename used to move files/dirs under same scratchdir in the middle of task processing. It looks to me not necessary since they are temp files and also probably access protected by top scratchdir mode 700 (HIVE-4487). Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 87a584d Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in : 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. {code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1 hive load data local inpath '/home/tester1/tst1.input' into table tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input hive create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 /user/tester1/hive/tester1.db/tst2 hive insert overwrite table tst2 select * from tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2 -rw-rw 3 tester1 testgroup123291 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2/00_0 hive create table
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2128 https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2128 I had quite a discussion on this with Rohini on HIVE-2936 on how to do these fs ops in a performant and robust way. Feel free to follow that. This piece of code is rewritten to: {code} try { //if destf is an existing directory, rename just move the src file/dir under destf and //destf itself will be the parent dir after the rename if (fs.getFileStatus(destf).isDir()) { destfp = destf; LOG.debug(Renaming: + destf.toString() + is an existing directory; + srcf.toString() + will move under it.); } } catch (FileNotFoundException ignore) { } {code} So it will be optimized to save 1 call if the destf exists On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2151 https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2151 Why not use api instead of FsShell? Does filesystem doesn't offer an api for recursively doing chmod and chgrp? FileSystem offers the APIs for changing permission and owner (group), but not recursively. An alternate implementation I thought of initially is like following: {code} try { FileStatus destParentfstatus = fs.getFileStatus(destfp); String destfpGroup = destParentfstatus.getGroup(); FsPermission destfpPermission = destParentfstatus.getPermission(); FileStatus[] fstatusList = fs.listStatus(destf); for (FileStatus fstatus : fstatusList) { fs.setOwner(fstatus.getPath(), null, destfpGroup); fs.setPermission(fstatus.getPath(), destfpPermission); } } catch (FileNotFoundException fnfe) { //it should not be thrown } catch (IOException e) { throw new HiveException(Unable to set permissions of + destf + to those of its parent + destf.getParent(), e); } {code} I chose to use the FsShell since the code with commands (-chgrp, -chmod) looked to me more intuitive. As for which way is more performant, I am not sure but I was also not able to tell any gain in the alternate approach. Any suggestion? On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2265 https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2265 I think following way to write this is more robust: if(inheritPerms) { fs.mkdirs(destfp, destfp.getParent().getPerms); } else { fs.mkdirs(destfp); } It was following the Rohini's comments on HIVE-2936: {code} It is not safe to use fs.mkdirs(path, permission) because the dfs umask is applied on that permission in DFSClient which is not desired. We have been bitten by wrong permission issues because of using that API. It is always safer to do mkdirs() and then do a setPermission() if you are dealing with HDFS. {code} So I did: {code} boolean success = fs.mkdirs(destfp); if (inheritPerms success) { fs.setPermission(destfp, fs.getFileStatus(destfp.getParent()).getPermission()); } {code} Not sure, since these APIs documentation is quite limited. - Chaoyu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/#review22698 --- On July 2, 2013, 4:39 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated July 2, 2013, 4:39 p.m.) Review request for hive. Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under database/table directory could not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Beside the group issue, the read/write permission mode is also not inherited 3. Same problem affects the partition files (see HIVE-3094) Cause: The task results (from load data or insert overwrite) are initially stored in scratchdir and then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including group and mode) which are determined by scratchdir permission or umask. If the scratchdir has different permissions from those of warehouse table directories, the problem occurs.
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
On July 1, 2013, 6:26 p.m., Sushanth Sowmyan wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2128 https://reviews.apache.org/r/12050/diff/1/?file=310452#file310452line2128 I understand from javadoc that FileStatus.isDirectory() is supposed to be the new way of calling and .isDir() is deprecated, but I wanted to point out that hive trunk (with no hadoop revision args) does not even compile for me with isDirectory() here. Also, given that .isDir() is used all over the place in hive, for the sake of consistency, would this break functionality in any significant way if you use it here? Sushanth Sowmyan wrote: To add more details, this patch only works if you are compiling against Hadoop 2.x, and does not if you are compiling against 0.20.x or 1.x. If you need to use isDirectory() for some functionality reason, you should add the appropriate shims for it. Without that, this should be written as isDir(). A good point. Changed to .isDir. Thanks - Chaoyu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/#review22617 --- On June 22, 2013, 8:43 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated June 22, 2013, 8:43 p.m.) Review request for hive. Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under database/table directory could not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Beside the group issue, the read/write permission mode is also not inherited 3. Same problem affects the partition files (see HIVE-3094) Cause: The task results (from load data or insert overwrite) are initially stored in scratchdir and then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including group and mode) which are determined by scratchdir permission or umask. If the scratchdir has different permissions from those of warehouse table directories, the problem occurs. Solution: After the FileSystem.rename is called, changing all renamed (moved) files/dirs to their destination parents' permissions if needed (say if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both rename and permission. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename used to move files/dirs under same scratchdir in the middle of task processing. It looks to me not necessary since they are temp files and also probably access protected by top scratchdir mode 700 (HIVE-4487). Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in : 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. {code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1 hive load data local inpath '/home/tester1/tst1.input' into table tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated July 2, 2013, 4:39 p.m.) Review request for hive. Changes --- update the change (use FileStatus.isDir instead of isDirectory) Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under database/table directory could not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Beside the group issue, the read/write permission mode is also not inherited 3. Same problem affects the partition files (see HIVE-3094) Cause: The task results (from load data or insert overwrite) are initially stored in scratchdir and then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including group and mode) which are determined by scratchdir permission or umask. If the scratchdir has different permissions from those of warehouse table directories, the problem occurs. Solution: After the FileSystem.rename is called, changing all renamed (moved) files/dirs to their destination parents' permissions if needed (say if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both rename and permission. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename used to move files/dirs under same scratchdir in the middle of task processing. It looks to me not necessary since they are temp files and also probably access protected by top scratchdir mode 700 (HIVE-4487). Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in : 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. {code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1 hive load data local inpath '/home/tester1/tst1.input' into table tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input hive create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 /user/tester1/hive/tester1.db/tst2 hive insert overwrite table tst2 select * from tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:25 /user/tester1/hive/tester1.db/tst2 -rw-rw 3 tester1 testgroup123291 2013-06-22 13:25
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/#review22698 --- ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/12050/#comment46400 I had quite a discussion on this with Rohini on HIVE-2936 on how to do these fs ops in a performant and robust way. Feel free to follow that. ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/12050/#comment46401 Why not use api instead of FsShell? Does filesystem doesn't offer an api for recursively doing chmod and chgrp? ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/12050/#comment46398 I think following way to write this is more robust: if(inheritPerms) { fs.mkdirs(destfp, destfp.getParent().getPerms); } else { fs.mkdirs(destfp); } ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/12050/#comment46399 same as previous comment. Couple of comments on api usage. - Ashutosh Chauhan On July 2, 2013, 4:39 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated July 2, 2013, 4:39 p.m.) Review request for hive. Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under database/table directory could not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Beside the group issue, the read/write permission mode is also not inherited 3. Same problem affects the partition files (see HIVE-3094) Cause: The task results (from load data or insert overwrite) are initially stored in scratchdir and then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including group and mode) which are determined by scratchdir permission or umask. If the scratchdir has different permissions from those of warehouse table directories, the problem occurs. Solution: After the FileSystem.rename is called, changing all renamed (moved) files/dirs to their destination parents' permissions if needed (say if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both rename and permission. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename used to move files/dirs under same scratchdir in the middle of task processing. It looks to me not necessary since they are temp files and also probably access protected by top scratchdir mode 700 (HIVE-4487). Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in : 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. {code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1 hive load data local inpath '/home/tester1/tst1.input' into table tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/#review22617 --- ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/12050/#comment46309 I understand from javadoc that FileStatus.isDirectory() is supposed to be the new way of calling and .isDir() is deprecated, but I wanted to point out that hive trunk (with no hadoop revision args) does not even compile for me with isDirectory() here. Also, given that .isDir() is used all over the place in hive, for the sake of consistency, would this break functionality in any significant way if you use it here? - Sushanth Sowmyan On June 22, 2013, 8:43 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated June 22, 2013, 8:43 p.m.) Review request for hive. Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under database/table directory could not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Beside the group issue, the read/write permission mode is also not inherited 3. Same problem affects the partition files (see HIVE-3094) Cause: The task results (from load data or insert overwrite) are initially stored in scratchdir and then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including group and mode) which are determined by scratchdir permission or umask. If the scratchdir has different permissions from those of warehouse table directories, the problem occurs. Solution: After the FileSystem.rename is called, changing all renamed (moved) files/dirs to their destination parents' permissions if needed (say if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both rename and permission. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename used to move files/dirs under same scratchdir in the middle of task processing. It looks to me not necessary since they are temp files and also probably access protected by top scratchdir mode 700 (HIVE-4487). Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in : 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. {code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1 hive load data local inpath '/home/tester1/tst1.input' into table tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1/tst1.input hive create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:24 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
On July 1, 2013, 6:26 p.m., Sushanth Sowmyan wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2128 https://reviews.apache.org/r/12050/diff/1/?file=310452#file310452line2128 I understand from javadoc that FileStatus.isDirectory() is supposed to be the new way of calling and .isDir() is deprecated, but I wanted to point out that hive trunk (with no hadoop revision args) does not even compile for me with isDirectory() here. Also, given that .isDir() is used all over the place in hive, for the sake of consistency, would this break functionality in any significant way if you use it here? To add more details, this patch only works if you are compiling against Hadoop 2.x, and does not if you are compiling against 0.20.x or 1.x. If you need to use isDirectory() for some functionality reason, you should add the appropriate shims for it. Without that, this should be written as isDir(). - Sushanth --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/#review22617 --- On June 22, 2013, 8:43 p.m., Chaoyu Tang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12050/ --- (Updated June 22, 2013, 8:43 p.m.) Review request for hive. Bugs: HIVE- and HIVE-3756 https://issues.apache.org/jira/browse/HIVE- https://issues.apache.org/jira/browse/HIVE-3756 Repository: hive-git Description --- Problems: 1. When doing load data or insert overwrite to a table, the data files under database/table directory could not inherit their parent's permissions (i.e. group) as described in HIVE-3756. 2. Beside the group issue, the read/write permission mode is also not inherited 3. Same problem affects the partition files (see HIVE-3094) Cause: The task results (from load data or insert overwrite) are initially stored in scratchdir and then loaded under warehouse table directory. FileSystem.rename is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but it preserves their permissions (including group and mode) which are determined by scratchdir permission or umask. If the scratchdir has different permissions from those of warehouse table directories, the problem occurs. Solution: After the FileSystem.rename is called, changing all renamed (moved) files/dirs to their destination parents' permissions if needed (say if hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method renameFile doing both rename and permission. It replaces the FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename used to move files/dirs under same scratchdir in the middle of task processing. It looks to me not necessary since they are temp files and also probably access protected by top scratchdir mode 700 (HIVE-4487). Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 Diff: https://reviews.apache.org/r/12050/diff/ Testing --- The following cases tested that all created subdirs/files inherit their parents' permission mode and group in : 1). create database; 2). create table; 3). load data; 4) insert overwrite; 5) partitions. {code} hive dfs -ls -d /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:20 /user/tester1/hive hive create database tester1 COMMENT 'Database for user tester1' LOCATION '/user/tester1/hive/tester1.db'; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:21 /user/tester1/hive/tester1.db hive use tester1; hive create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db/tst1 hive load data local inpath '/home/tester1/tst1.input' into table tst1; hive dfs -ls -R /user/tester1/hive; drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:22 /user/tester1/hive/tester1.db drwxrwx--- - tester1 testgroup123 0 2013-06-22 13:23 /user/tester1/hive/tester1.db/tst1 -rw-rw 3 tester1 testgroup123168 2013-06-22 13:23