Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)

2013-07-25 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/#review23858
---

Ship it!


Ship It!

- Ashutosh Chauhan


On July 19, 2013, 6:55 p.m., Chaoyu Tang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/12050/
 ---
 
 (Updated July 19, 2013, 6:55 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan and Sushanth Sowmyan.
 
 
 Bugs: HIVE- and HIVE-3756
 https://issues.apache.org/jira/browse/HIVE-
 https://issues.apache.org/jira/browse/HIVE-3756
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Problems:
 1. When doing load data or insert overwrite to a table, the data files under 
 database/table directory could not inherit their parent's permissions (i.e. 
 group) as described in HIVE-3756.
 2. Beside the group issue, the read/write permission mode is also not 
 inherited
 3. Same problem affects the partition files (see HIVE-3094)
 
 Cause:
 The task results (from load data or insert overwrite) are initially stored in 
 scratchdir and then loaded under warehouse table directory. FileSystem.rename 
 is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files 
 but it preserves their permissions (including group and mode) which are 
 determined by scratchdir permission or umask. If the scratchdir has different 
 permissions from those of warehouse table directories, the problem occurs.
 
 Solution:
 After the FileSystem.rename is called, changing all renamed (moved) 
 files/dirs to their destination parents' permissions if needed (say if 
 hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method 
 renameFile doing both rename and permission. It replaces the 
 FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename 
 used to move files/dirs under same scratchdir in the middle of task 
 processing. It looks to me not necessary since they are temp files and also 
 probably access protected by top scratchdir mode 700 (HIVE-4487).
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 87a584d 
 
 Diff: https://reviews.apache.org/r/12050/diff/
 
 
 Testing
 ---
 
 The following cases tested that all created subdirs/files inherit their 
 parents' permission mode and group in : 1). create database; 2). create 
 table; 3). load data; 4) insert overwrite; 5) partitions.
 {code}
 hive dfs -ls -d /user/tester1/hive;  
  
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:20 
 /user/tester1/hive
 
 hive create database tester1 COMMENT 'Database for user tester1' LOCATION 
 '/user/tester1/hive/tester1.db';
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:21 
 /user/tester1/hive/tester1.db
 
 hive use tester1;
 hive  create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db/tst1
 
 hive  load data local inpath '/home/tester1/tst1.input' into table tst1; 
   
 hive dfs -ls -R /user/tester1/hive; 
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
 /user/tester1/hive/tester1.db/tst1
 -rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
 /user/tester1/hive/tester1.db/tst1/tst1.input
 
 hive create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE;
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:24 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
 /user/tester1/hive/tester1.db/tst1
 -rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
 /user/tester1/hive/tester1.db/tst1/tst1.input
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:24 
 /user/tester1/hive/tester1.db/tst2
 
 hive insert overwrite table tst2 select * from tst1;
 hive dfs -ls -R /user/tester1/hive; 
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:25 
 /user/tester1/hive/tester1.db
 

Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)

2013-07-19 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/
---

(Updated July 19, 2013, 6:54 p.m.)


Review request for hive.


Changes
---

Further change from previous version is to preserve the permission/group 
combination for the insert overwrite case etc. The added change is to get the 
permission/group information of a dir/file before it gets deleted in the 
replace, then apply them back to replaced dir/file.

Further tests:
1. create 2 tables under db tester1 with the permission/group as following, the 
tables' dir inherit their parent db tester1.db permission/group combination:
drwxrwx---   - ctang staff170 2013-07-19 14:22 
/user/tester1/hive/tester1.db
drwxrwx---   - ctang staff 68 2013-07-19 14:21 
/user/tester1/hive/tester1.db/tst1
drwxrwx---   - ctang staff 68 2013-07-19 14:21 
/user/tester1/hive/tester1.db/tst2

2. change tst2: dfs -chmod 700 /user/tester1/hive/tester1.db/tst2;
dfs -ls -R /user/tester1/hive;
drwxrwx---   - ctang staff170 2013-07-19 14:22 
/user/tester1/hive/tester1.db
drwxrwx---   - ctang staff136 2013-07-19 14:24 
/user/tester1/hive/tester1.db/tst1
-rwxrwx---   1 ctang staff168 2013-07-19 14:24 
/user/tester1/hive/tester1.db/tst1/tst1.input
drwx--   - ctang staff 68 2013-07-19 14:21 
/user/tester1/hive/tester1.db/tst2

3. insert overwrite table tst2 select * from tst1;
4. dfs -ls -R /user/tester1/hive;
drwxrwx---   - ctang staff170 2013-07-19 14:25 
/user/tester1/hive/tester1.db
drwxrwx---   - ctang staff136 2013-07-19 14:24 
/user/tester1/hive/tester1.db/tst1
-rwxrwx---   1 ctang staff168 2013-07-19 14:24 
/user/tester1/hive/tester1.db/tst1/tst1.input
drwx--   - ctang staff136 2013-07-19 14:25 
/user/tester1/hive/tester1.db/tst2
-rwx--   1 ctang staff291 2013-07-19 14:25 
/user/tester1/hive/tester1.db/tst2/00_0
==
The permissions of /user/tester1/hive/tester1.db/tst2 and 
/user/tester1/hive/tester1.db/tst2/00_0 are same as those changed in step 
2, instead from /user/tester1/hive/tester1.db


Bugs: HIVE- and HIVE-3756
https://issues.apache.org/jira/browse/HIVE-
https://issues.apache.org/jira/browse/HIVE-3756


Repository: hive-git


Description
---

Problems:
1. When doing load data or insert overwrite to a table, the data files under 
database/table directory could not inherit their parent's permissions (i.e. 
group) as described in HIVE-3756.
2. Beside the group issue, the read/write permission mode is also not inherited
3. Same problem affects the partition files (see HIVE-3094)

Cause:
The task results (from load data or insert overwrite) are initially stored in 
scratchdir and then loaded under warehouse table directory. FileSystem.rename 
is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but 
it preserves their permissions (including group and mode) which are determined 
by scratchdir permission or umask. If the scratchdir has different permissions 
from those of warehouse table directories, the problem occurs.

Solution:
After the FileSystem.rename is called, changing all renamed (moved) files/dirs 
to their destination parents' permissions if needed (say if 
hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method 
renameFile doing both rename and permission. It replaces the FileSystem.rename 
used in LoadTable/LoadPartition. I do not replace rename used to move 
files/dirs under same scratchdir in the middle of task processing. It looks to 
me not necessary since they are temp files and also probably access protected 
by top scratchdir mode 700 (HIVE-4487).


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 87a584d 

Diff: https://reviews.apache.org/r/12050/diff/


Testing
---

The following cases tested that all created subdirs/files inherit their 
parents' permission mode and group in : 1). create database; 2). create table; 
3). load data; 4) insert overwrite; 5) partitions.
{code}
hive dfs -ls -d /user/tester1/hive;
   
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:20 
/user/tester1/hive

hive create database tester1 COMMENT 'Database for user tester1' LOCATION 
'/user/tester1/hive/tester1.db';
hive dfs -ls -R /user/tester1/hive;

drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:21 
/user/tester1/hive/tester1.db

hive use tester1;
hive  create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive dfs -ls -R /user/tester1/hive;

drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 

Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)

2013-07-19 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/
---

(Updated July 19, 2013, 6:55 p.m.)


Review request for hive, Ashutosh Chauhan and Sushanth Sowmyan.


Bugs: HIVE- and HIVE-3756
https://issues.apache.org/jira/browse/HIVE-
https://issues.apache.org/jira/browse/HIVE-3756


Repository: hive-git


Description
---

Problems:
1. When doing load data or insert overwrite to a table, the data files under 
database/table directory could not inherit their parent's permissions (i.e. 
group) as described in HIVE-3756.
2. Beside the group issue, the read/write permission mode is also not inherited
3. Same problem affects the partition files (see HIVE-3094)

Cause:
The task results (from load data or insert overwrite) are initially stored in 
scratchdir and then loaded under warehouse table directory. FileSystem.rename 
is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but 
it preserves their permissions (including group and mode) which are determined 
by scratchdir permission or umask. If the scratchdir has different permissions 
from those of warehouse table directories, the problem occurs.

Solution:
After the FileSystem.rename is called, changing all renamed (moved) files/dirs 
to their destination parents' permissions if needed (say if 
hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method 
renameFile doing both rename and permission. It replaces the FileSystem.rename 
used in LoadTable/LoadPartition. I do not replace rename used to move 
files/dirs under same scratchdir in the middle of task processing. It looks to 
me not necessary since they are temp files and also probably access protected 
by top scratchdir mode 700 (HIVE-4487).


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 87a584d 

Diff: https://reviews.apache.org/r/12050/diff/


Testing
---

The following cases tested that all created subdirs/files inherit their 
parents' permission mode and group in : 1). create database; 2). create table; 
3). load data; 4) insert overwrite; 5) partitions.
{code}
hive dfs -ls -d /user/tester1/hive;
   
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:20 
/user/tester1/hive

hive create database tester1 COMMENT 'Database for user tester1' LOCATION 
'/user/tester1/hive/tester1.db';
hive dfs -ls -R /user/tester1/hive;

drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:21 
/user/tester1/hive/tester1.db

hive use tester1;
hive  create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive dfs -ls -R /user/tester1/hive;

drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
/user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
/user/tester1/hive/tester1.db/tst1

hive  load data local inpath '/home/tester1/tst1.input' into table tst1;   

hive dfs -ls -R /user/tester1/hive; 
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
/user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1
-rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1/tst1.input

hive create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE;
hive dfs -ls -R /user/tester1/hive;

drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:24 
/user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1
-rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1/tst1.input
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:24 
/user/tester1/hive/tester1.db/tst2

hive insert overwrite table tst2 select * from tst1;
hive dfs -ls -R /user/tester1/hive; 
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:25 
/user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1
-rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1/tst1.input
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:25 
/user/tester1/hive/tester1.db/tst2
-rw-rw   3 tester1 testgroup123291 2013-06-22 13:25 
/user/tester1/hive/tester1.db/tst2/00_0

hive create table 

Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)

2013-07-16 Thread Chaoyu Tang


 On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote:
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2128
  https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2128
 
  I had quite a discussion on this with Rohini on HIVE-2936 on how to do 
  these fs ops in a performant and robust way. Feel free to follow that.

This piece of code is rewritten to:
{code}
  try {
//if destf is an existing directory, rename just move the src file/dir 
under destf and
//destf itself will be the parent dir after the rename
if (fs.getFileStatus(destf).isDir()) {
  destfp = destf;
  LOG.debug(Renaming: + destf.toString()  +  is an existing 
directory;  +
  srcf.toString() +  will move under it.);
}
  } catch (FileNotFoundException ignore) {
  } 
{code}
So it will be optimized to save 1 call if the destf exists


 On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote:
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2151
  https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2151
 
  Why not use api instead of FsShell? Does filesystem doesn't offer an 
  api for recursively doing chmod and chgrp?

FileSystem offers the APIs for changing permission and owner (group), but not 
recursively. An alternate implementation I thought of initially is like 
following:
{code}
  try {
FileStatus destParentfstatus = fs.getFileStatus(destfp);
String destfpGroup = destParentfstatus.getGroup();
FsPermission destfpPermission = destParentfstatus.getPermission();
FileStatus[] fstatusList = fs.listStatus(destf);
for (FileStatus fstatus : fstatusList) {
  fs.setOwner(fstatus.getPath(), null, destfpGroup);
  fs.setPermission(fstatus.getPath(), destfpPermission);
}
  } catch (FileNotFoundException fnfe) {
//it should not be thrown
  } catch (IOException e) {
throw new HiveException(Unable to set permissions of  + destf
+  to those of its parent  + destf.getParent(), e);   
  }
{code}
I chose to use the FsShell since the code with commands (-chgrp, -chmod) looked 
to me more intuitive. As for which way is more performant, I am not sure but I 
was also not able to tell any gain in the alternate approach. Any suggestion?


 On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote:
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2265
  https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2265
 
  I think following way to write this is more robust:
  if(inheritPerms) {
  fs.mkdirs(destfp, destfp.getParent().getPerms);
  } else {
  fs.mkdirs(destfp);
  }

It was following the Rohini's comments on HIVE-2936:
{code}
It is not safe to use fs.mkdirs(path, permission) because the dfs umask is 
applied on that permission in DFSClient which is not desired. We have been 
bitten by wrong permission issues because of using that API. It is always safer 
to do mkdirs() and then do a setPermission() if you are dealing with HDFS.
{code}
So I did:
{code}
 boolean success = fs.mkdirs(destfp);
 if (inheritPerms  success) {
   fs.setPermission(destfp, 
fs.getFileStatus(destfp.getParent()).getPermission());
 }
{code}

Not sure, since these APIs documentation is quite limited.


- Chaoyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/#review22698
---


On July 2, 2013, 4:39 p.m., Chaoyu Tang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/12050/
 ---
 
 (Updated July 2, 2013, 4:39 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE- and HIVE-3756
 https://issues.apache.org/jira/browse/HIVE-
 https://issues.apache.org/jira/browse/HIVE-3756
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Problems:
 1. When doing load data or insert overwrite to a table, the data files under 
 database/table directory could not inherit their parent's permissions (i.e. 
 group) as described in HIVE-3756.
 2. Beside the group issue, the read/write permission mode is also not 
 inherited
 3. Same problem affects the partition files (see HIVE-3094)
 
 Cause:
 The task results (from load data or insert overwrite) are initially stored in 
 scratchdir and then loaded under warehouse table directory. FileSystem.rename 
 is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files 
 but it preserves their permissions (including group and mode) which are 
 determined by scratchdir permission or umask. If the scratchdir has different 
 permissions from those of warehouse table directories, the problem occurs.
 
 

Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)

2013-07-02 Thread Chaoyu Tang


 On July 1, 2013, 6:26 p.m., Sushanth Sowmyan wrote:
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2128
  https://reviews.apache.org/r/12050/diff/1/?file=310452#file310452line2128
 
  I understand from javadoc that FileStatus.isDirectory() is supposed to 
  be the new way of calling and .isDir() is deprecated, but I wanted to point 
  out that hive trunk (with no hadoop revision args) does not even compile 
  for me with isDirectory() here.
  
  Also, given that .isDir() is used all over the place in hive, for the 
  sake of consistency, would this break functionality in any significant way 
  if you use it here?
 
 Sushanth Sowmyan wrote:
 To add more details, this patch only works if you are compiling against 
 Hadoop 2.x, and does not if you are compiling against 0.20.x or 1.x. If you 
 need to use isDirectory() for some functionality reason, you should add the 
 appropriate shims for it. Without that, this should be written as isDir().

A good point. Changed to .isDir. Thanks


- Chaoyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/#review22617
---


On June 22, 2013, 8:43 p.m., Chaoyu Tang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/12050/
 ---
 
 (Updated June 22, 2013, 8:43 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE- and HIVE-3756
 https://issues.apache.org/jira/browse/HIVE-
 https://issues.apache.org/jira/browse/HIVE-3756
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Problems:
 1. When doing load data or insert overwrite to a table, the data files under 
 database/table directory could not inherit their parent's permissions (i.e. 
 group) as described in HIVE-3756.
 2. Beside the group issue, the read/write permission mode is also not 
 inherited
 3. Same problem affects the partition files (see HIVE-3094)
 
 Cause:
 The task results (from load data or insert overwrite) are initially stored in 
 scratchdir and then loaded under warehouse table directory. FileSystem.rename 
 is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files 
 but it preserves their permissions (including group and mode) which are 
 determined by scratchdir permission or umask. If the scratchdir has different 
 permissions from those of warehouse table directories, the problem occurs.
 
 Solution:
 After the FileSystem.rename is called, changing all renamed (moved) 
 files/dirs to their destination parents' permissions if needed (say if 
 hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method 
 renameFile doing both rename and permission. It replaces the 
 FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename 
 used to move files/dirs under same scratchdir in the middle of task 
 processing. It looks to me not necessary since they are temp files and also 
 probably access protected by top scratchdir mode 700 (HIVE-4487).
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 
 
 Diff: https://reviews.apache.org/r/12050/diff/
 
 
 Testing
 ---
 
 The following cases tested that all created subdirs/files inherit their 
 parents' permission mode and group in : 1). create database; 2). create 
 table; 3). load data; 4) insert overwrite; 5) partitions.
 {code}
 hive dfs -ls -d /user/tester1/hive;  
  
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:20 
 /user/tester1/hive
 
 hive create database tester1 COMMENT 'Database for user tester1' LOCATION 
 '/user/tester1/hive/tester1.db';
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:21 
 /user/tester1/hive/tester1.db
 
 hive use tester1;
 hive  create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db/tst1
 
 hive  load data local inpath '/home/tester1/tst1.input' into table tst1; 
   
 hive dfs -ls -R /user/tester1/hive; 
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
 /user/tester1/hive/tester1.db/tst1
 -rw-rw   3 tester1 

Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)

2013-07-02 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/
---

(Updated July 2, 2013, 4:39 p.m.)


Review request for hive.


Changes
---

update the change (use FileStatus.isDir instead of isDirectory)


Bugs: HIVE- and HIVE-3756
https://issues.apache.org/jira/browse/HIVE-
https://issues.apache.org/jira/browse/HIVE-3756


Repository: hive-git


Description
---

Problems:
1. When doing load data or insert overwrite to a table, the data files under 
database/table directory could not inherit their parent's permissions (i.e. 
group) as described in HIVE-3756.
2. Beside the group issue, the read/write permission mode is also not inherited
3. Same problem affects the partition files (see HIVE-3094)

Cause:
The task results (from load data or insert overwrite) are initially stored in 
scratchdir and then loaded under warehouse table directory. FileSystem.rename 
is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files but 
it preserves their permissions (including group and mode) which are determined 
by scratchdir permission or umask. If the scratchdir has different permissions 
from those of warehouse table directories, the problem occurs.

Solution:
After the FileSystem.rename is called, changing all renamed (moved) files/dirs 
to their destination parents' permissions if needed (say if 
hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method 
renameFile doing both rename and permission. It replaces the FileSystem.rename 
used in LoadTable/LoadPartition. I do not replace rename used to move 
files/dirs under same scratchdir in the middle of task processing. It looks to 
me not necessary since they are temp files and also probably access protected 
by top scratchdir mode 700 (HIVE-4487).


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 

Diff: https://reviews.apache.org/r/12050/diff/


Testing
---

The following cases tested that all created subdirs/files inherit their 
parents' permission mode and group in : 1). create database; 2). create table; 
3). load data; 4) insert overwrite; 5) partitions.
{code}
hive dfs -ls -d /user/tester1/hive;
   
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:20 
/user/tester1/hive

hive create database tester1 COMMENT 'Database for user tester1' LOCATION 
'/user/tester1/hive/tester1.db';
hive dfs -ls -R /user/tester1/hive;

drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:21 
/user/tester1/hive/tester1.db

hive use tester1;
hive  create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
hive dfs -ls -R /user/tester1/hive;

drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
/user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
/user/tester1/hive/tester1.db/tst1

hive  load data local inpath '/home/tester1/tst1.input' into table tst1;   

hive dfs -ls -R /user/tester1/hive; 
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
/user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1
-rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1/tst1.input

hive create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE;
hive dfs -ls -R /user/tester1/hive;

drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:24 
/user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1
-rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1/tst1.input
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:24 
/user/tester1/hive/tester1.db/tst2

hive insert overwrite table tst2 select * from tst1;
hive dfs -ls -R /user/tester1/hive; 
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:25 
/user/tester1/hive/tester1.db
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1
-rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
/user/tester1/hive/tester1.db/tst1/tst1.input
drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:25 
/user/tester1/hive/tester1.db/tst2
-rw-rw   3 tester1 testgroup123291 2013-06-22 13:25 

Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)

2013-07-02 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/#review22698
---



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/12050/#comment46400

I had quite a discussion on this with Rohini on HIVE-2936 on how to do 
these fs ops in a performant and robust way. Feel free to follow that.



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/12050/#comment46401

Why not use api instead of FsShell? Does filesystem doesn't offer an api 
for recursively doing chmod and chgrp?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/12050/#comment46398

I think following way to write this is more robust:
if(inheritPerms) {
fs.mkdirs(destfp, destfp.getParent().getPerms);
} else {
fs.mkdirs(destfp);
}



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/12050/#comment46399

same as previous comment.


Couple of comments on api usage.

- Ashutosh Chauhan


On July 2, 2013, 4:39 p.m., Chaoyu Tang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/12050/
 ---
 
 (Updated July 2, 2013, 4:39 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE- and HIVE-3756
 https://issues.apache.org/jira/browse/HIVE-
 https://issues.apache.org/jira/browse/HIVE-3756
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Problems:
 1. When doing load data or insert overwrite to a table, the data files under 
 database/table directory could not inherit their parent's permissions (i.e. 
 group) as described in HIVE-3756.
 2. Beside the group issue, the read/write permission mode is also not 
 inherited
 3. Same problem affects the partition files (see HIVE-3094)
 
 Cause:
 The task results (from load data or insert overwrite) are initially stored in 
 scratchdir and then loaded under warehouse table directory. FileSystem.rename 
 is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files 
 but it preserves their permissions (including group and mode) which are 
 determined by scratchdir permission or umask. If the scratchdir has different 
 permissions from those of warehouse table directories, the problem occurs.
 
 Solution:
 After the FileSystem.rename is called, changing all renamed (moved) 
 files/dirs to their destination parents' permissions if needed (say if 
 hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method 
 renameFile doing both rename and permission. It replaces the 
 FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename 
 used to move files/dirs under same scratchdir in the middle of task 
 processing. It looks to me not necessary since they are temp files and also 
 probably access protected by top scratchdir mode 700 (HIVE-4487).
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 
 
 Diff: https://reviews.apache.org/r/12050/diff/
 
 
 Testing
 ---
 
 The following cases tested that all created subdirs/files inherit their 
 parents' permission mode and group in : 1). create database; 2). create 
 table; 3). load data; 4) insert overwrite; 5) partitions.
 {code}
 hive dfs -ls -d /user/tester1/hive;  
  
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:20 
 /user/tester1/hive
 
 hive create database tester1 COMMENT 'Database for user tester1' LOCATION 
 '/user/tester1/hive/tester1.db';
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:21 
 /user/tester1/hive/tester1.db
 
 hive use tester1;
 hive  create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db/tst1
 
 hive  load data local inpath '/home/tester1/tst1.input' into table tst1; 
   
 hive dfs -ls -R /user/tester1/hive; 
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
 /user/tester1/hive/tester1.db/tst1
 -rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
 

Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)

2013-07-01 Thread Sushanth Sowmyan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/#review22617
---



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/12050/#comment46309

I understand from javadoc that FileStatus.isDirectory() is supposed to be 
the new way of calling and .isDir() is deprecated, but I wanted to point out 
that hive trunk (with no hadoop revision args) does not even compile for me 
with isDirectory() here.

Also, given that .isDir() is used all over the place in hive, for the sake 
of consistency, would this break functionality in any significant way if you 
use it here?


- Sushanth Sowmyan


On June 22, 2013, 8:43 p.m., Chaoyu Tang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/12050/
 ---
 
 (Updated June 22, 2013, 8:43 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE- and HIVE-3756
 https://issues.apache.org/jira/browse/HIVE-
 https://issues.apache.org/jira/browse/HIVE-3756
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Problems:
 1. When doing load data or insert overwrite to a table, the data files under 
 database/table directory could not inherit their parent's permissions (i.e. 
 group) as described in HIVE-3756.
 2. Beside the group issue, the read/write permission mode is also not 
 inherited
 3. Same problem affects the partition files (see HIVE-3094)
 
 Cause:
 The task results (from load data or insert overwrite) are initially stored in 
 scratchdir and then loaded under warehouse table directory. FileSystem.rename 
 is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files 
 but it preserves their permissions (including group and mode) which are 
 determined by scratchdir permission or umask. If the scratchdir has different 
 permissions from those of warehouse table directories, the problem occurs.
 
 Solution:
 After the FileSystem.rename is called, changing all renamed (moved) 
 files/dirs to their destination parents' permissions if needed (say if 
 hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method 
 renameFile doing both rename and permission. It replaces the 
 FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename 
 used to move files/dirs under same scratchdir in the middle of task 
 processing. It looks to me not necessary since they are temp files and also 
 probably access protected by top scratchdir mode 700 (HIVE-4487).
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 
 
 Diff: https://reviews.apache.org/r/12050/diff/
 
 
 Testing
 ---
 
 The following cases tested that all created subdirs/files inherit their 
 parents' permission mode and group in : 1). create database; 2). create 
 table; 3). load data; 4) insert overwrite; 5) partitions.
 {code}
 hive dfs -ls -d /user/tester1/hive;  
  
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:20 
 /user/tester1/hive
 
 hive create database tester1 COMMENT 'Database for user tester1' LOCATION 
 '/user/tester1/hive/tester1.db';
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:21 
 /user/tester1/hive/tester1.db
 
 hive use tester1;
 hive  create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db/tst1
 
 hive  load data local inpath '/home/tester1/tst1.input' into table tst1; 
   
 hive dfs -ls -R /user/tester1/hive; 
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
 /user/tester1/hive/tester1.db/tst1
 -rw-rw   3 tester1 testgroup123168 2013-06-22 13:23 
 /user/tester1/hive/tester1.db/tst1/tst1.input
 
 hive create table tester1.tst2(col1 int, col2 string) ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE;
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:24 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  

Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)

2013-07-01 Thread Sushanth Sowmyan


 On July 1, 2013, 6:26 p.m., Sushanth Sowmyan wrote:
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2128
  https://reviews.apache.org/r/12050/diff/1/?file=310452#file310452line2128
 
  I understand from javadoc that FileStatus.isDirectory() is supposed to 
  be the new way of calling and .isDir() is deprecated, but I wanted to point 
  out that hive trunk (with no hadoop revision args) does not even compile 
  for me with isDirectory() here.
  
  Also, given that .isDir() is used all over the place in hive, for the 
  sake of consistency, would this break functionality in any significant way 
  if you use it here?

To add more details, this patch only works if you are compiling against Hadoop 
2.x, and does not if you are compiling against 0.20.x or 1.x. If you need to 
use isDirectory() for some functionality reason, you should add the appropriate 
shims for it. Without that, this should be written as isDir().


- Sushanth


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/#review22617
---


On June 22, 2013, 8:43 p.m., Chaoyu Tang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/12050/
 ---
 
 (Updated June 22, 2013, 8:43 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE- and HIVE-3756
 https://issues.apache.org/jira/browse/HIVE-
 https://issues.apache.org/jira/browse/HIVE-3756
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Problems:
 1. When doing load data or insert overwrite to a table, the data files under 
 database/table directory could not inherit their parent's permissions (i.e. 
 group) as described in HIVE-3756.
 2. Beside the group issue, the read/write permission mode is also not 
 inherited
 3. Same problem affects the partition files (see HIVE-3094)
 
 Cause:
 The task results (from load data or insert overwrite) are initially stored in 
 scratchdir and then loaded under warehouse table directory. FileSystem.rename 
 is used in this step (e.g. LoadTable/LoadPartition) to move the dirs/files 
 but it preserves their permissions (including group and mode) which are 
 determined by scratchdir permission or umask. If the scratchdir has different 
 permissions from those of warehouse table directories, the problem occurs.
 
 Solution:
 After the FileSystem.rename is called, changing all renamed (moved) 
 files/dirs to their destination parents' permissions if needed (say if 
 hive.warehouse.subdir.inherit.perms is true). Here I introduced a new method 
 renameFile doing both rename and permission. It replaces the 
 FileSystem.rename used in LoadTable/LoadPartition. I do not replace rename 
 used to move files/dirs under same scratchdir in the middle of task 
 processing. It looks to me not necessary since they are temp files and also 
 probably access protected by top scratchdir mode 700 (HIVE-4487).
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 17daaa1 
 
 Diff: https://reviews.apache.org/r/12050/diff/
 
 
 Testing
 ---
 
 The following cases tested that all created subdirs/files inherit their 
 parents' permission mode and group in : 1). create database; 2). create 
 table; 3). load data; 4) insert overwrite; 5) partitions.
 {code}
 hive dfs -ls -d /user/tester1/hive;  
  
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:20 
 /user/tester1/hive
 
 hive create database tester1 COMMENT 'Database for user tester1' LOCATION 
 '/user/tester1/hive/tester1.db';
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:21 
 /user/tester1/hive/tester1.db
 
 hive use tester1;
 hive  create table tester1.tst1(col1 int, col2 string) ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
 hive dfs -ls -R /user/tester1/hive;  
   
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db/tst1
 
 hive  load data local inpath '/home/tester1/tst1.input' into table tst1; 
   
 hive dfs -ls -R /user/tester1/hive; 
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:22 
 /user/tester1/hive/tester1.db
 drwxrwx---   - tester1 testgroup123  0 2013-06-22 13:23 
 /user/tester1/hive/tester1.db/tst1
 -rw-rw   3 tester1 testgroup123168 2013-06-22 13:23