[jira] [Updated] (HIVE-8065) Support HDFS encryption functionality on Hive

2015-01-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-8065:
--
Sprint: Sprint 1

 Support HDFS encryption functionality on Hive
 -

 Key: HIVE-8065
 URL: https://issues.apache.org/jira/browse/HIVE-8065
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.1
Reporter: Sergio Peña
Assignee: Sergio Peña
  Labels: Hive-Scrum

 The new encryption support on HDFS makes Hive incompatible and unusable when 
 this feature is used.
 HDFS encryption is designed so that an user can configure different 
 encryption zones (or directories) for multi-tenant environments. An 
 encryption zone has an exclusive encryption key, such as AES-128 or AES-256. 
 Because of security compliance, the HDFS does not allow to move/rename files 
 between encryption zones. Renames are allowed only inside the same encryption 
 zone. A copy is allowed between encryption zones.
 See HDFS-6134 for more details about HDFS encryption design.
 Hive currently uses a scratch directory (like /tmp/$user/$random). This 
 scratch directory is used for the output of intermediate data (between MR 
 jobs) and for the final output of the hive query which is later moved to the 
 table directory location.
 If Hive tables are in different encryption zones than the scratch directory, 
 then Hive won't be able to renames those files/directories, and it will make 
 Hive unusable.
 To handle this problem, we can change the scratch directory of the 
 query/statement to be inside the same encryption zone of the table directory 
 location. This way, the renaming process will be successful. 
 Also, for statements that move files between encryption zones (i.e. LOAD 
 DATA), a copy may be executed instead of a rename. This will cause an 
 overhead when copying large data files, but it won't break the encryption on 
 Hive.
 Another security thing to consider is when using joins selects. If Hive joins 
 different tables with different encryption key strengths, then the results of 
 the select might break the security compliance of the tables. Let's say two 
 tables with 128 bits and 256 bits encryption are joined, then the temporary 
 results might be stored in the 128 bits encryption zone. This will conflict 
 with the table encrypted with 256 bits temporary.
 To fix this, Hive should be able to select the scratch directory that is more 
 secured/encrypted in order to save the intermediate data temporary with no 
 compliance issues.
 For instance:
 {noformat}
 SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
 {noformat}
 - This should use a scratch directory (or staging directory) inside the 
 table-aes256 table location.
 {noformat}
 INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
 {noformat}
 - This should use a scratch directory inside the table-aes1 location.
 {noformat}
 FROM table-unencrypted
 INSERT OVERWRITE TABLE table-aes128 SELECT id, name
 INSERT OVERWRITE TABLE table-aes256 SELECT id, name
 {noformat}
 - This should use a scratch directory on each of the tables locations.
 - The first SELECT will have its scratch directory on table-aes128 directory.
 - The second SELECT will have its scratch directory on table-aes256 directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8065) Support HDFS encryption functionality on Hive

2014-12-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-8065:
--
Labels: Hive-Scrum  (was: )

 Support HDFS encryption functionality on Hive
 -

 Key: HIVE-8065
 URL: https://issues.apache.org/jira/browse/HIVE-8065
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.1
Reporter: Sergio Peña
Assignee: Sergio Peña
  Labels: Hive-Scrum

 The new encryption support on HDFS makes Hive incompatible and unusable when 
 this feature is used.
 HDFS encryption is designed so that an user can configure different 
 encryption zones (or directories) for multi-tenant environments. An 
 encryption zone has an exclusive encryption key, such as AES-128 or AES-256. 
 Because of security compliance, the HDFS does not allow to move/rename files 
 between encryption zones. Renames are allowed only inside the same encryption 
 zone. A copy is allowed between encryption zones.
 See HDFS-6134 for more details about HDFS encryption design.
 Hive currently uses a scratch directory (like /tmp/$user/$random). This 
 scratch directory is used for the output of intermediate data (between MR 
 jobs) and for the final output of the hive query which is later moved to the 
 table directory location.
 If Hive tables are in different encryption zones than the scratch directory, 
 then Hive won't be able to renames those files/directories, and it will make 
 Hive unusable.
 To handle this problem, we can change the scratch directory of the 
 query/statement to be inside the same encryption zone of the table directory 
 location. This way, the renaming process will be successful. 
 Also, for statements that move files between encryption zones (i.e. LOAD 
 DATA), a copy may be executed instead of a rename. This will cause an 
 overhead when copying large data files, but it won't break the encryption on 
 Hive.
 Another security thing to consider is when using joins selects. If Hive joins 
 different tables with different encryption key strengths, then the results of 
 the select might break the security compliance of the tables. Let's say two 
 tables with 128 bits and 256 bits encryption are joined, then the temporary 
 results might be stored in the 128 bits encryption zone. This will conflict 
 with the table encrypted with 256 bits temporary.
 To fix this, Hive should be able to select the scratch directory that is more 
 secured/encrypted in order to save the intermediate data temporary with no 
 compliance issues.
 For instance:
 {noformat}
 SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
 {noformat}
 - This should use a scratch directory (or staging directory) inside the 
 table-aes256 table location.
 {noformat}
 INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
 {noformat}
 - This should use a scratch directory inside the table-aes1 location.
 {noformat}
 FROM table-unencrypted
 INSERT OVERWRITE TABLE table-aes128 SELECT id, name
 INSERT OVERWRITE TABLE table-aes256 SELECT id, name
 {noformat}
 - This should use a scratch directory on each of the tables locations.
 - The first SELECT will have its scratch directory on table-aes128 directory.
 - The second SELECT will have its scratch directory on table-aes256 directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)