[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

2014-09-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130892#comment-14130892
 ] 

Sergio Peña commented on HIVE-5207:
---

I see this is a full encryption solution, and it looks pretty good, but since 
HDFS encryption is coming soon (HDFS-6134) I'd like to make some improvements 
so that Hive works with it. This is meant to be compatible with HDFS encryption 
only. It still lacks the ability to encrypt tables on the fly through Hive 
statements.

See HIVE-8065

 Support data encryption for Hive tables
 ---

 Key: HIVE-5207
 URL: https://issues.apache.org/jira/browse/HIVE-5207
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.12.0
Reporter: Jerry Chen
  Labels: Rhino
 Attachments: HIVE-5207.patch, HIVE-5207.patch

   Original Estimate: 504h
  Remaining Estimate: 504h

 For sensitive and legally protected data, such as personal information, it is 
 common practice to store the data encrypted in the file system. Enabling Hive 
 to store and query encrypted data is crucial for Hive data analysis in the 
 enterprise. 
  
 When creating a table, the user can specify whether it is an encrypted table 
 by setting a property in TBLPROPERTIES. Once an encrypted table is created, 
 querying it is transparent as long as the corresponding key management 
 facilities are set up in the environment where the query runs. We can use the 
 hadoop crypto support provided by HADOOP-9331 for the underlying data 
 encryption and decryption. 
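 As a sketch of what such a statement might look like (the property names here 
 are purely illustrative assumptions, not the names used by the patch):

```sql
-- Hypothetical example: marking a table as encrypted via TBLPROPERTIES.
-- 'hive.encrypt' and 'hive.encrypt.keyName' are assumed names for illustration.
CREATE TABLE customer_pii (
  id    BIGINT,
  ssn   STRING,
  email STRING
)
STORED AS TEXTFILE
TBLPROPERTIES (
  'hive.encrypt' = 'true',
  'hive.encrypt.keyName' = 'customer_pii_key'
);
```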
  
 As to key management, we would support several common key management use 
 cases. First, the table key (data key) can be stored in the Hive metastore, 
 associated with the table in its properties. The table key can be explicitly 
 specified or auto-generated, and will be encrypted with a master key. In 
 cases where the data being processed was generated by other applications, we 
 need to support externally managed or imported table keys. Also, the data 
 generated by Hive may be consumed by other applications in the system, so we 
 need a tool or command for exporting the table key to a Java keystore for 
 external use.
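 The two-tier scheme described above (a per-table data key wrapped by an 
 externally managed master key) can be sketched with plain JDK crypto. This is 
 an illustration of the idea only, not the patch's actual implementation:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.Arrays;

public class TableKeyWrapSketch {
    // Wrap (encrypt) the table key under the master key using AES key wrap.
    static byte[] wrap(SecretKey masterKey, SecretKey tableKey) throws Exception {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.WRAP_MODE, masterKey);
        return c.wrap(tableKey);
    }

    // Unwrap (decrypt) the stored bytes back into the table key.
    static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) throws Exception {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey masterKey = kg.generateKey(); // managed externally by the user
        SecretKey tableKey = kg.generateKey();  // auto-generated per table

        byte[] wrapped = wrap(masterKey, tableKey);       // stored in the metastore
        SecretKey recovered = unwrap(masterKey, wrapped); // recovered at query time
        if (!Arrays.equals(tableKey.getEncoded(), recovered.getEncoded())) {
            throw new AssertionError("unwrap did not recover the table key");
        }
        System.out.println("table key round-tripped through AES key wrap");
    }
}
```

 Only the wrapped bytes would ever be persisted; losing the master key makes 
 the stored table keys unrecoverable.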
  
 To handle versions of Hadoop that do not have crypto support, we can avoid 
 compilation problems by segregating crypto API usage into separate files 
 (shims) to be included only if a flag is defined on the Ant command line 
 (something like -Dcrypto=true).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

2013-10-14 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794816#comment-13794816
 ] 

Jerry Chen commented on HIVE-5207:
--

{quote}This patch won't compile, because Hive has to work when used with Hadoop 
1.x. The shims are used to support multiple versions of Hadoop (Hadoop 0.20, 
Hadoop 1.x, Hadoop 0.23, Hadoop 2.x) depending on what is installed on the host 
system.{quote}
This patch depends on the crypto feature added by HADOOP-9331 and others. For 
this patch to compile against Hadoop versions that do not include the crypto 
feature, the -Dcrypto=true flag must be omitted, which disables the feature. I 
do understand that this approach still doesn't align with the goal of 
supporting multiple Hadoop versions from a single build of Hive. 
{quote}Furthermore, this seems like the wrong direction. What is the advantage 
of this rather large patch over using the cfs work? If the user defines a table 
in cfs all of the table's data will be encrypted.{quote}
I agree that the cfs work has value in its API transparency, and it is good 
stuff. We are working on CFS, but it is not available yet. The work here is 
already in use by our users, who rely on it to protect sensitive data on their 
clusters while transparently decrypting the data in jobs that process it. 

On the other hand, compression codecs are already widely used across the 
various file formats supported by Hive. The issue with the current approach may 
be that it depends on changes to specific file formats to handle encryption key 
contexts. One possible direction is to make the encryption codec follow exactly 
the same interface as a compression codec, so that Hive can use a codec for 
encryption or decryption without any changes to the file formats or to Hive 
itself. If we can do that, it adds value in the same way a compression codec 
does.
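The codec idea suggested here amounts to a stream filter that file formats can 
apply without knowing anything about encryption. A minimal self-contained 
sketch of that shape using JDK cipher streams (an assumed design, not the 
patch's code; the fixed all-zero IV is for illustration only, real code must 
use a unique IV per file):

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.io.*;
import java.nio.charset.StandardCharsets;

public class CryptoCodecSketch {
    private final SecretKey key;
    private final IvParameterSpec iv;

    CryptoCodecSketch(SecretKey key, byte[] ivBytes) {
        this.key = key;
        this.iv = new IvParameterSpec(ivBytes);
    }

    // Analogous to CompressionCodec.createOutputStream: wrap the raw stream.
    OutputStream createOutputStream(OutputStream out) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, iv);
        return new CipherOutputStream(out, c);
    }

    // Analogous to CompressionCodec.createInputStream.
    InputStream createInputStream(InputStream in) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, iv);
        return new CipherInputStream(in, c);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        // Zero IV: illustration only, never reuse an IV with the same key.
        CryptoCodecSketch codec = new CryptoCodecSketch(kg.generateKey(), new byte[16]);

        String rows = "row1,alice\nrow2,bob\n";
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream enc = codec.createOutputStream(sink)) {
            enc.write(rows.getBytes(StandardCharsets.UTF_8));
        }
        byte[] decrypted;
        try (InputStream dec = codec.createInputStream(
                new ByteArrayInputStream(sink.toByteArray()))) {
            decrypted = dec.readAllBytes();
        }
        if (!rows.equals(new String(decrypted, StandardCharsets.UTF_8))) {
            throw new AssertionError("decryption did not recover plaintext");
        }
        System.out.println("stream round-trip ok");
    }
}
```

Because the codec only exposes stream-wrapping methods, any writer or reader 
that already pipes data through a compression codec could pipe it through this 
one unchanged, which is exactly the appeal of the approach.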




[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

2013-10-13 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793897#comment-13793897
 ] 

Jerry Chen commented on HIVE-5207:
--

Hi Larry, thanks for pointing out the docs. Yes, we will add more javadocs and 
documentation as our next step.
 
{quote}1. TwoTieredKey - exactly the purpose, how it's used what the tiers are, 
etc{quote}
TwoTieredKey is used for the case where the table key is stored in the Hive 
metastore. The table key is encrypted with the master key, which is provided 
externally. In this case, the user maintains and manages only the master key 
externally, rather than managing all the table keys externally. This is useful 
when there is no full-fledged key management system available.
 
{quote}2. External KeyManagement integration - where and what is the expected 
contract for this integration{quote}
To integrate with an external key management system, we use the KeyProvider 
interface from HADOOP-9331. An implementation of the KeyProvider interface for 
a particular key management system can be set as the KeyProvider used for 
retrieving keys.
 
{quote}3. A specific usecase description for exporting keys into an external 
keystore and who has the authority to initiate the export and where the 
password comes from{quote}
Exporting the internal keys is done through the Hive command line. As the 
internal table keys are encrypted with the master key, the master key must be 
available in the environment, which is controlled by the user, when performing 
the export. If the master key is not available, the encrypted table keys 
cannot be decrypted and thus cannot be exported. The KeyProvider 
implementation used for retrieving the master key can provide its own 
authentication and authorization to decide whether the current user has 
access to a specific key.
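The export use case described above (writing a decrypted table key into a 
password-protected Java keystore for an external application) can be sketched 
with the JDK's JCEKS keystore type, which supports secret-key entries. The 
alias and password here are hypothetical:

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import java.util.Arrays;

public class KeyExportSketch {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        // Stand-in for a table key already unwrapped with the master key.
        SecretKey tableKey = kg.generateKey();

        char[] password = "changeit".toCharArray(); // hypothetical password
        KeyStore ks = KeyStore.getInstance("JCEKS"); // JCEKS can hold secret keys
        ks.load(null, password);                     // start with an empty store
        ks.setEntry("sales_table_key",
                new KeyStore.SecretKeyEntry(tableKey),
                new KeyStore.PasswordProtection(password));

        // Serialize the keystore; an external application loads it the same way.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ks.store(buf, password);

        KeyStore reloaded = KeyStore.getInstance("JCEKS");
        reloaded.load(new ByteArrayInputStream(buf.toByteArray()), password);
        SecretKey exported = (SecretKey) reloaded.getKey("sales_table_key", password);
        if (!Arrays.equals(tableKey.getEncoded(), exported.getEncoded())) {
            throw new AssertionError("exported key mismatch");
        }
        System.out.println("table key exported to and reloaded from JCEKS");
    }
}
```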
 
{quote}4. An explanation as to why we should ever store the key with the data 
which seems like a bad idea. I understand that it is encrypted with the master 
secret - which takes me to the next question.  {quote}
Strictly speaking, the key is not stored with the data; the table key is 
stored in the Hive metastore. I see your point on this question. As mentioned, 
this is useful for cases where no full-fledged, ready-to-use key management 
system is available. We provide several alternatives for managing keys. When 
creating an encrypted table, the user can specify whether the key is managed 
externally or internally. For externally managed keys, only the key name 
(alias) is stored in the Hive metastore, and the key is retrieved through the 
KeyProvider set in the configuration.
 
{quote}5. Where is the master secret established and stored and how is it 
protected{quote}
Currently, we assume that the user manages the master key. For example, for 
simple use cases, the user can store the master key in a Java KeyStore that is 
protected by a password and kept in a folder readable only by specific users 
or groups. The user can also store the master key in another key management 
system, since the master key is retrieved through the KeyProvider.
 
Really appreciate your time reviewing this.
Thanks


[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

2013-10-08 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789356#comment-13789356
 ] 

Owen O'Malley commented on HIVE-5207:
-

This patch won't compile, because Hive has to work when used with Hadoop 1.x. 
The shims are used to support multiple versions of Hadoop (Hadoop 0.20, Hadoop 
1.x, Hadoop 0.23, Hadoop 2.x) depending on what is installed on the host system.

Furthermore, this seems like the wrong direction. What is the advantage of 
this rather large patch over using the cfs work? If the user defines a table in 
cfs, all of the table's data will be encrypted.



[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

2013-09-25 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777416#comment-13777416
 ] 

Larry McCay commented on HIVE-5207:
---

Hi Jerry - I have taken a high-level look through the patch. Lots of good stuff 
there - good work! A couple of things I would like to see more javadocs on, 
and perhaps a document that describes the use cases:

1. TwoTieredKey - exactly what its purpose is, how it's used, what the tiers are, etc.
2. External KeyManagement integration - where and what is the expected contract 
for this integration
3. A specific usecase description for exporting keys into an external keystore 
and who has the authority to initiate the export and where the password comes 
from
4. An explanation as to why we should ever store the key with the data which 
seems like a bad idea. I understand that it is encrypted with the master secret 
- which takes me to the next question. :)
5. Where is the master secret established and stored and how is it protected

There is a minor typo/spelling error that you probably want to fix now rather 
than later:

+public interface HiveKeyResolver  {
+  void init(Configuration conf) throws CryptoException;
+
+  /**
+   * Resolve the key meta information of a table
+   * @param tableDesc The table descriptor
+   */
+  KeyMeta resovleKey(TableDesc tableDesc);
+}

change resovleKey to resolveKey here and in the interface implementation and 
consumer of the method - I think there were 3 instances.

Again, nice work here!
Let's get some higher level descriptions in code javadocs and/or separate 
documents.
Thanks!



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

2013-09-18 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770520#comment-13770520
 ] 

Jerry Chen commented on HIVE-5207:
--

HIVE-4227 is specifically about adding column-level encryption to ORC files. As 
we all know, Hive tables support various other formats such as text file, 
sequence file, RCFile, and Avro. HIVE-5207 aims to address encryption across 
these file formats as a common problem.

As to key management, from the user's perspective one rational approach is one 
table key per encrypted table. Under that model, it is natural to associate the 
key with TBLPROPERTIES.



[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

2013-09-03 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757433#comment-13757433
 ] 

Larry McCay commented on HIVE-5207:
---

This seems to be a duplicate of HIVE-4227. I am actually in the process of 
working on that functionality and plan to leverage HADOOP-9331 as appropriate. 
We will need to rationalize these Jiras. Maybe you are calling out the 
difference between the Jiras as the entire table being encrypted here, rather 
than individual columns in HIVE-4227? I think that if we need both levels of 
granularity, they need to be based on the same solution.

The key management aspect is one that we will need to sync on. The patch in 
HADOOP-9534 (CMF) is being refactored in order to support our API needs for 
acquiring keys for Hive encryption, and presumably for CryptoFS. Generally 
speaking, the nonce/IV, alias, and version indicator will be stored within the 
colstore in Hive for decryption. That is the current thinking, anyway.

Support for multiple key revisions per alias will allow for rotation and 
rolling of keys within the datastores.

CMF will provide pluggability for talking to key management/data protection 
providers: initially a JCEKS keystore and eventually a central key 
management/data protection service for Hadoop. The central service will also 
provide pluggability for integrating third party providers/solutions.

TableProperties is one way to indicate the need for data protection - we are 
looking at others as well - but of course I am currently looking at column 
level indicators too.

Let's figure out how to combine or consolidate these Jiras so that we can 
hopefully get a coherent set of patches to collaborate with in a branch.
