[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110938#comment-14110938 ]

Larry McCay commented on HIVE-6329:
---

Hi [~navis] - It seems to me that we need to have some sort of AES encryption with proper key management and initialization capabilities in order to consider this ready for inclusion in Apache. If this is already being done then I may just be missing where it is in the patch.

You made comments earlier on the jira about acquiring private keys over SSL - I don't see these being acquired or used in the patch either. Is there a companion jira that I need to see?

Perhaps, a high level design document would be helpful in communicating your intent and in teasing out all of the requirements for key management, initialization and usage for this feature?

Support column level encryption/decryption
--
Key: HIVE-6329
URL: https://issues.apache.org/jira/browse/HIVE-6329
Project: Hive
Issue Type: New Feature
Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, HIVE-6329.11.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt

Receiving some requirements on encryption recently but hive is not supporting it. Before the full implementation via HIVE-5207, this might be useful for some cases.

{noformat}
hive> create table encode_test(id int, name STRING, phone STRING, address STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
      WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly')
      STORED AS TEXTFILE;
OK
Time taken: 0.584 seconds
hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows);
..
OK
Time taken: 5.121 seconds
hive> select * from encode_test;
OK
100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw==
Time taken: 0.078 seconds, Fetched: 1 row(s)
hive>
{noformat}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111827#comment-14111827 ]

Larry McCay commented on HIVE-6329:
---

I see, [~navis]. So, the intent of this patch is to provide a hook within the SerDe mechanism with enough fidelity to do encryption, but the initial implementation just provides a Base64 encoding. That helps me understand the patch more and I think you have accomplished this.

I would be a bit leery of calling the hook and Base64 implementation that we are providing in this patch column level encryption/decryption - even though you are enabling someone to use it for that. This happens to be a patch that introduces column/value encoding/decoding. The encoding is easily reversible, and encoded values are joinable across tables, allowing correlations to be made.

Are we able to frame the usecase that is actually represented by this patch as a problem that needs solving, or do we need to make this implementation more robust in terms of encryption/decryption and all the key management requirements needed to do that properly?

I am just concerned about introducing new interfaces and hooks that need to be supported if they are not what we would consider strategic implementation choices for a given feature like encryption. Does the SerDe mechanism provide everything that we need? It seems like this approach provides little in terms of key management and metadata, which are requisite for encryption mechanisms. Though, I may still be missing the forest for the trees.

What I would like to do is ensure that our customers have a path forward with their needs met while not moving this forward in Apache until we have an actual encryption mechanism available. Does that make sense? What do you think that will require?
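To illustrate the concern raised in the comment above, here is a minimal sketch showing that the Base64 values from the example session can be reversed without any key, and that equal inputs always give equal outputs (which is what makes encoded values joinable across tables). It uses only the JDK's `java.util.Base64`; the class name `Base64IsNotEncryption` is made up for illustration and is not part of the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; not part of the HIVE-6329 patch.
final class Base64IsNotEncryption {

    // Encodes a column value with plain Base64: no key is involved.
    static String encode(String clearText) {
        return Base64.getEncoder().encodeToString(clearText.getBytes(StandardCharsets.UTF_8));
    }

    // Anyone who can read the stored value can reverse the encoding.
    static String decode(String stored) {
        return new String(Base64.getDecoder().decode(stored), StandardCharsets.UTF_8);
    }
}
```

Calling `Base64IsNotEncryption.decode("U2VvdWwsIFNlb2Nobw==")` on the address value stored in `encode_test` returns `Seoul, Seocho`.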
[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104059#comment-14104059 ]

Larry McCay commented on HIVE-7634:
---

Are there plans to commit this to branch-2?

Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
--
Key: HIVE-7634
URL: https://issues.apache.org/jira/browse/HIVE-7634
Project: Hive
Issue Type: Bug
Components: Security
Reporter: Jason Dere
Assignee: Jason Dere
Attachments: HIVE-7634.1.patch

HADOOP-10607 provides a Configuration.getPassword() API that allows passwords to be retrieved from a configured credential provider, while also being able to fall back to the HiveConf setting if no provider is set up. Hive should use this API for versions of Hadoop that support this API. This would give users the ability to remove the passwords from their Hive configuration files.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104078#comment-14104078 ]

Larry McCay commented on HIVE-7634:
---

Just realized that branch-2 is a hadoop branch.
[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088028#comment-14088028 ]

Larry McCay commented on HIVE-7634:
---

Hi [~jdere] - there are similar uptake jiras being tracked on HADOOP-10904. Those patches may be of interest to you.
[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088033#comment-14088033 ]

Larry McCay commented on HIVE-7634:
---

HADOOP-10904 contains similar uptake patches.
[jira] [Commented] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027698#comment-14027698 ]

Larry McCay commented on HIVE-7175:
---

I just realized that this is the user's LDAP password. It would be unfortunate to have to leave this lying around in various places unless absolutely necessary. Does the beeline CLI currently allow for using the Java Console to collect the password from the user?

I understand that for scripting purposes we may need another collection mechanism, but for usecases with a user and console available, the user's password should not be persisted outside of the directory itself when it can be avoided. For cases where it cannot be avoided, the side file approach is certainly better than the command line itself in terms of visibility.

Provide password file option to beeline
---
Key: HIVE-7175
URL: https://issues.apache.org/jira/browse/HIVE-7175
Project: Hive
Issue Type: Improvement
Components: CLI, Clients
Affects Versions: 0.13.0
Reporter: Robert Justice
Assignee: Dr. Wendell Urth
Labels: features, security
Attachments: HIVE-7175.patch

For people connecting to Hive Server 2 with LDAP authentication enabled, in order to batch run commands, we currently have to provide the password openly in the command line. They could use some expect scripting, but I think a valid improvement would be to provide a password file option similar to other CLI commands in hadoop (e.g. sqoop) to be more secure.

--
This message was sent by Atlassian JIRA (v6.2#6252)
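As a rough illustration of the question above about collecting the password from the Java Console, the following sketch uses the standard `java.io.Console.readPassword` call, which reads without echoing. The class and method names are hypothetical, and this is not how Beeline is known to be implemented; it only shows the interactive path and where a fallback such as a password file would take over.

```java
import java.io.Console;
import java.util.Arrays;

// Hypothetical helper; Beeline's actual prompt handling may differ.
final class PasswordPrompt {

    // Reads a password without echo when a console is attached.
    // Returns null when System.console() is null (typical for batch
    // scripts), so the caller can fall back to e.g. a password file.
    static char[] readFromConsole(String prompt) {
        Console console = System.console();
        if (console == null) {
            return null;
        }
        return console.readPassword("%s", prompt);
    }

    // Overwrites the password in memory once it has been used.
    static void wipe(char[] password) {
        if (password != null) {
            Arrays.fill(password, '\0');
        }
    }
}
```

Returning a `char[]` rather than a `String` lets the caller clear the secret after authentication, which fits the goal of not leaving the LDAP password around longer than needed.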
[jira] [Commented] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017823#comment-14017823 ]

Larry McCay commented on HIVE-7175:
---

Hi [~rjustice] - we may want to consider the use of the CredentialProvider API that will be committed soon. See HADOOP-10607. This isn't mutually exclusive with the password file approach, as there are plans to fall back to existing password files in certain components.

However, the abstraction of the API is best realized through the new Configuration.getPassword(String name) method. This will allow you to ask for a configuration item that you know is a password, and it will check for an aliased credential based on the name through the CredentialProvider API. If the name is not resolved into a credential from a provider then it falls back to the config file.

The extra hop of the separate file isn't a problem but it isn't encapsulated by the getPassword method going into Configuration. Just something to keep in mind.
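The lookup order described above (credential provider first, then the config file) can be modeled in a few lines. This sketch deliberately does not use Hadoop's `Configuration` or `CredentialProvider` classes; `PasswordLookup`, its maps, and the property name in the usage note are stand-ins invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the lookup order only; it does not use
// Hadoop's Configuration or CredentialProvider classes.
final class PasswordLookup {
    // Stands in for a configured credential provider.
    private final Map<String, char[]> credentialStore = new HashMap<>();
    // Stands in for clear-text values in a config file.
    private final Map<String, String> configFile = new HashMap<>();

    void putCredential(String alias, char[] secret) {
        credentialStore.put(alias, secret.clone());
    }

    void putConfig(String name, String value) {
        configFile.put(name, value);
    }

    // Tries the credential store using the property name as the alias,
    // then falls back to the config value; null if neither has it.
    char[] getPassword(String name) {
        char[] fromProvider = credentialStore.get(name);
        if (fromProvider != null) {
            return fromProvider.clone();
        }
        String fromConfig = configFile.get(name);
        return fromConfig == null ? null : fromConfig.toCharArray();
    }
}
```

With only `putConfig("example.password", "from-config")` set, `getPassword` returns the config value; once a credential is stored under the same name, the credential wins and the clear-text entry can be removed from the file.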
[jira] [Commented] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017866#comment-14017866 ]

Larry McCay commented on HIVE-7175:
---

These should be seen as complementary issues.
[jira] [Commented] (HIVE-7010) templeton/v1/queue REST method has been removed
[ https://issues.apache.org/jira/browse/HIVE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989795#comment-13989795 ]

Larry McCay commented on HIVE-7010:
---

H... so this change is backwards compatible then? What happens when a client calls the removed API?

templeton/v1/queue REST method has been removed
---
Key: HIVE-7010
URL: https://issues.apache.org/jira/browse/HIVE-7010
Project: Hive
Issue Type: Bug
Components: Documentation, WebHCat
Affects Versions: 0.14.0
Reporter: Eugene Koifman
Assignee: Lefty Leverenz

deprecated queue REST method was removed from WebHCat in HIVE-6432. jobs is the replacement.
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference needs to be updated

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7010) templeton/v1/queue REST method has been removed
[ https://issues.apache.org/jira/browse/HIVE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989843#comment-13989843 ]

Larry McCay commented on HIVE-7010:
---

[~ekoifman] - thanks for the insight. I am still getting my head around how Knox will handle contract changes across the entire ecosystem. I don't think that asking you to revert it will help that in general, but getting some sense of how the APIs evolve will inform how we should track those changes.

It seems that we will need to support component versioning rather than API versioning. If this is how APIs evolve across the other components as well, then the version indicator in the URL doesn't seem very meaningful to me.
[jira] [Commented] (HIVE-7010) templeton/v1/queue REST method has been removed
[ https://issues.apache.org/jira/browse/HIVE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989886#comment-13989886 ]

Larry McCay commented on HIVE-7010:
---

That was the behavior that I expected actually. Getting all components to adhere to this would likely be difficult, but I think that it makes sense for all components that do support the version indicator to take that approach. Some components don't actually even have a version indicator in their APIs.
[jira] [Commented] (HIVE-7010) templeton/v1/queue REST method has been removed
[ https://issues.apache.org/jira/browse/HIVE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989202#comment-13989202 ]

Larry McCay commented on HIVE-7010:
---

Hi [~leftylev] - I am curious whether the v1 is going to be bumped up to v2 in response to the contract change. I believe that it is easier for consuming projects - such as Apache Knox - to be able to multiplex across expected APIs with specific version indicators.
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898061#comment-13898061 ]

Larry McCay commented on HIVE-6329:
---

[~owen.omalley] I certainly agree that we need the IV but I'm not sure that I like it in the DDL.

Support column level encryption/decryption
--
Key: HIVE-6329
URL: https://issues.apache.org/jira/browse/HIVE-6329
Project: Hive
Issue Type: New Feature
Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt

Receiving some requirements on encryption recently but hive is not supporting it. Before the full implementation via HIVE-5207, this might be useful for some cases.

{noformat}
hive> create table encode_test(id int, name STRING, phone STRING, address STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
      WITH SERDEPROPERTIES ('column.encode.indices'='2,3', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly')
      STORED AS TEXTFILE;
OK
Time taken: 0.584 seconds
hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows);
..
OK
Time taken: 5.121 seconds
hive> select * from encode_test;
OK
100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw==
Time taken: 0.078 seconds, Fetched: 1 row(s)
hive>
{noformat}

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898092#comment-13898092 ]

Larry McCay commented on HIVE-6329:
---

That works. We would need to be able to determine that a particular row has no IV as well. This could be done by some well known constant or the size of the IV? A size of 0 would indicate that there is no encryption in the row.
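One way to read the "size of the IV" idea above is a length prefix in front of each stored value, where a length of 0 marks clear text. The sketch below assumes a made-up layout of `[1 byte IV length][IV][payload]` per value; the thread does not specify an actual on-disk format, and `IvFraming` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical framing: [1 byte IV length][IV bytes][payload bytes].
final class IvFraming {

    // Prepends the IV length and IV to the payload.
    // An empty IV marks the value as clear text.
    static byte[] frame(byte[] iv, byte[] payload) {
        if (iv.length > 255) {
            throw new IllegalArgumentException("IV too long for a 1-byte length prefix");
        }
        return ByteBuffer.allocate(1 + iv.length + payload.length)
                .put((byte) iv.length)
                .put(iv)
                .put(payload)
                .array();
    }

    // A length prefix of 0 means there is no IV, hence no encryption.
    static boolean isEncrypted(byte[] framed) {
        return (framed[0] & 0xFF) != 0;
    }

    static byte[] iv(byte[] framed) {
        int ivLength = framed[0] & 0xFF;
        return Arrays.copyOfRange(framed, 1, 1 + ivLength);
    }

    static byte[] payload(byte[] framed) {
        int ivLength = framed[0] & 0xFF;
        return Arrays.copyOfRange(framed, 1 + ivLength, framed.length);
    }
}
```

A reader would call `isEncrypted` first and only look up a key and decrypt when it returns true; the alternative mentioned in the comment, a well-known constant, would replace the length check with a comparison against that constant.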
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898154#comment-13898154 ]

Larry McCay commented on HIVE-6329:
---

I think it is important to understand that there is a separate key per column - therefore you wouldn't have the same cipher text for the same clear text in different columns.
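The point about per-column keys can be checked with the JDK's `javax.crypto` classes: the same clear text encrypted under two different keys yields different cipher text. This is a sketch only. The class name, the raw 16-byte keys, and especially the fixed IV are simplifications chosen to isolate the effect of the key; they are not a recommendation and not what the patch does.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo; key handling is simplified for illustration.
final class PerColumnKeys {

    // Encrypts with AES-CBC under the given 16-byte key and 16-byte IV.
    static byte[] encrypt(byte[] key, byte[] iv, String clearText) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(clearText.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES encryption failed", e);
        }
    }
}
```

Note the flip side: with the same key and the same IV the output is deterministic, so equal values within one column would still produce equal cipher text unless the IV varies per value.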
[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables
[ https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777416#comment-13777416 ]

Larry McCay commented on HIVE-5207:
---

Hi Jerry - I have taken a high level look through the patch. Lots of good stuff there - good work!

A couple of things that I would like to see more javadocs on, and perhaps a document that describes the usecases:

1. TwoTieredKey - its exact purpose, how it's used, what the tiers are, etc.
2. External KeyManagement integration - where and what is the expected contract for this integration
3. A specific usecase description for exporting keys into an external keystore, who has the authority to initiate the export, and where the password comes from
4. An explanation as to why we should ever store the key with the data, which seems like a bad idea. I understand that it is encrypted with the master secret - which takes me to the next question. :)
5. Where is the master secret established and stored, and how is it protected

There is a minor typo/spelling error that you probably want to fix now rather than later:

+public interface HiveKeyResolver {
+  void init(Configuration conf) throws CryptoException;
+
+  /**
+   * Resolve the key meta information of a table
+   * @param tableDesc The table descriptor
+   */
+  KeyMeta resovleKey(TableDesc tableDesc);
+}

Change resovleKey to resolveKey here and in the interface implementation and consumer of the method - I think there were 3 instances.

Again, nice work here! Let's get some higher level descriptions in code javadocs and/or separate documents. Thanks!
Support data encryption for Hive tables
---
Key: HIVE-5207
URL: https://issues.apache.org/jira/browse/HIVE-5207
Project: Hive
Issue Type: New Feature
Affects Versions: 0.12.0
Reporter: Jerry Chen
Labels: Rhino
Attachments: HIVE-5207.patch
Original Estimate: 504h
Remaining Estimate: 504h

For sensitive and legally protected data such as personal information, it is common practice to store the data encrypted in the file system. Giving Hive the ability to store and query encrypted data is crucial for Hive data analysis in the enterprise.

When creating a table, the user can specify whether it is an encrypted table by setting a property in TBLPROPERTIES. Once an encrypted table is created, queries on it are transparent as long as the corresponding key management facilities are set up in the environment where the query runs. We can use the Hadoop crypto provided by HADOOP-9331 for the underlying data encryption and decryption.

As to key management, we would support several common key management use cases. First, the table key (data key) can be stored in the Hive metastore, associated with the table in its properties. The table key can be explicitly specified or auto-generated, and will be encrypted with a master key. In some cases the data being processed is generated by other applications, so we need to support externally managed or imported table keys. Also, the data generated by Hive may be consumed by other applications in the system, so we need a tool or command for exporting the table key to a Java keystore for external use.

To handle versions of Hadoop that do not have crypto support, we can avoid compilation problems by segregating crypto API usage into separate files (shims) that are included only if a flag is defined on the Ant command line (something like -Dcrypto=true).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4227) Add column level encryption to ORC files
[ https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759139#comment-13759139 ]

Larry McCay commented on HIVE-4227:
---

I am in the process of reworking the patch for HADOOP-9534 Credential Management Framework in order to support accessing keying material for this issue. Current thinking is that CMF can abstract the source of keys and be leveraged across a number of different crypto and password protection usecases in the Hadoop ecosystem. This is why it is being done in Hadoop rather than Hive.

We will also want to align its use with HADOOP-9331, since 9331 will be leveraged in here as well as for the cryptoFS, etc.

I will provide a description of the DDL/metastore and column store changes that will be needed to support the column level encryption once I have it written up.

Add column level encryption to ORC files
---
Key: HIVE-4227
URL: https://issues.apache.org/jira/browse/HIVE-4227
Project: Hive
Issue Type: New Feature
Components: File Formats
Reporter: Owen O'Malley
Labels: gsoc, gsoc2013

It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331.
[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables
[ https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757433#comment-13757433 ]

Larry McCay commented on HIVE-5207:
---

This seems to be a duplicate of HIVE-4227. I am actually in the process of working on that functionality and plan to leverage HADOOP-9331 as appropriate. We will need to rationalize these Jiras. Maybe you could call out the difference between the Jiras as the entire table being encrypted here rather than the individual columns in 4227? I think that if we need both levels of granularity, they need to be based on the same solution.

The key management aspect is one that we will need to sync on. The patch in HADOOP-9534 (CMF) is being refactored in order to support our API needs for acquiring keys for Hive encryption and presumably for CryptoFS.

Generally speaking, the nonce/IV, alias and version indicator will be stored within the colstore in Hive for decryption. That is the current thinking anyway. Support for multiple key revisions per alias will allow for rotation and rolling of keys within the datastores.

CMF will provide pluggability for talking to key management/data protection providers: initially a JCEKS keystore and eventually a central key management/data protection service for Hadoop. The central service will also provide pluggability for integrating third party providers/solutions.

TableProperties is one way to indicate the need for data protection - we are looking at others as well - but of course I am currently looking at column level indicators too.

Let's figure out how to combine or consolidate these Jiras so that we can hopefully get a coherent set of patches to collaborate with in a branch.
Support data encryption for Hive tables --- Key: HIVE-5207 URL: https://issues.apache.org/jira/browse/HIVE-5207 Project: Hive Issue Type: New Feature Affects Versions: 0.12.0 Reporter: Jerry Chen Labels: Rhino Original Estimate: 504h Remaining Estimate: 504h For sensitive and legally protected data such as personal information, it is a common practice that the data is stored encrypted in the file system. To enable Hive with the ability to store and query the encrypted data is very crucial for Hive data analysis in enterprise. When creating table, user can specify whether a table is an encrypted table or not by specify a property in TBLPROPERTIES. Once an encrypted table is created, query on the encrypted table is transparent as long as the corresponding key management facilities are set in the running environment of query. We can use hadoop crypto provided by HADOOP-9331 for underlying data encryption and decryption. As to key management, we would support several common key management use cases. First, the table key (data key) can be stored in the Hive metastore associated with the table in properties. The table key can be explicit specified or auto generated and will be encrypted with a master key. There are cases that the data being processed is generated by other applications, we need to support externally managed or imported table keys. Also, the data generated by Hive may be consumed by other applications in the system. We need to a tool or command for exporting the table key to a java keystore for using externally. To handle versions of Hadoop that do not have crypto support, we can avoid compilation problems by segregating crypto API usage into separate files (shims) to be included only if a flag is defined on the Ant command line (something like –Dcrypto=true). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
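The key handling proposed in the HIVE-5207 description above (a table key encrypted with a master key, plus export of the table key to a Java keystore) can be sketched with standard JCE calls. This is only an illustration under stated assumptions: the class `TableKeySketch` and its method names are hypothetical, the master key is generated locally rather than fetched from a key management service, and the wrapped key is returned as a Base64 string that could be stored as a table property.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch of table-key wrapping and JCEKS export; not Hive's implementation.
final class TableKeySketch {
    static SecretKey newAesKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Encrypts (wraps) the table key with the master key; the result is safe to store as text.
    static String wrap(SecretKey masterKey, SecretKey tableKey) throws Exception {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.WRAP_MODE, masterKey);
        return Base64.getEncoder().encodeToString(cipher.wrap(tableKey));
    }

    // Recovers the table key from its wrapped, Base64-encoded form.
    static SecretKey unwrap(SecretKey masterKey, String wrapped) throws Exception {
        Cipher cipher = Cipher.getInstance("AESWrap");
        cipher.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) cipher.unwrap(
                Base64.getDecoder().decode(wrapped), "AES", Cipher.SECRET_KEY);
    }

    // Exports a key into a password-protected JCEKS keystore, returned as bytes.
    static byte[] exportToJceks(String alias, SecretKey key, char[] password) throws Exception {
        KeyStore keyStore = KeyStore.getInstance("JCEKS");
        keyStore.load(null, password);   // start from an empty keystore
        keyStore.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                new KeyStore.PasswordProtection(password));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        keyStore.store(out, password);
        return out.toByteArray();
    }

    // Reads a key back from JCEKS keystore bytes, as an external consumer would.
    static SecretKey importFromJceks(byte[] data, String alias, char[] password) throws Exception {
        KeyStore keyStore = KeyStore.getInstance("JCEKS");
        keyStore.load(new ByteArrayInputStream(data), password);
        return (SecretKey) keyStore.getKey(alias, password);
    }

    public static void main(String[] args) throws Exception {
        SecretKey masterKey = newAesKey();
        SecretKey tableKey = newAesKey();
        String wrapped = wrap(masterKey, tableKey);
        SecretKey recovered = unwrap(masterKey, wrapped);
        System.out.println("unwrap matches: "
                + Arrays.equals(tableKey.getEncoded(), recovered.getEncoded()));

        char[] password = "changeit".toCharArray();   // placeholder password
        byte[] jceks = exportToJceks("table_key", tableKey, password);
        SecretKey imported = importFromJceks(jceks, "table_key", password);
        System.out.println("export matches: "
                + Arrays.equals(tableKey.getEncoded(), imported.getEncoded()));
    }
}
```

Storing only the wrapped form in the metastore means that read access to table properties alone does not reveal the data key; the master key still has to come from the key management facility in the query environment.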
[jira] [Commented] (HIVE-3591) set hive.security.authorization.enabled can be executed by any user
[ https://issues.apache.org/jira/browse/HIVE-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747571#comment-13747571 ]

Larry McCay commented on HIVE-3591:
-----------------------------------
It appears that System properties can override conf vars too. I assume that we should leverage the restrictList there as well.

set hive.security.authorization.enabled can be executed by any user
-------------------------------------------------------------------
Key: HIVE-3591
URL: https://issues.apache.org/jira/browse/HIVE-3591
Project: Hive
Issue Type: Bug
Components: Authorization, CLI, Clients, JDBC
Affects Versions: 0.7.1
Environment: RHEL 5.6 CDH U3
Reporter: Dev Gupta
Labels: Authorization, Security

The property hive.security.authorization.enabled can be set to true or false, by any user on the CLI, thus circumventing any previously set grants and authorizations.
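The suggestion in the comment above, applying the restrict list to system-property overrides as well as to runtime `set` commands, might look roughly like the following. This is a standalone sketch and does not use Hive's configuration classes: `RestrictedConfSketch`, its constructor arguments and its error message are hypothetical, and how Hive itself should treat restricted system properties is exactly the open question in the thread.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical configuration holder that honors a restrict list on both override paths.
final class RestrictedConfSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> restricted;

    RestrictedConfSketch(Map<String, String> siteDefaults,
                         Set<String> restricted,
                         Properties systemOverrides) {
        this.restricted = new HashSet<>(restricted);
        values.putAll(siteDefaults);
        // System properties may override site values, except for restricted names.
        for (String name : systemOverrides.stringPropertyNames()) {
            if (!this.restricted.contains(name)) {
                values.put(name, systemOverrides.getProperty(name));
            }
        }
    }

    // Runtime "set" path: restricted names are rejected outright.
    void set(String name, String value) {
        if (restricted.contains(name)) {
            throw new IllegalArgumentException("Cannot modify " + name + " at runtime");
        }
        values.put(name, value);
    }

    String get(String name) {
        return values.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put("hive.security.authorization.enabled", "true");
        Set<String> restricted = new HashSet<>();
        restricted.add("hive.security.authorization.enabled");

        Properties overrides = new Properties();
        overrides.setProperty("hive.security.authorization.enabled", "false");

        RestrictedConfSketch conf = new RestrictedConfSketch(site, restricted, overrides);
        // The restricted override is ignored, so the site value survives.
        System.out.println(conf.get("hive.security.authorization.enabled"));
        try {
            conf.set("hive.security.authorization.enabled", "false");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Silently ignoring a restricted system property is one possible policy; failing fast at startup would be another, and it makes a misconfiguration easier to notice.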
[jira] [Commented] (HIVE-3591) set hive.security.authorization.enabled can be executed by any user
[ https://issues.apache.org/jira/browse/HIVE-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746744#comment-13746744 ]

Larry McCay commented on HIVE-3591:
-----------------------------------
What is the current status/thinking on this issue? Is it something that we should be addressing and are there any thoughts on how it should be prevented/restricted, etc?
[jira] [Commented] (HIVE-3591) set hive.security.authorization.enabled can be executed by any user
[ https://issues.apache.org/jira/browse/HIVE-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746991#comment-13746991 ]

Larry McCay commented on HIVE-3591:
-----------------------------------
Okay, so this is already resolved - correct?

On Wed, Aug 21, 2013 at 7:07 PM, Thiruvel Thirumoolan (JIRA)
[jira] [Commented] (HIVE-3591) set hive.security.authorization.enabled can be executed by any user
[ https://issues.apache.org/jira/browse/HIVE-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747078#comment-13747078 ]

Larry McCay commented on HIVE-3591:
-----------------------------------
I was looking at the restrictList earlier for this. I'll look into it further. Thanks for the insight!