[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-08-26 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110938#comment-14110938
 ] 

Larry McCay commented on HIVE-6329:
---

Hi [~navis] - It seems to me that we need to have some sort of AES encryption 
with proper key management and initialization capabilities in order to consider 
this ready for inclusion in Apache. If this is already being done then I may 
just be missing where it is in the patch. You made comments earlier on the jira 
about acquiring private keys over SSL - I don't see these being acquired or 
used in the patch either. 
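
For reference, here is a minimal sketch of what AES with an explicit key and per-value initialization could look like using only the JDK. It is not taken from the patch: the class name ColumnAesSketch, the choice of AES/GCM and the 12-byte IV are all assumptions made for illustration, and where the key actually comes from (the key management question above) is left open.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical helper for illustration only; not part of the HIVE-6329 patch.
class ColumnAesSketch {
    static final int IV_BYTES = 12;  // commonly used nonce size for GCM
    static final int TAG_BITS = 128; // authentication tag length

    // Generates a fresh 128-bit AES key; a real deployment would fetch it from a key manager.
    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generates a random IV; it must never be reused with the same key.
    static byte[] newIv() {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    static byte[] encrypt(SecretKey key, byte[] iv, String clearText) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            return c.doFinal(clearText.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(SecretKey key, byte[] iv, byte[] cipherText) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            return new String(c.doFinal(cipherText), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both the key and the IV have to be available again at read time, which is why key acquisition and IV storage need to be part of the design.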

Is there a companion jira that I need to see?

Perhaps, a high level design document would be helpful in communicating your 
intent and in teasing out all of the requirements for key management, 
initialization and usage for this feature?

 Support column level encryption/decryption
 --

 Key: HIVE-6329
 URL: https://issues.apache.org/jira/browse/HIVE-6329
 Project: Hive
  Issue Type: New Feature
  Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, 
 HIVE-6329.11.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, 
 HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, 
 HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt


 We have recently received some requirements for encryption, but Hive does not 
 support it. Before the full implementation via HIVE-5207, this might be useful 
 for some cases.
 {noformat}
 hive> create table encode_test(id int, name STRING, phone STRING, address 
 STRING) 
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
   WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') 
 STORED AS TEXTFILE;
 OK
 Time taken: 0.584 seconds
 hive> insert into table encode_test select 
 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows);
 ..
 OK
 Time taken: 5.121 seconds
 hive> select * from encode_test;
 OK
 100   navis MDEwLTAwMDAtMDAwMA==  U2VvdWwsIFNlb2Nobw==
 Time taken: 0.078 seconds, Fetched: 1 row(s)
 hive> 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-08-26 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14111827#comment-14111827
 ] 

Larry McCay commented on HIVE-6329:
---

I see, [~navis]. So, the intent of this patch is to provide a hook within the 
SerDe mechanism with enough fidelity to do encryption, but the initial 
implementation provides only Base64 encoding. That helps me understand the 
patch more and I think you have accomplished this.

I would be a bit leery of calling the hook and Base64 implementation that we 
are providing in this patch column level encryption/decryption - even though 
you are enabling someone to use it for that. What this patch actually 
introduces is column/value encoding/decoding. Base64 is easily reversible, and 
encoded values remain joinable across tables, allowing correlations to be made.
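
To make the concern concrete, here is a small sketch using java.util.Base64 from the JDK (the class name Base64ReversalSketch is made up for illustration). It decodes the two encoded values from the sample output in the issue description without any key, and shows that encoding is deterministic, which is what keeps encoded columns joinable.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: Base64 is an encoding, so anyone can reverse it without a secret.
class Base64ReversalSketch {
    static String encode(String clearText) {
        return Base64.getEncoder().encodeToString(clearText.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Values copied from the "select * from encode_test" output in the issue description.
        System.out.println(decode("MDEwLTAwMDAtMDAwMA=="));
        System.out.println(decode("U2VvdWwsIFNlb2Nobw=="));
        // The same input always yields the same output, so equal values still match across tables.
        System.out.println(encode("Seoul, Seocho").equals(encode("Seoul, Seocho")));
    }
}
```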

Are we able to frame the usecase that is actually represented by this patch as 
a problem that needs solving, or do we need to make this implementation more 
robust in terms of encryption/decryption and all of the key management 
requirements needed to do that properly?

I am just concerned about introducing new interfaces and hooks that need to be 
supported if they are not what we would consider strategic implementation 
choices for a given feature like encryption. Does the SerDe mechanism provide 
everything that we need? It seems like this approach provides little in terms 
of key management and metadata, which are requisite for encryption mechanisms. 
Though, I may still be missing the forest for the trees.

What I would like to do is ensure that our customers have a path forward with 
their needs met while not moving this forward in apache until we have an actual 
encryption mechanism available.

Does that make sense?

What do you think that will require?



[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-20 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104059#comment-14104059
 ] 

Larry McCay commented on HIVE-7634:
---

Are there plans to commit this to branch-2?

 Use Configuration.getPassword() if available to eliminate passwords from 
 hive-site.xml
 --

 Key: HIVE-7634
 URL: https://issues.apache.org/jira/browse/HIVE-7634
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7634.1.patch


 HADOOP-10607 provides a Configuration.getPassword() API that allows passwords 
 to be retrieved from a configured credential provider, while also being able 
 to fall back to the HiveConf setting if no provider is set up.  Hive should 
 use this API for versions of Hadoop that support this API. This would give 
 users the ability to remove the passwords from their Hive configuration files.





[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-20 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104078#comment-14104078
 ] 

Larry McCay commented on HIVE-7634:
---

Just realized that branch-2 is a hadoop branch.



[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-06 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088028#comment-14088028
 ] 

Larry McCay commented on HIVE-7634:
---

Hi [~jdere] - there are similar uptake jiras being tracked on HADOOP-10904.
Those patches may be of interest to you.



[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-06 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088033#comment-14088033
 ] 

Larry McCay commented on HIVE-7634:
---

HADOOP-10904 contains similar uptake patches



[jira] [Commented] (HIVE-7175) Provide password file option to beeline

2014-06-11 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027698#comment-14027698
 ] 

Larry McCay commented on HIVE-7175:
---

I just realized that this is the user's LDAP password.
It would be unfortunate to have to leave this lying around in various places 
unless absolutely necessary.

Does the beeline CLI currently allow for using the java Console to collect the 
password from the user?

I understand that for scripting type purposes we may need another collection 
mechanism but for usecases with a user and console available the users' 
passwords should not be persisted outside of the directory itself when it can 
be avoided.
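
As a rough sketch of the console option, java.io.Console.readPassword() in the JDK reads without echoing and returns a char[] that can be wiped after use. The class and method names below (ConsolePasswordSketch, promptForPassword, wipe) are hypothetical and say nothing about how beeline is actually structured.

```java
import java.io.Console;
import java.util.Arrays;

// Hypothetical helper; not beeline code.
class ConsolePasswordSketch {
    // Prompts without echo. Returns null when no interactive console is attached
    // (for example in scripted runs), so the caller can fall back to another mechanism.
    static char[] promptForPassword(Console console, String user) {
        if (console == null) {
            return null;
        }
        return console.readPassword("Password for %s: ", user);
    }

    // Overwrites the password in memory once it is no longer needed.
    static void wipe(char[] password) {
        if (password != null) {
            Arrays.fill(password, '\0');
        }
    }
}
```

A caller would pass System.console() and only consult a password file when the result is null.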

For cases where it cannot be avoided, the side file approach is certainly 
better than passing the password on the command line itself in terms of 
visibility.

 Provide password file option to beeline
 ---

 Key: HIVE-7175
 URL: https://issues.apache.org/jira/browse/HIVE-7175
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients
Affects Versions: 0.13.0
Reporter: Robert Justice
Assignee: Dr. Wendell Urth
  Labels: features, security
 Attachments: HIVE-7175.patch


 For people connecting to Hive Server 2 with LDAP authentication enabled, in 
 order to batch run commands, we currently have to provide the password openly 
 in the command line.   They could use some expect scripting, but I think a 
 valid improvement would be to provide a password file option similar to other 
 CLI commands in hadoop (e.g. sqoop) to be more secure.





[jira] [Commented] (HIVE-7175) Provide password file option to beeline

2014-06-04 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017823#comment-14017823
 ] 

Larry McCay commented on HIVE-7175:
---

Hi [~rjustice] - we may want to consider the use of the CredentialProvider API 
that will be committed soon.
See HADOOP-10607. This isn't mutually exclusive with the password file approach 
as there are plans to fall back to existing password files in certain 
components. However, the abstraction of the API is best realized through the 
new Configuration.getPassword(String name) method. This will allow you to ask 
for a configuration item that you know is a password and it will check for an 
aliased credential based on the name through the CredentialProvider API. If the 
name is not resolved into a credential from a provider then it falls back to 
the config file.
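
The lookup order described above can be sketched in plain Java. This is a stand-in for illustration and not the Hadoop implementation: PasswordLookupSketch and its two maps are hypothetical, and only the order (credential provider first, config value second) comes from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the lookup order described above; NOT the Hadoop Configuration class.
class PasswordLookupSketch {
    private final Map<String, char[]> credentialProvider = new HashMap<>(); // hypothetical provider store
    private final Map<String, String> configFile = new HashMap<>();         // hypothetical clear-text config

    void addCredential(String alias, char[] secret) {
        credentialProvider.put(alias, secret.clone());
    }

    void setConfig(String name, String value) {
        configFile.put(name, value);
    }

    // First try to resolve the name as an aliased credential, then fall back to the config value.
    char[] getPassword(String name) {
        char[] fromProvider = credentialProvider.get(name);
        if (fromProvider != null) {
            return fromProvider.clone();
        }
        String fromConfig = configFile.get(name);
        return fromConfig == null ? null : fromConfig.toCharArray();
    }
}
```

With this order, a deployment can move a password into a provider and delete it from the config file without the calling code changing.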

The extra hop of the separate file isn't a problem but it isn't encapsulated by 
the getPassword method going into Configuration.

Just something to keep in mind.



[jira] [Commented] (HIVE-7175) Provide password file option to beeline

2014-06-04 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017866#comment-14017866
 ] 

Larry McCay commented on HIVE-7175:
---

These should be seen as complementary issues.



[jira] [Commented] (HIVE-7010) templeton/v1/queue REST method has been removed

2014-05-05 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989795#comment-13989795
 ] 

Larry McCay commented on HIVE-7010:
---

Hmm... so this change is backwards compatible then? What happens when a 
client calls the removed API?

 templeton/v1/queue REST method has been removed
 ---

 Key: HIVE-7010
 URL: https://issues.apache.org/jira/browse/HIVE-7010
 Project: Hive
  Issue Type: Bug
  Components: Documentation, WebHCat
Affects Versions: 0.14.0
Reporter: Eugene Koifman
Assignee: Lefty Leverenz

 deprecated queue REST method was removed from WebHCat in HIVE-6432.  jobs 
 is the replacement.
 https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference needs to 
 be updated





[jira] [Commented] (HIVE-7010) templeton/v1/queue REST method has been removed

2014-05-05 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989843#comment-13989843
 ] 

Larry McCay commented on HIVE-7010:
---

[~ekoifman] - thanks for the insight. I am still getting my head around how 
Knox will handle contract changes across the entire ecosystem. I don't think 
that asking you to revert it will help that in general but getting some sense 
of how the APIs evolve will inform how we should track those changes. It seems 
that we will need to support component versioning rather than API 
versioning. If this is how APIs evolve across the other components as well then 
the version indicator in the URL doesn't seem very meaningful to me.



[jira] [Commented] (HIVE-7010) templeton/v1/queue REST method has been removed

2014-05-05 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989886#comment-13989886
 ] 

Larry McCay commented on HIVE-7010:
---

That was the behavior that I expected actually.
Getting all components to adhere to this would likely be difficult but I think 
that it makes sense for all components that do support the version indicator to 
take that approach. Some components don't actually even have a version 
indicator in their APIs.



[jira] [Commented] (HIVE-7010) templeton/v1/queue REST method has been removed

2014-05-04 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989202#comment-13989202
 ] 

Larry McCay commented on HIVE-7010:
---

Hi [~leftylev] - I am curious whether the v1 is going to be bumped up to v2 in 
response to the contract change. I believe that it is easier for consuming 
projects - such as Apache Knox - to be able to multiplex across expected APIs 
with specific version indicators.
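
As an illustration of what multiplexing on the version indicator could mean for a consumer, here is a small sketch that pulls the component, version and resource out of a path such as templeton/v1/queue. The class VersionedPathSketch and the routing-key format are invented for this example and do not describe how Knox is implemented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical routing helper; not Knox code.
class VersionedPathSketch {
    // component / version indicator / remaining resource path
    private static final Pattern VERSIONED = Pattern.compile("^/?([a-z]+)/(v\\d+)/(.+)$");

    // Returns "component:version:resource", or null when the path carries no version indicator.
    static String routingKey(String path) {
        Matcher m = VERSIONED.matcher(path);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + ":" + m.group(2) + ":" + m.group(3);
    }
}
```

A dispatcher keyed this way only helps if a contract change is accompanied by a version bump; otherwise two different contracts end up behind the same key.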




[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-02-11 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898061#comment-13898061
 ] 

Larry McCay commented on HIVE-6329:
---

[~owen.omalley] I certainly agree that we need the IV but I'm not sure that I 
like it in the DDL.


 Support column level encryption/decryption
 --

 Key: HIVE-6329
 URL: https://issues.apache.org/jira/browse/HIVE-6329
 Project: Hive
  Issue Type: New Feature
  Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, 
 HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt


 We have recently received some requirements for encryption, but Hive does not 
 support it. Before the full implementation via HIVE-5207, this might be useful 
 for some cases.
 {noformat}
 hive> create table encode_test(id int, name STRING, phone STRING, address 
 STRING) 
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
   WITH SERDEPROPERTIES ('column.encode.indices'='2,3', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') 
 STORED AS TEXTFILE;
 OK
 Time taken: 0.584 seconds
 hive> insert into table encode_test select 
 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows);
 ..
 OK
 Time taken: 5.121 seconds
 hive> select * from encode_test;
 OK
 100   navis MDEwLTAwMDAtMDAwMA==  U2VvdWwsIFNlb2Nobw==
 Time taken: 0.078 seconds, Fetched: 1 row(s)
 hive> 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-02-11 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898092#comment-13898092
 ] 

Larry McCay commented on HIVE-6329:
---

That works.
We would need to be able to determine that a particular row has no IV as well.
This could be done by some well known constant or the size of the IV?
A size of 0 would indicate that there is no encryption in the row.
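
A sketch of the size-based marker, assuming the IV is stored in front of each value: one length byte, then the IV, then the payload. The layout and the class name IvPrefixedValueSketch are assumptions for illustration and are not taken from any patch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Assumed layout: [1 byte IV length][IV bytes][payload]. An IV length of 0 means "not encrypted".
class IvPrefixedValueSketch {
    static byte[] pack(byte[] iv, byte[] payload) {
        if (iv.length > 255) {
            throw new IllegalArgumentException("IV does not fit in a one-byte length field");
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + iv.length + payload.length);
        buf.put((byte) iv.length).put(iv).put(payload);
        return buf.array();
    }

    static boolean isEncrypted(byte[] packed) {
        return (packed[0] & 0xFF) != 0;
    }

    static byte[] iv(byte[] packed) {
        return Arrays.copyOfRange(packed, 1, 1 + (packed[0] & 0xFF));
    }

    static byte[] payload(byte[] packed) {
        return Arrays.copyOfRange(packed, 1 + (packed[0] & 0xFF), packed.length);
    }
}
```

The alternative of a well known constant would avoid the length byte but would need a value that can never collide with a real IV.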




[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-02-11 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898154#comment-13898154
 ] 

Larry McCay commented on HIVE-6329:
---

I think it is important to understand that there is a separate key per col
- therefore you wouldn't have the same cipher text for same clear text.
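
A small demonstration of that point, using the JDK's AES/CBC with a deliberately fixed IV so that the key is the only thing that varies. The class name and the fixed IV are for illustration only; a fixed IV should not be used for real data.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

// Illustration only: shows the effect of using a different key per column.
class PerColumnKeySketch {
    static byte[] encrypt(byte[] rawKey, byte[] iv, String clearText) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(rawKey, "AES"), new IvParameterSpec(iv));
            return c.doFinal(clearText.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Encrypting the same clear text under the phone column's key and the address column's key gives different cipher text, so values cannot be correlated across columns by comparing bytes. Within a single column, equal values would still match unless the IV also varies per value.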







[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

2013-09-25 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777416#comment-13777416
 ] 

Larry McCay commented on HIVE-5207:
---

Hi Jerry - I have taken a high level look through the patch. Lots of good stuff 
there - good work! A couple of things that I would like to see more javadocs on 
and perhaps a document that describes the usecases:

1. TwoTieredKey - exactly what its purpose is, how it's used, what the tiers are, etc
2. External KeyManagement integration - where and what is the expected contract 
for this integration
3. A specific usecase description for exporting keys into an external keystore 
and who has the authority to initiate the export and where the password comes 
from
4. An explanation as to why we should ever store the key with the data which 
seems like a bad idea. I understand that it is encrypted with the master secret 
- which takes me to the next question. :)
5. Where is the master secret established and stored and how is it protected
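
For readers following along, one common reading of a two tier scheme is a data key that is wrapped (encrypted) by a master key, so that only the wrapped form is ever stored next to the data. The sketch below shows that idea with the JDK's AESWrap cipher. It is an assumption about the general technique, not a description of what TwoTieredKey in the patch does, and the class name TwoTierKeySketch is hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.GeneralSecurityException;

// Hypothetical illustration of key wrapping; not the TwoTieredKey class from the patch.
class TwoTierKeySketch {
    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Tier 1 (master key) protects tier 2 (data key); the result is safe to store with metadata.
    static byte[] wrap(SecretKey masterKey, SecretKey dataKey) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.WRAP_MODE, masterKey);
            return c.wrap(dataKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recovers the data key; only possible for a holder of the master key.
    static SecretKey unwrap(SecretKey masterKey, byte[] wrappedDataKey) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.UNWRAP_MODE, masterKey);
            return (SecretKey) c.unwrap(wrappedDataKey, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Even under this reading, questions 4 and 5 still stand: the scheme is only as strong as the protection of the master key.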

There is a minor typo/spelling error that you probably want to fix now rather 
than later:

+public interface HiveKeyResolver  {
+  void init(Configuration conf) throws CryptoException;
+
+  /**
+   * Resolve the key meta information of a table
+   * @param tableDesc The table descriptor
+   */
+  KeyMeta resovleKey(TableDesc tableDesc);
+}

change resovleKey to resolveKey here and in the interface implementation and 
consumer of the method - I think there were 3 instances.

Again, nice work here!
Let's get some higher level descriptions in code javadocs and/or separate 
documents.
Thanks!


 Support data encryption for Hive tables
 ---

 Key: HIVE-5207
 URL: https://issues.apache.org/jira/browse/HIVE-5207
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.12.0
Reporter: Jerry Chen
  Labels: Rhino
 Attachments: HIVE-5207.patch

   Original Estimate: 504h
  Remaining Estimate: 504h

 For sensitive and legally protected data such as personal information, it is 
 a common practice that the data is stored encrypted in the file system. 
 Enabling Hive to store and query encrypted data is crucial for Hive data 
 analysis in the enterprise. 
  
 When creating a table, the user can specify whether it is an encrypted table 
 by specifying a property in TBLPROPERTIES. Once an encrypted table is 
 created, querying it is transparent as long as the corresponding key 
 management facilities are set up in the running environment of the query. We 
 can use hadoop crypto provided by HADOOP-9331 for underlying data 
 encryption and decryption. 
  
 As to key management, we would support several common key management use 
 cases. First, the table key (data key) can be stored in the Hive metastore 
 associated with the table in properties. The table key can be explicitly 
 specified or auto generated and will be encrypted with a master key. There 
 are cases where the data being processed is generated by other applications, 
 so we need to support externally managed or imported table keys. Also, the 
 data generated by Hive may be consumed by other applications in the system. We 
 need a tool or command for exporting the table key to a java keystore for 
 external use.
  
 To handle versions of Hadoop that do not have crypto support, we can avoid 
 compilation problems by segregating crypto API usage into separate files 
 (shims) to be included only if a flag is defined on the Ant command line 
 (something like -Dcrypto=true).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4227) Add column level encryption to ORC files

2013-09-05 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13759139#comment-13759139
 ] 

Larry McCay commented on HIVE-4227:
---

I am in the process of reworking the patch for HADOOP-9534 Credential 
Management Framework in order to support accessing keying material for this 
issue. Current thinking is that CMF can abstract the source of keys and be 
leveraged across a number of different crypto and password protection usecases 
in the Hadoop ecosystem. This is why it is being done in Hadoop rather than 
Hive. We will also want to align its use with HADOOP-9331 - since 9331 will be 
leveraged in here as well as for the cryptoFS, etc.

Will provide a description of the DDL/metastore and column store changes that 
will be needed to support the column level encryption once I have it written up.

 Add column level encryption to ORC files
 

 Key: HIVE-4227
 URL: https://issues.apache.org/jira/browse/HIVE-4227
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
  Labels: gsoc, gsoc2013

 It would be useful to support column level encryption in ORC files. Since 
 each column and its associated index is stored separately, encrypting a 
 column separately isn't difficult. In terms of key distribution, it would 
 make sense to use an external server like the one in HADOOP-9331.



[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables

2013-09-03 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13757433#comment-13757433
 ] 

Larry McCay commented on HIVE-5207:
---

This seems to be a duplicate of HIVE-4227. I am actually in the process of 
working on that functionality and plan to leverage HADOOP-9331 as appropriate. 
We will need to rationalize these Jiras. Maybe you could call out the difference 
between the Jiras as the entire table being encrypted here rather than the 
individual columns in 4227? I think that if we need both levels of granularity 
then they need to be based on the same solution.

The key management aspect is one that we will need to sync on. The patch in 
HADOOP-9534 (CMF) is being refactored in order to support our API needs for 
acquiring keys for Hive encryption and presumably for CryptoFS. Generally 
speaking, the nonce/iv, alias and version indicator will be stored within the 
colstore in Hive for decryption. That is the current thinking anyway.

Support for multiple key revisions per alias will allow for rotation and 
rolling of keys within the datastores.
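
A sketch of what multiple revisions per alias could look like, so that data written under an older revision stays readable after a rotation. The class VersionedKeyRingSketch and its in-memory map are hypothetical stand-ins for whatever CMF ends up providing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory key ring; a stand-in for a real key management provider.
class VersionedKeyRingSketch {
    private final Map<String, TreeMap<Integer, byte[]>> keysByAlias = new HashMap<>();

    // Adds a new revision for the alias and returns its version number (1, 2, ...).
    int rotate(String alias, byte[] keyMaterial) {
        TreeMap<Integer, byte[]> versions = keysByAlias.computeIfAbsent(alias, a -> new TreeMap<>());
        int next = versions.isEmpty() ? 1 : versions.lastKey() + 1;
        versions.put(next, keyMaterial.clone());
        return next;
    }

    // Version to use for new writes.
    int currentVersion(String alias) {
        TreeMap<Integer, byte[]> versions = keysByAlias.get(alias);
        if (versions == null) {
            throw new IllegalArgumentException("unknown alias: " + alias);
        }
        return versions.lastKey();
    }

    // Lookup for reads, using the alias and version indicator stored with the data.
    byte[] key(String alias, int version) {
        TreeMap<Integer, byte[]> versions = keysByAlias.get(alias);
        byte[] k = versions == null ? null : versions.get(version);
        if (k == null) {
            throw new IllegalArgumentException("no key for " + alias + " version " + version);
        }
        return k.clone();
    }
}
```

This is why the alias and version indicator need to travel with the stored data: the reader asks for exactly the revision that was used at write time.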

CMF will provide pluggability for talking to key management/data protection 
providers: initially a JCEKS keystore and eventually a central key 
management/data protection service for Hadoop. The central service will also 
provide pluggability for integrating third party providers/solutions.

TableProperties is one way to indicate the need for data protection - we are 
looking at others as well - but of course I am currently looking at column 
level indicators too.

Let's figure out how to combine or consolidate these Jiras so that we can 
hopefully get a coherent set of patches to collaborate with in a branch.

 Support data encryption for Hive tables
 ---

 Key: HIVE-5207
 URL: https://issues.apache.org/jira/browse/HIVE-5207
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.12.0
Reporter: Jerry Chen
  Labels: Rhino
   Original Estimate: 504h
  Remaining Estimate: 504h

 For sensitive and legally protected data such as personal information, it is 
 common practice to store the data encrypted in the file system. Enabling 
 Hive to store and query encrypted data is crucial for Hive data analysis in 
 the enterprise. 
  
 When creating a table, the user can specify whether it is an encrypted table 
 by setting a property in TBLPROPERTIES. Once an encrypted table is created, 
 queries on it are transparent as long as the corresponding key management 
 facilities are set up in the environment the query runs in. We can use the 
 Hadoop crypto support provided by HADOOP-9331 for the underlying data 
 encryption and decryption. 
  
 As to key management, we would support several common key management use 
 cases. First, the table key (data key) can be stored in the Hive metastore, 
 associated with the table in its properties. The table key can be explicitly 
 specified or auto-generated, and will be encrypted with a master key. In 
 some cases the data being processed is generated by other applications, so 
 we need to support externally managed or imported table keys. Also, the data 
 generated by Hive may be consumed by other applications in the system, so we 
 need a tool or command for exporting the table key to a Java keystore for 
 external use.
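
 As a rough sketch of the "table key encrypted with a master key" idea, the 
 snippet below wraps an AES table key with an AES master key using the JCE 
 "AESWrap" transformation and encodes the result as Base64 so it could be kept 
 as a string property. The class and method names (TableKeyWrapper, 
 wrapToProperty, unwrapFromProperty) are hypothetical, and the choice of 
 AESWrap and 128-bit keys is an assumption, not something this issue 
 specifies.

```java
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical sketch: protect a per-table data key with a master key. */
final class TableKeyWrapper {

    /** Generates a random 128-bit AES key (usable as a master key or a table key). */
    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES not available", e);
        }
    }

    /** Wraps the table key with the master key; the result is safe to store as text. */
    static String wrapToProperty(SecretKey masterKey, SecretKey tableKey) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, masterKey);
            return Base64.getEncoder().encodeToString(cipher.wrap(tableKey));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key wrap failed", e);
        }
    }

    /** Recovers the table key from the stored property value. */
    static SecretKey unwrapFromProperty(SecretKey masterKey, String property) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, masterKey);
            byte[] wrapped = Base64.getDecoder().decode(property);
            return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key unwrap failed", e);
        }
    }
}
```

 Only the wrapped form would be stored with the table; the master key stays 
 with whatever key management facility the environment provides.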
  
 To handle versions of Hadoop that do not have crypto support, we can avoid 
 compilation problems by segregating crypto API usage into separate files 
 (shims) to be included only if a flag is defined on the Ant command line 
 (something like -Dcrypto=true).



[jira] [Commented] (HIVE-3591) set hive.security.authorization.enabled can be executed by any user

2013-08-22 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747571#comment-13747571
 ] 

Larry McCay commented on HIVE-3591:
---

It appears that System properties can override conf vars too. I assume that we 
should leverage the restrictList there as well.
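
To make the idea concrete, here is a minimal sketch of running property overrides through the same restricted-list check as user-issued set commands. It is a self-contained toy, not HiveConf's actual code: the class `RestrictedConf` and its methods are hypothetical, and the caller passes in a `Properties` object where a real implementation would presumably consult `System.getProperties()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: one restricted list guarding both set commands and overrides. */
final class RestrictedConf {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> restricted = new HashSet<>();

    /** restrictList is a comma-separated list of property names users may not change. */
    RestrictedConf(String restrictList) {
        for (String name : restrictList.split(",")) {
            if (!name.trim().isEmpty()) {
                restricted.add(name.trim());
            }
        }
    }

    /** Administrator-side initialization; bypasses the restricted list. */
    void setInternal(String name, String value) {
        values.put(name, value);
    }

    /** User-facing set; rejects restricted properties. */
    void set(String name, String value) {
        if (restricted.contains(name)) {
            throw new IllegalArgumentException("cannot modify " + name + " at runtime");
        }
        values.put(name, value);
    }

    /** Applies overrides, skipping restricted names; returns the names that were skipped. */
    List<String> applyOverrides(Properties overrides) {
        List<String> rejected = new ArrayList<>();
        for (String name : new TreeSet<>(overrides.stringPropertyNames())) {
            if (restricted.contains(name)) {
                rejected.add(name);
            } else {
                values.put(name, overrides.getProperty(name));
            }
        }
        return rejected;
    }

    String get(String name) {
        return values.get(name);
    }
}
```

Under these assumptions, `conf.applyOverrides(System.getProperties())` would leave hive.security.authorization.enabled untouched whenever it appears in the restricted list.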

 set hive.security.authorization.enabled can be executed by any user
 ---

 Key: HIVE-3591
 URL: https://issues.apache.org/jira/browse/HIVE-3591
 Project: Hive
  Issue Type: Bug
  Components: Authorization, CLI, Clients, JDBC
Affects Versions: 0.7.1
 Environment: RHEL 5.6
 CDH U3
Reporter: Dev Gupta
  Labels: Authorization, Security

 The property hive.security.authorization.enabled can be set to true or false, 
 by any user on the CLI, thus circumventing any previously set grants and 
 authorizations. 



[jira] [Commented] (HIVE-3591) set hive.security.authorization.enabled can be executed by any user

2013-08-21 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746744#comment-13746744
 ] 

Larry McCay commented on HIVE-3591:
---

What is the current status/thinking on this issue? Is it something that we 
should be addressing and are there any thoughts on how it should be 
prevented/restricted, etc?



[jira] [Commented] (HIVE-3591) set hive.security.authorization.enabled can be executed by any user

2013-08-21 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746991#comment-13746991
 ] 

Larry McCay commented on HIVE-3591:
---

Okay, so this is already resolved - correct?


On Wed, Aug 21, 2013 at 7:07 PM, Thiruvel Thirumoolan (JIRA) 





[jira] [Commented] (HIVE-3591) set hive.security.authorization.enabled can be executed by any user

2013-08-21 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747078#comment-13747078
 ] 

Larry McCay commented on HIVE-3591:
---

I was looking at the restrictList earlier for this. I'll look into it further. 
Thanks for the insight!
