[jira] [Created] (HIVE-16936) wrong result with CTAS(create table as select)

2017-06-21 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-16936:
-

 Summary: wrong result with CTAS(create table as select)
 Key: HIVE-16936
 URL: https://issues.apache.org/jira/browse/HIVE-16936
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Xiaomeng Huang
Priority: Critical


1. 
{code}
hive> select 'test' as did from abc_test_old
> where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
OK
test  
{code}
result is 'test'

2. 
{code}
hive> create table abc_test_12345 as
> select 'test' as did from abc_test_old
> where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;

hive> select did from abc_test_12345 limit 1;
OK
5FCAFD34-C124-4E13-AF65-27B675C945CC 
{code}
result is '5FCAFD34-C124-4E13-AF65-27B675C945CC'
Why is the result not 'test'?

3. 
{code}
hive> explain
> create table abc_test_12345 as
> select 'test' as did from abc_test_old
> where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4
  Stage-3
  Stage-0 depends on stages: Stage-3, Stage-2, Stage-5
  Stage-7 depends on stages: Stage-0
  Stage-2
  Stage-4
  Stage-5 depends on stages: Stage-4

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abc_test_old
    Statistics: Num rows: 32 Data size: 1152 Basic stats: COMPLETE Column stats: NONE
    Filter Operator
      predicate: (did = '5FCAFD34-C124-4E13-AF65-27B675C945CC') (type: boolean)
      Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE Column stats: NONE
      Select Operator
        Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE Column stats: NONE
        Limit
          Number of rows: 1
          Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Reduce Output Operator
            sort order:
            Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
  Reduce Operator Tree:
    Select Operator
      expressions: '5FCAFD34-C124-4E13-AF65-27B675C945CC' (type: string)
      outputColumnNames: _col0
      Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
      Limit
        Number of rows: 1
        Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
        File Output Operator
          compressed: true
          Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: default.abc_test_12345
..
{code}
Why is the expression '5FCAFD34-C124-4E13-AF65-27B675C945CC' instead of 'test'?
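A possible workaround while this bug is open (untested here; it assumes the wrong result comes from the constant-propagation optimizer folding the filter constant into the projected literal) is to disable constant propagation for the CTAS statement:
{code}
hive> set hive.optimize.constant.propagation=false;
hive> create table abc_test_12345 as
    > select 'test' as did from abc_test_old
    > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
hive> select did from abc_test_12345 limit 1;
{code}
hive.optimize.constant.propagation is a standard Hive setting; whether it avoids this particular bug on 1.2.1 has not been verified.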



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-15836) CTAS failed when the table is stored as ORC and the select clause has null

2017-02-07 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-15836:
-

 Summary: CTAS failed when the table is stored as ORC and the select clause has null
 Key: HIVE-15836
 URL: https://issues.apache.org/jira/browse/HIVE-15836
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Xiaomeng Huang


Based on the stable version 1.2.1, I applied the patch from https://issues.apache.org/jira/browse/HIVE-11217, but I still got the error.

CASE:
{quote}
CREATE TABLE empty (x int);
CREATE TABLE orc_table_with_null 
STORED AS ORC 
AS 
SELECT 
x,
null
FROM empty;
{quote}

ERROR:
{quote}
FAILED: SemanticException [Error 10305]: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field:  _c1
{quote}
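As the error message itself suggests, casting the NULL to an explicit type avoids creating a VOID column, for example:
{code}
CREATE TABLE orc_table_with_null
STORED AS ORC
AS
SELECT
x,
CAST(null AS STRING) AS c1
FROM empty;
{code}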






[jira] [Created] (HIVE-14369) submitting a task with Hive on Spark to another YARN cluster failed

2016-07-28 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-14369:
-

 Summary: submitting a task with Hive on Spark to another YARN cluster failed
 Key: HIVE-14369
 URL: https://issues.apache.org/jira/browse/HIVE-14369
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang


In our environment, we have two HA Hadoop clusters, named hivecluster and sparkcluster.
hivecluster is an HA Hadoop cluster for Hive, which has large hard disks.
sparkcluster is an HA Hadoop cluster for Spark, which has large memory.
e.g. below is the hdfs-site.xml of hivecluster:
{code}

<property>
  <name>dfs.ha.namenodes.hivecluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hivecluster.nn1</name>
  <value>10.17.21.32:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hivecluster.nn2</name>
  <value>10.17.21.77:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.hivecluster.nn1</name>
  <value>10.17.21.32:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.hivecluster.nn2</name>
  <value>10.17.21.77:50070</value>
</property>
{code}

Firstly, I created a Hive table located at hdfs://hivecluster/hive/warehouse/xxx
If I use Hive on MR, it runs successfully.
But if I use Hive on Spark to submit a task to the YARN cluster of sparkcluster, it says:
{code}
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The YARN log shows:
Diagnostics: java.lang.IllegalArgumentException: java.net.UnknownHostException: hivecluster
Failing this attempt. Failing the application.
{code}

The cause: I did not add the hivecluster nameservice configuration to the hdfs-site.xml of sparkcluster.
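A likely fix (a sketch using the standard HDFS HA client settings; not verified against this environment) is to add the hivecluster nameservice to the hdfs-site.xml used by the sparkcluster YARN nodes, so the logical name can be resolved:
{code}
<property>
  <name>dfs.nameservices</name>
  <value>sparkcluster,hivecluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.hivecluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hivecluster.nn1</name>
  <value>10.17.21.32:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hivecluster.nn2</name>
  <value>10.17.21.77:9000</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.hivecluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
{code}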





[jira] [Commented] (HIVE-7934) Improve column level encryption with key management

2014-12-10 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240863#comment-14240863
 ] 

Xiaomeng Huang commented on HIVE-7934:
--

Hi [~chirag.aggarwal]
I have updated my patch on HIVE-8049 to use the crypto codec and KMS in hadoop-common.

 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 Now HIVE-6329 is a framework for column-level encryption/decryption. But the implementation in HIVE-6329 just uses Base64; it is not safe and has some problems:
 - Base64WriteOnly can only return the ciphertext to the client, for any user.
 - Base64Rewriter can only return the plaintext to the client, for any user.
 I have an improvement based on HIVE-6329 using key management via KMS.
 This patch implements transparent column-level encryption. Users don't need to set anything when they query tables.
 # set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
 {code}
  <property>
    <name>hadoop.kms.acl.GET</name>
    <value>user1 root</value>
    <description>
      ACL for get-key-version and get-current-key operations.
    </description>
  </property>
 {code}
 # set hive-site.xml 
 {code}
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
  </property>
 {code}
 # create an encrypted table
 {code}
 drop table student_column_encrypt;
 create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='s_country,s_age', 'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
   STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
 insert overwrite table student_column_encrypt 
 select 
   s_key, s_name, s_country, s_age
 from student;
  
 select * from student_column_encrypt; 
 {code}
 # query the table as different users; this is transparent to users. It is very convenient and they don't need to set anything.
 {code}
 [root@huang1 hive_data]# hive
 hive> select * from student_column_encrypt;
 OK
 0   Armon   China     20
 1   Jack    USA       21
 2   Lucy    England   22
 3   Lily    France    23
 4   Yom     Spain     24
 Time taken: 0.759 seconds, Fetched: 5 row(s)
 [root@huang1 hive_data]# su user2
 [user2@huang1 hive_data]$ hive
 hive> select * from student_column_encrypt;
 OK
 0   Armon   dqyb188=       NULL
 1   Jack    YJez           NULL
 2   Lucy    cKqV1c8MTw==   NULL
 3   Lily    c7aT180H       NULL
 4   Yom     ZrST0MA=       NULL
 Time taken: 0.77 seconds, Fetched: 5 row(s)
 {code}





[jira] [Resolved] (HIVE-8969) getChildPrivileges should in one transaction with revoke

2014-12-10 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang resolved HIVE-8969.
--
Resolution: Invalid

Sorry, I intended to create this JIRA in the Sentry project, but I created it in Hive by mistake.
Closing this JIRA as Invalid.

 getChildPrivileges should in one transaction with revoke
 

 Key: HIVE-8969
 URL: https://issues.apache.org/jira/browse/HIVE-8969
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Critical







[jira] [Updated] (HIVE-8049) Transparent column level encryption using kms

2014-12-10 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8049:
-
Status: Patch Available  (was: In Progress)

 Transparent column level encryption using kms
 -

 Key: HIVE-8049
 URL: https://issues.apache.org/jira/browse/HIVE-8049
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8049.001.patch, HIVE-8049.002.patch


 This patch implements transparent column-level encryption. Users don't need to set anything when they query tables.
 # set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
 {code}
  <property>
    <name>hadoop.kms.acl.GET</name>
    <value>user1 root</value>
    <description>
      ACL for get-key-version and get-current-key operations.
    </description>
  </property>
 {code}
 # set hive-site.xml 
 {code}
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
  </property>
 {code}
 # create an encrypted table
 {code}
 drop table student_column_encrypt;
 create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='s_country,s_age', 'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
   STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
 insert overwrite table student_column_encrypt 
 select 
   s_key, s_name, s_country, s_age
 from student;
  
 select * from student_column_encrypt; 
 {code}
 # query the table as different users; this is transparent to users. It is very convenient and they don't need to set anything.
 {code}
 [root@huang1 hive_data]# hive
 hive> select * from student_column_encrypt;
 OK
 0   Armon   China     20
 1   Jack    USA       21
 2   Lucy    England   22
 3   Lily    France    23
 4   Yom     Spain     24
 Time taken: 0.759 seconds, Fetched: 5 row(s)
 [root@huang1 hive_data]# su user2
 [user2@huang1 hive_data]$ hive
 hive> select * from student_column_encrypt;
 OK
 0   Armon   dqyb188=       NULL
 1   Jack    YJez           NULL
 2   Lucy    cKqV1c8MTw==   NULL
 3   Lily    c7aT180H       NULL
 4   Yom     ZrST0MA=       NULL
 Time taken: 0.77 seconds, Fetched: 5 row(s)
 {code}





[jira] [Updated] (HIVE-8049) Transparent column level encryption using kms

2014-12-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8049:
-
Description: 
This patch implements transparent column-level encryption. Users don't need to set anything when they query tables.
# set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# set hive-site.xml 
{code}
 <property>
   <name>hadoop.security.key.provider.path</name>
   <value>kms://http@localhost:16000/kms</value>
 </property>
{code}
# create an encrypted table
{code}
drop table student_column_encrypt;
create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='s_country,s_age', 'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table student_column_encrypt 
select 
  s_key, s_name, s_country, s_age
from student;
 
select * from student_column_encrypt; 
{code}
# query the table as different users; this is transparent to users. It is very convenient and they don't need to set anything.
{code}
[root@huang1 hive_data]# hive
hive> select * from student_column_encrypt;
OK
0   Armon   China     20
1   Jack    USA       21
2   Lucy    England   22
3   Lily    France    23
4   Yom     Spain     24
Time taken: 0.759 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from student_column_encrypt;
OK
0   Armon   dqyb188=       NULL
1   Jack    YJez           NULL
2   Lucy    cKqV1c8MTw==   NULL
3   Lily    c7aT180H       NULL
4   Yom     ZrST0MA=       NULL
Time taken: 0.77 seconds, Fetched: 5 row(s)
{code}

  was:
This patch implements transparent column-level encryption. Users don't need to set anything when they query tables.
# set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# set hive-site.xml 
{code}
 <property>
   <name>hadoop.security.kms.uri</name>
   <value>http://localhost:16000/kms</value>
 </property>
{code}
# create an encrypted table
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column
select
  r_regionkey, r_name
from region;
{code}
# query the table as different users; this is transparent to users. It is very convenient and they don't need to set anything.
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}


 Transparent column level encryption using kms
 -

 Key: HIVE-8049
 URL: https://issues.apache.org/jira/browse/HIVE-8049
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8049.001.patch


 This patch implements transparent column-level encryption. Users don't need to set anything when they query tables.
 # set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
 {code}
  <property>
    <name>hadoop.kms.acl.GET</name>
    <value>user1 root</value>
    <description>
      ACL for get-key-version and get-current-key operations.
    </description>
  </property>
 {code}
 # set hive-site.xml 
 {code}
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
  </property>
 {code}
 # create an encrypted table
 {code}
 drop table student_column_encrypt;
 create table 

[jira] [Updated] (HIVE-8049) Transparent column level encryption using kms

2014-12-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8049:
-
Attachment: HIVE-8049.002.patch

Refactored the patch to use the crypto codec in hadoop-common.

 Transparent column level encryption using kms
 -

 Key: HIVE-8049
 URL: https://issues.apache.org/jira/browse/HIVE-8049
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8049.001.patch, HIVE-8049.002.patch


 This patch implements transparent column-level encryption. Users don't need to set anything when they query tables.
 # set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
 {code}
  <property>
    <name>hadoop.kms.acl.GET</name>
    <value>user1 root</value>
    <description>
      ACL for get-key-version and get-current-key operations.
    </description>
  </property>
 {code}
 # set hive-site.xml 
 {code}
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
  </property>
 {code}
 # create an encrypted table
 {code}
 drop table student_column_encrypt;
 create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='s_country,s_age', 'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
   STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
 insert overwrite table student_column_encrypt 
 select 
   s_key, s_name, s_country, s_age
 from student;
  
 select * from student_column_encrypt; 
 {code}
 # query the table as different users; this is transparent to users. It is very convenient and they don't need to set anything.
 {code}
 [root@huang1 hive_data]# hive
 hive> select * from student_column_encrypt;
 OK
 0   Armon   China     20
 1   Jack    USA       21
 2   Lucy    England   22
 3   Lily    France    23
 4   Yom     Spain     24
 Time taken: 0.759 seconds, Fetched: 5 row(s)
 [root@huang1 hive_data]# su user2
 [user2@huang1 hive_data]$ hive
 hive> select * from student_column_encrypt;
 OK
 0   Armon   dqyb188=       NULL
 1   Jack    YJez           NULL
 2   Lucy    cKqV1c8MTw==   NULL
 3   Lily    c7aT180H       NULL
 4   Yom     ZrST0MA=       NULL
 Time taken: 0.77 seconds, Fetched: 5 row(s)
 {code}





[jira] [Updated] (HIVE-8416) Generic key management framework

2014-12-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8416:
-
Attachment: HIVE-8416.001.patch

 Generic key management framework
 

 Key: HIVE-8416
 URL: https://issues.apache.org/jira/browse/HIVE-8416
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8416.001.patch


 This patch includes the KeyProvider interfaces and a default implementation using the Java KeyStore.





[jira] [Updated] (HIVE-7934) Improve column level encryption with key management

2014-12-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7934:
-
Description: 
Now HIVE-6329 is a framework for column-level encryption/decryption. But the implementation in HIVE-6329 just uses Base64; it is not safe and has some problems:
- Base64WriteOnly can only return the ciphertext to the client, for any user.
- Base64Rewriter can only return the plaintext to the client, for any user.

I have an improvement based on HIVE-6329 using key management via KMS.
This patch implements transparent column-level encryption. Users don't need to set anything when they query tables.
# set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# set hive-site.xml 
{code}
 <property>
   <name>hadoop.security.key.provider.path</name>
   <value>kms://http@localhost:16000/kms</value>
 </property>
{code}
# create an encrypted table
{code}
drop table student_column_encrypt;
create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='s_country,s_age', 'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table student_column_encrypt 
select 
  s_key, s_name, s_country, s_age
from student;
 
select * from student_column_encrypt; 
{code}
# query the table as different users; this is transparent to users. It is very convenient and they don't need to set anything.
{code}
[root@huang1 hive_data]# hive
hive> select * from student_column_encrypt;
OK
0   Armon   China     20
1   Jack    USA       21
2   Lucy    England   22
3   Lily    France    23
4   Yom     Spain     24
Time taken: 0.759 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from student_column_encrypt;
OK
0   Armon   dqyb188=       NULL
1   Jack    YJez           NULL
2   Lucy    cKqV1c8MTw==   NULL
3   Lily    c7aT180H       NULL
4   Yom     ZrST0MA=       NULL
Time taken: 0.77 seconds, Fetched: 5 row(s)
{code}

  was:
Now HIVE-6329 is a framework for column-level encryption/decryption. But the implementation in HIVE-6329 just uses Base64; it is not safe and has some problems:
- Base64WriteOnly can only return the ciphertext to the client, for any user.
- Base64Rewriter can only return the plaintext to the client, for any user.

I have an improvement based on HIVE-6329 using key management via KMS.
# set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# set hive-site.xml 
{code}
 <property>
   <name>hadoop.security.kms.uri</name>
   <value>http://localhost:16000/kms</value>
 </property>
{code}
# create an encrypted table
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column
select
  r_regionkey, r_name
from region;
{code}
# query the table as different users; this is transparent to users. It is very convenient and they don't need to set anything.
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}


 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor


[jira] [Updated] (HIVE-8252) Generic cryptographic codec

2014-12-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8252:
-
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

This JIRA largely duplicates the crypto codec in hadoop-common. As Hadoop 2.6.0 has been released, mark this JIRA as a duplicate.

 Generic cryptographic codec
 ---

 Key: HIVE-8252
 URL: https://issues.apache.org/jira/browse/HIVE-8252
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8252.001.patch, HIVE-8252.002.patch


 This patch includes interfaces and abstract classes for a generic Key, CryptoCodec, Encryptor and Decryptor, plus a JCE AES implementation of them.





[jira] [Resolved] (HIVE-8416) Generic key management framework

2014-12-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang resolved HIVE-8416.
--
Resolution: Duplicate

As the KMS feature was released in Hadoop 2.6.0, mark this JIRA as a duplicate.

 Generic key management framework
 

 Key: HIVE-8416
 URL: https://issues.apache.org/jira/browse/HIVE-8416
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8416.001.patch


 This patch includes the KeyProvider interfaces and a default implementation using the Java KeyStore.





[jira] [Commented] (HIVE-7934) Improve column level encryption with key management

2014-12-08 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237779#comment-14237779
 ] 

Xiaomeng Huang commented on HIVE-7934:
--

Hi [~chirag.aggarwal]
When I worked on this feature, the KMS feature was not yet released in Hadoop 2.5.x, so I decided to write a generic crypto codec and key management in Hive. But after talking with some committers familiar with security in hadoop-common, it turned out to duplicate the crypto codec there to some extent. Looking at the Hadoop release notes, Hadoop 2.6.0 seems to include the KMS feature. HIVE-8049 is the initial patch to implement Hive column-level encryption based on KMS in hadoop-common, and HIVE-8252 and HIVE-8416 will be closed as duplicates. I will update the patch for HIVE-8049 in the next few days. Thanks for watching!

 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 Now HIVE-6329 is a framework for column-level encryption/decryption. But the implementation in HIVE-6329 just uses Base64; it is not safe and has some problems:
 - Base64WriteOnly can only return the ciphertext to the client, for any user.
 - Base64Rewriter can only return the plaintext to the client, for any user.
 I have an improvement based on HIVE-6329 using key management via KMS.
 # set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
 {code}
  <property>
    <name>hadoop.kms.acl.GET</name>
    <value>user1 root</value>
    <description>
      ACL for get-key-version and get-current-key operations.
    </description>
  </property>
 {code}
 # set hive-site.xml 
 {code}
  <property>
    <name>hadoop.security.kms.uri</name>
    <value>http://localhost:16000/kms</value>
  </property>
 {code}
 # create an encrypted table
 {code}
 -- region-aes-column.q
 drop table region_aes_column;
 create table region_aes_column (r_regionkey int, r_name string)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
   STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
 insert overwrite table region_aes_column
 select
   r_regionkey, r_name
 from region;
 {code}
 # query the table as different users; this is transparent to users. It is very convenient and they don't need to set anything.
 {code}
 [root@huang1 hive_data]# hive
 hive> select * from region_aes_column;
 OK
 0   AFRICA
 1   AMERICA
 2   ASIA
 3   EUROPE
 4   MIDDLE EAST
 Time taken: 0.9 seconds, Fetched: 5 row(s)
 [root@huang1 hive_data]# su user1
 [user1@huang1 hive_data]$ hive
 hive> select * from region_aes_column;
 OK
 0   AFRICA
 1   AMERICA
 2   ASIA
 3   EUROPE
 4   MIDDLE EAST
 Time taken: 0.899 seconds, Fetched: 5 row(s)
 [root@huang1 hive_data]# su user2
 [user2@huang1 hive_data]$ hive
 hive> select * from region_aes_column;
 OK
 0   RcQycWVD
 1   Rc8lam9Bxg==
 2   RdEpeQ==
 3   Qdcyd3ZH
 4   ScskfGpHp8KIIuY=
 Time taken: 0.749 seconds, Fetched: 5 row(s)
 {code}





[jira] [Work stopped] (HIVE-8416) Generic key management framework

2014-12-08 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-8416 stopped by Xiaomeng Huang.

 Generic key management framework
 

 Key: HIVE-8416
 URL: https://issues.apache.org/jira/browse/HIVE-8416
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang

 This patch includes the KeyProvider interfaces and a default implementation using the Java KeyStore.





[jira] [Created] (HIVE-8969) getChildPrivileges should in one transaction with revoke

2014-11-25 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-8969:


 Summary: getChildPrivileges should in one transaction with revoke
 Key: HIVE-8969
 URL: https://issues.apache.org/jira/browse/HIVE-8969
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Critical








[jira] [Commented] (HIVE-6667) Need support for show tables authorization

2014-10-20 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176727#comment-14176727
 ] 

Xiaomeng Huang commented on HIVE-6667:
--

I don't see HiveDriverFilterHook in apache hive, did I miss anything?

 Need support for show tables authorization
 

 Key: HIVE-6667
 URL: https://issues.apache.org/jira/browse/HIVE-6667
 Project: Hive
  Issue Type: Improvement
  Components: Authorization
Reporter: Alex Nastetsky
 Attachments: HIVE-6667.patch


 Need the ability to restrict access to show tables on a per database basis 
 or globally.





[jira] [Work started] (HIVE-8416) Generic key management framework

2014-10-10 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-8416 started by Xiaomeng Huang.

 Generic key management framework
 

 Key: HIVE-8416
 URL: https://issues.apache.org/jira/browse/HIVE-8416
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang

 This patch includes the KeyProvider interfaces and a default implementation using the Java KeyStore.





[jira] [Updated] (HIVE-8252) Generic cryptographic codec

2014-10-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8252:
-
Summary: Generic cryptographic codec  (was: Generic cryptographic codec and 
key management framework)

 Generic cryptographic codec
 ---

 Key: HIVE-8252
 URL: https://issues.apache.org/jira/browse/HIVE-8252
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang







[jira] [Updated] (HIVE-8252) Generic cryptographic codec

2014-10-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8252:
-
Description: This patch will include interfaces or abstract classes for a 
generic Key, CryptoCodec, Encryptor and Decryptor, plus a JCE AES 
implementation of those interfaces and abstract classes.

 Generic cryptographic codec
 ---

 Key: HIVE-8252
 URL: https://issues.apache.org/jira/browse/HIVE-8252
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang

 This patch will include interfaces or abstract classes for a generic Key, 
 CryptoCodec, Encryptor and Decryptor, plus a JCE AES implementation of those 
 interfaces and abstract classes.





[jira] [Created] (HIVE-8416) Generic key management framework

2014-10-09 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-8416:


 Summary: Generic key management framework
 Key: HIVE-8416
 URL: https://issues.apache.org/jira/browse/HIVE-8416
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang








[jira] [Updated] (HIVE-8252) Generic cryptographic codec

2014-10-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8252:
-
Attachment: HIVE-8252.001.patch

 Generic cryptographic codec
 ---

 Key: HIVE-8252
 URL: https://issues.apache.org/jira/browse/HIVE-8252
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8252.001.patch


 This patch will include interfaces or abstract classes for a generic Key, 
 CryptoCodec, Encryptor and Decryptor, plus a JCE AES implementation of those 
 interfaces and abstract classes.





[jira] [Updated] (HIVE-8252) Generic cryptographic codec

2014-10-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8252:
-
Status: Patch Available  (was: Open)

 Generic cryptographic codec
 ---

 Key: HIVE-8252
 URL: https://issues.apache.org/jira/browse/HIVE-8252
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8252.001.patch


 This patch will include interfaces or abstract classes for a generic Key, 
 CryptoCodec, Encryptor and Decryptor, plus a JCE AES implementation of those 
 interfaces and abstract classes.





[jira] [Updated] (HIVE-8416) Generic key management framework

2014-10-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8416:
-
Description: This patch will include the KeyProvider interfaces and a default 
implementation backed by the Java KeyStore.

 Generic key management framework
 

 Key: HIVE-8416
 URL: https://issues.apache.org/jira/browse/HIVE-8416
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang

 This patch will include the KeyProvider interfaces and a default 
 implementation backed by the Java KeyStore.





[jira] [Updated] (HIVE-8252) Generic cryptographic codec

2014-10-09 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8252:
-
Attachment: HIVE-8252.002.patch

 Generic cryptographic codec
 ---

 Key: HIVE-8252
 URL: https://issues.apache.org/jira/browse/HIVE-8252
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8252.001.patch, HIVE-8252.002.patch


 This patch will include interfaces or abstract classes for a generic Key, 
 CryptoCodec, Encryptor and Decryptor, plus a JCE AES implementation of those 
 interfaces and abstract classes.





[jira] [Created] (HIVE-8252) Generic cryptographic codec and key management framework

2014-09-25 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-8252:


 Summary: Generic cryptographic codec and key management framework
 Key: HIVE-8252
 URL: https://issues.apache.org/jira/browse/HIVE-8252
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang








[jira] [Updated] (HIVE-8049) Transparent column level encryption using kms

2014-09-25 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8049:
-
Summary: Transparent column level encryption using kms  (was: Transparent 
column level encryption using key management)

 Transparent column level encryption using kms
 -

 Key: HIVE-8049
 URL: https://issues.apache.org/jira/browse/HIVE-8049
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8049.001.patch


 This patch implements transparent column-level encryption. Users don't need to 
 set anything when they query tables.
 # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get 
 key)
 {code}
  <property>
    <name>hadoop.kms.acl.GET</name>
    <value>user1 root</value>
    <description>
      ACL for get-key-version and get-current-key operations.
    </description>
  </property>
 {code}
 # set hive-site.xml 
 {code}
  <property>
    <name>hadoop.security.kms.uri</name>
    <value>http://localhost:16000/kms</value>
  </property>
 {code}
 # create an encrypted table
 {code}
 -- region-aes-column.q
 drop table region_aes_column;
 create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
 SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
   STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
 insert overwrite table region_aes_column
 select
   r_regionkey, r_name
 from region;
 {code}
 # query the table as different users; this is transparent to them. It is very 
 convenient, and nothing extra needs to be set.
 {code}
 [root@huang1 hive_data]# hive
 hive> select * from region_aes_column;
 OK
 0 AFRICA
 1 AMERICA
 2 ASIA
 3 EUROPE
 4 MIDDLE EAST
 Time taken: 0.9 seconds, Fetched: 5 row(s)
 [root@huang1 hive_data]# su user1
 [user1@huang1 hive_data]$ hive
 hive> select * from region_aes_column;
 OK
 0 AFRICA
 1 AMERICA
 2 ASIA
 3 EUROPE
 4 MIDDLE EAST
 Time taken: 0.899 seconds, Fetched: 5 row(s)
 [root@huang1 hive_data]# su user2
 [user2@huang1 hive_data]$ hive
 hive> select * from region_aes_column;
 OK
 0 RcQycWVD
 1 Rc8lam9Bxg==
 2 RdEpeQ==
 3 Qdcyd3ZH
 4 ScskfGpHp8KIIuY=
 Time taken: 0.749 seconds, Fetched: 5 row(s)
 {code}
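As a hedged illustration of what an AESRewriter-style codec might do under the hood (the class name and API below are hypothetical, not Hive's actual serde code): AES-encrypt the column value with JCE, then Base64-armor the ciphertext so it stays printable in a TEXTFILE row, which matches the Base64-looking values user2 sees above.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch (not Hive's actual AESRewriter): a JCE-based column codec.
public class AesColumnCodec {
    private final SecretKeySpec key;
    private final IvParameterSpec iv;

    public AesColumnCodec(byte[] keyBytes, byte[] ivBytes) {
        this.key = new SecretKeySpec(keyBytes, "AES");
        this.iv = new IvParameterSpec(ivBytes);
    }

    // Encrypt a column value, then Base64-encode so it is printable text.
    public String encrypt(String plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, key, iv);
        return Base64.getEncoder().encodeToString(
                c.doFinal(plain.getBytes(StandardCharsets.UTF_8)));
    }

    // Reverse: Base64-decode, then AES-decrypt back to the plaintext.
    public String decrypt(String armored) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, key, iv);
        return new String(c.doFinal(Base64.getDecoder().decode(armored)),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // 16-byte key and IV give AES-128; values here are placeholders.
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
        byte[] iv  = "fedcba9876543210".getBytes(StandardCharsets.UTF_8);
        AesColumnCodec codec = new AesColumnCodec(key, iv);
        String ct = codec.encrypt("AFRICA");
        System.out.println(ct + " -> " + codec.decrypt(ct));
    }
}
```

Only users whose KMS ACL entry allows fetching the key can build the codec at all, which is what makes the per-user behavior above possible.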





[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity

2014-09-15 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7932:
-
Attachment: HIVE-7932.002.patch

Add a testcase to test accessed columns from ReadEntity.

 It may cause NP exception when add accessed columns to ReadEntity
 -

 Key: HIVE-7932
 URL: https://issues.apache.org/jira/browse/HIVE-7932
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-7932.001.patch, HIVE-7932.002.patch


 {code}
 case TABLE:
entity.getAccessedColumns().addAll(
   tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
 {code}
 If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns 
 null, addAll(null) will throw a NullPointerException.
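A minimal sketch of the guard this patch presumably needs (the method and class names here are illustrative, not Hive's actual code): since Map.get returns null for an absent key, the lookup result must be checked before calling addAll.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullSafeAddAll {
    // Guarded version of the snippet above: only addAll when the map
    // actually contains an entry for the table's complete name.
    public static void addAccessedColumns(
            Map<String, List<String>> tableToColumnAccessMap,
            List<String> accessedColumns,
            String completeName) {
        List<String> cols = tableToColumnAccessMap.get(completeName);
        if (cols != null) {            // the missing null check
            accessedColumns.addAll(cols);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> map = new HashMap<>();
        map.put("db.t1", Arrays.asList("c1", "c2"));
        List<String> acc = new ArrayList<>();
        addAccessedColumns(map, acc, "db.t1"); // key present: columns added
        addAccessedColumns(map, acc, "db.t2"); // key absent: no NPE, no-op
        System.out.println(acc);
    }
}
```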





[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity

2014-09-15 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7932:
-
Attachment: HIVE-7932.002.patch

 It may cause NP exception when add accessed columns to ReadEntity
 -

 Key: HIVE-7932
 URL: https://issues.apache.org/jira/browse/HIVE-7932
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-7932.001.patch, HIVE-7932.002.patch


 {code}
 case TABLE:
entity.getAccessedColumns().addAll(
   tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
 {code}
 If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns 
 null, addAll(null) will throw a NullPointerException.





[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity

2014-09-15 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7932:
-
Attachment: (was: HIVE-7932.002.patch)

 It may cause NP exception when add accessed columns to ReadEntity
 -

 Key: HIVE-7932
 URL: https://issues.apache.org/jira/browse/HIVE-7932
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-7932.001.patch, HIVE-7932.002.patch


 {code}
 case TABLE:
entity.getAccessedColumns().addAll(
   tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
 {code}
 If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns 
 null, addAll(null) will throw a NullPointerException.





[jira] [Commented] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity

2014-09-15 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134817#comment-14134817
 ] 

Xiaomeng Huang commented on HIVE-7932:
--

This test failure is not caused by my patch; the test passes on my local 
machine.

 It may cause NP exception when add accessed columns to ReadEntity
 -

 Key: HIVE-7932
 URL: https://issues.apache.org/jira/browse/HIVE-7932
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-7932.001.patch, HIVE-7932.002.patch


 {code}
 case TABLE:
entity.getAccessedColumns().addAll(
   tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
 {code}
 If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns 
 null, addAll(null) will throw a NullPointerException.





[jira] [Updated] (HIVE-7934) Improve column level encryption with key management

2014-09-11 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7934:
-
Description: 
HIVE-6329 provides a framework for column-level encryption/decryption, but its 
implementation just uses Base64, which is not safe and has some problems:
- Base64WriteOnly is only able to get the ciphertext from the client, for any user. 
- Base64Rewriter is only able to get the plaintext from the client, for any user.
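The weakness is easy to demonstrate: Base64 is an encoding, not encryption, so anyone can invert it without any key.

```java
import java.util.Base64;

// Base64 offers no secrecy: decoding needs no key at all, which is why the
// HIVE-6329 Base64 rewriters give no real protection.
public class Base64IsNotEncryption {
    public static void main(String[] args) {
        String encoded = Base64.getEncoder().encodeToString("ASIA".getBytes());
        String decoded = new String(Base64.getDecoder().decode(encoded));
        System.out.println(encoded + " -> " + decoded); // QVNJQQ== -> ASIA
    }
}
```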

I have an improvement based on HIVE-6329 using key management via kms.
# setup kms and set kms-acls.xml (e.g. user1 and root have permission to get key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# create an encrypted table
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column
select
  r_regionkey, r_name
from region;
{code}
# query the table as different users; this is transparent to them. It is very 
convenient, and nothing extra needs to be set.
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}

  was:
HIVE-6329 provides a framework for column-level encryption/decryption, but its 
implementation just uses Base64, which is not safe and has some problems: 
Base64WriteOnly can just get the ciphertext from the client for any user, and 
Base64Rewriter can just get the plaintext from the client for any user.
I have an improvement based on HIVE-6329 using key management via kms.
# setup kms and set kms-acls.xml (e.g. user1 and root have permission to get key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# create an encrypted table
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column
select
  r_regionkey, r_name
from region;
{code}
# query the table as different users; this is transparent to them. It is very 
convenient, and nothing extra needs to be set.
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}


 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 HIVE-6329 provides a framework for column-level encryption/decryption, but its 
 implementation just uses Base64, which is not safe and has some problems:
 - Base64WriteOnly is only able to get the ciphertext from the client, for any 
 user. 
 - Base64Rewriter is only able to get the plaintext from the client, for any user.
 I have an improvement based on HIVE-6329 using key management via kms.
 # setup kms and set kms-acls.xml (e.g. user1 and root 

[jira] [Updated] (HIVE-7934) Improve column level encryption with key management

2014-09-11 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7934:
-
Description: 
HIVE-6329 provides a framework for column-level encryption/decryption, but its 
implementation just uses Base64, which is not safe and has some problems:
- Base64WriteOnly is only able to get the ciphertext from the client, for any user. 
- Base64Rewriter is only able to get the plaintext from the client, for any user.

I have an improvement based on HIVE-6329 using key management via kms.
# setup kms and set kms-acls.xml (e.g. user1 and root have permission to get key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# set hive-site.xml 
{code}
 <property>
   <name>hadoop.security.kms.uri</name>
   <value>http://localhost:16000/kms</value>
 </property>
{code}
# create an encrypted table
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column
select
  r_regionkey, r_name
from region;
{code}
# query the table as different users; this is transparent to them. It is very 
convenient, and nothing extra needs to be set.
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}

  was:
HIVE-6329 provides a framework for column-level encryption/decryption, but its 
implementation just uses Base64, which is not safe and has some problems:
- Base64WriteOnly is only able to get the ciphertext from the client, for any user. 
- Base64Rewriter is only able to get the plaintext from the client, for any user.

I have an improvement based on HIVE-6329 using key management via kms.
# setup kms and set kms-acls.xml (e.g. user1 and root have permission to get key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# create an encrypted table
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column
select
  r_regionkey, r_name
from region;
{code}
# query the table as different users; this is transparent to them. It is very 
convenient, and nothing extra needs to be set.
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}


 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 HIVE-6329 provides a framework for column-level encryption/decryption, but its 
 implementation just uses Base64, which is not safe and has some problems:
 - Base64WriteOnly is only able to get the ciphertext from the client, for any 
 user. 
 - Base64Rewriter just be able to get 

[jira] [Commented] (HIVE-8049) Transparent column level encryption using key management

2014-09-11 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129787#comment-14129787
 ] 

Xiaomeng Huang commented on HIVE-8049:
--

Initial patch based on kms simple mode

 Transparent column level encryption using key management
 

 Key: HIVE-8049
 URL: https://issues.apache.org/jira/browse/HIVE-8049
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8049.001.patch








[jira] [Updated] (HIVE-8049) Transparent column level encryption using key management

2014-09-11 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8049:
-
Description: 
This patch implements transparent column-level encryption. Users don't need to 
set anything when they query tables.
# setup kms and set kms-acls.xml (e.g. user1 and root have permission to get key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# set hive-site.xml 
{code}
 <property>
   <name>hadoop.security.kms.uri</name>
   <value>http://localhost:16000/kms</value>
 </property>
{code}
# create an encrypted table
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column
select
  r_regionkey, r_name
from region;
{code}
# query the table as different users; this is transparent to them. It is very 
convenient, and nothing extra needs to be set.
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}

 Transparent column level encryption using key management
 

 Key: HIVE-8049
 URL: https://issues.apache.org/jira/browse/HIVE-8049
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8049.001.patch


 This patch implements transparent column-level encryption. Users don't need to 
 set anything when they query tables.
 # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get 
 key)
 {code}
  <property>
    <name>hadoop.kms.acl.GET</name>
    <value>user1 root</value>
    <description>
      ACL for get-key-version and get-current-key operations.
    </description>
  </property>
 {code}
 # set hive-site.xml 
 {code}
  <property>
    <name>hadoop.security.kms.uri</name>
    <value>http://localhost:16000/kms</value>
  </property>
 {code}
 # create an encrypted table
 {code}
 -- region-aes-column.q
 drop table region_aes_column;
 create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
 SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
   STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
 insert overwrite table region_aes_column
 select
   r_regionkey, r_name
 from region;
 {code}
 # query the table as different users; this is transparent to them. It is very 
 convenient, and nothing extra needs to be set.
 {code}
 [root@huang1 hive_data]# hive
 hive> select * from region_aes_column;
 OK
 0 AFRICA
 1 AMERICA
 2 ASIA
 3 EUROPE
 4 MIDDLE EAST
 Time taken: 0.9 seconds, Fetched: 5 row(s)
 [root@huang1 hive_data]# su user1
 [user1@huang1 hive_data]$ hive
 hive> select * from region_aes_column;
 OK
 0 AFRICA
 1 AMERICA
 2 ASIA
 3 EUROPE
 4 MIDDLE EAST
 Time taken: 0.899 seconds, Fetched: 5 row(s)
 [root@huang1 hive_data]# su user2
 [user2@huang1 hive_data]$ hive
 hive> select * from region_aes_column;
 OK
 0 RcQycWVD
 1 Rc8lam9Bxg==
 2 RdEpeQ==
 3 Qdcyd3ZH
 4 ScskfGpHp8KIIuY=
 Time taken: 0.749 seconds, Fetched: 5 row(s)
 {code}





[jira] [Created] (HIVE-8049) Transparent column level encryption using key management

2014-09-10 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-8049:


 Summary: Transparent column level encryption using key management
 Key: HIVE-8049
 URL: https://issues.apache.org/jira/browse/HIVE-8049
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang








[jira] [Created] (HIVE-8050) Using master key to protect data key

2014-09-10 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-8050:


 Summary: Using master key to protect data key
 Key: HIVE-8050
 URL: https://issues.apache.org/jira/browse/HIVE-8050
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang








[jira] [Updated] (HIVE-8049) Transparent column level encryption using key management

2014-09-10 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-8049:
-
Attachment: HIVE-8049.001.patch

 Transparent column level encryption using key management
 

 Key: HIVE-8049
 URL: https://issues.apache.org/jira/browse/HIVE-8049
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8049.001.patch








[jira] [Work started] (HIVE-8049) Transparent column level encryption using key management

2014-09-10 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-8049 started by Xiaomeng Huang.

 Transparent column level encryption using key management
 

 Key: HIVE-8049
 URL: https://issues.apache.org/jira/browse/HIVE-8049
 Project: Hive
  Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-8049.001.patch








[jira] [Updated] (HIVE-7934) Improve column level encryption with key management

2014-09-10 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7934:
-
Description: 
HIVE-6329 provides a framework for column-level encryption/decryption, but its 
implementation just uses Base64, which is not safe and has some problems: 
Base64WriteOnly can just get the ciphertext from the client for any user, and 
Base64Rewriter can just get the plaintext from the client for any user.
I have an improvement based on HIVE-6329 using key management via kms.
# setup kms and set kms-acls.xml (e.g. user1 and root have permission to get key)
{code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
{code}
# create an encrypted table
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
  STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column
select
  r_regionkey, r_name
from region;
{code}
# query the table as different users; this is transparent to them. It is very 
convenient, and nothing extra needs to be set.
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)

[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}

  was:
HIVE-6329 provides a framework for column-level encryption/decryption, but its 
implementation just uses Base64, which is not safe and has some problems: 
Base64WriteOnly can just get the ciphertext from the client for any user, and 
Base64Rewriter can just get the plaintext from the client for any user.
I have an improvement based on HIVE-7934 using key management.
{code}
-- region-aes-column.q
set hive.encrypt.key=123456789;
set hive.encrypt.iv=123456; 
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') 
  STORED AS TEXTFILE;
insert overwrite table region_aes_column 
select 
  r_regionkey, r_name
from region;

hive> select * from region_aes_column;
OK
0   /q5RTO1X
1   /qVGV+dV3g==
2   /rtKRA==
3   +r1RSv5T
4   8qFHQeJTvxWUadw=
Time taken: 0.666 seconds, Fetched: 5 row(s)

hive> set hive.encrypt.key=123456789;
hive> set hive.encrypt.iv=123456;
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.714 seconds, Fetched: 5 row(s)
{code}


 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 HIVE-6329 provides a framework for column-level encryption/decryption, but its 
 implementation just uses Base64, which is not safe and has some problems: 
 Base64WriteOnly can just get the ciphertext from the client for any user, and 
 Base64Rewriter can just get the plaintext from the client for any user.
 I have an improvement based on HIVE-6329 using key management via kms.
 # setup kms and set kms-acls.xml (e.g. user1 and root has permission to get 
 key)
 {code}
 <property>
   <name>hadoop.kms.acl.GET</name>
   <value>user1 root</value>
   <description>
     ACL for get-key-version and get-current-key operations.
   </description>
 </property>
 {code}
 # create an encrypted table
 {code}
 -- region-aes-column.q
 drop table region_aes_column;
 create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
 SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
   STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
 insert overwrite table region_aes_column
 

[jira] [Work started] (HIVE-7934) Improve column level encryption with key management

2014-09-04 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7934 started by Xiaomeng Huang.

 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
 the implementation in HIVE-6329 just uses Base64, which is not safe and has 
 some problems:
 With Base64WriteOnly any user can only get the ciphertext from the client, and 
 with Base64Rewriter any user can get the plaintext from the client.
 I have an improvement based on HIVE-6329 using key management.
 {code}
 -- region-aes-column.q
 set hive.encrypt.key=123456789;
 set hive.encrypt.iv=123456; 
 drop table region_aes_column;
 create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
 SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') 
   STORED AS TEXTFILE;
 insert overwrite table region_aes_column 
 select 
   r_regionkey, r_name
 from region;
 hive> select * from region_aes_column;
 OK
 0 /q5RTO1X
 1 /qVGV+dV3g==
 2 /rtKRA==
 3 +r1RSv5T
 4 8qFHQeJTvxWUadw=
 Time taken: 0.666 seconds, Fetched: 5 row(s)
 hive> set hive.encrypt.key=123456789;
 hive> set hive.encrypt.iv=123456;
 hive> select * from region_aes_column;
 OK
 0 AFRICA
 1 AMERICA
 2 ASIA
 3 EUROPE
 4 MIDDLE EAST
 Time taken: 0.714 seconds, Fetched: 5 row(s)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7934) Improve column level encryption with key management

2014-09-02 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7934:
-
Description: 
Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
the implementation in HIVE-6329 just uses Base64, which is not safe and has 
some problems:
With Base64WriteOnly any user can only get the ciphertext from the client, and 
with Base64Rewriter any user can get the plaintext from the client.
I have an improvement based on HIVE-6329 using key management.
{code}
-- region-aes-column.q
set hive.encrypt.key=123456789;
set hive.encrypt.iv=123456; 
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') 
  STORED AS TEXTFILE;
insert overwrite table region_aes_column 
select 
  r_regionkey, r_name
from region;

hive> select * from region_aes_column;
OK
0   /q5RTO1X
1   /qVGV+dV3g==
2   /rtKRA==
3   +r1RSv5T
4   8qFHQeJTvxWUadw=
Time taken: 0.666 seconds, Fetched: 5 row(s)

hive> set hive.encrypt.key=123456789;
hive> set hive.encrypt.iv=123456;
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.714 seconds, Fetched: 5 row(s)
{code}

  was:
Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
the implementation in HIVE-6329 just uses Base64, which is not safe and has 
some problems:
With Base64WriteOnly any user can only get the ciphertext from the client, and 
with Base64Rewriter any user can get the plaintext from the client.
I have an improvement based on HIVE-6329 using key management.
{code}
-- region-aes-column.q
set hive.encrypt.key=123456789;
set hive.encrypt.iv=123456; 
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') 
  STORED AS TEXTFILE;
insert overwrite table region_aes_column 
select 
  r_regionkey, r_name
from region;

hive> select * from region_aes_column;
OK
0   /q5RTO1X
1   /qVGV+dV3g==
2   /rtKRA==
3   +r1RSv5T
4   8qFHQeJTvxWUadw=
Time taken: 0.666 seconds, Fetched: 5 row(s)

hive> set hive.encrypt.key=123456789;
hive> set hive.encrypt.iv=123456;
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.714 seconds, Fetched: 5 row(s)
{code}


 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
 the implementation in HIVE-6329 just uses Base64, which is not safe and has 
 some problems:
 With Base64WriteOnly any user can only get the ciphertext from the client, and 
 with Base64Rewriter any user can get the plaintext from the client.
 I have an improvement based on HIVE-6329 using key management.
 {code}
 -- region-aes-column.q
 set hive.encrypt.key=123456789;
 set hive.encrypt.iv=123456; 
 drop table region_aes_column;
 create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
 SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') 
   STORED AS TEXTFILE;
 insert overwrite table region_aes_column 
 select 
   r_regionkey, r_name
 from region;
 hive> select * from region_aes_column;
 OK
 0 /q5RTO1X
 1 /qVGV+dV3g==
 2 /rtKRA==
 3 +r1RSv5T
 4 8qFHQeJTvxWUadw=
 Time taken: 0.666 seconds, Fetched: 5 row(s)
 hive> set hive.encrypt.key=123456789;
 hive> set hive.encrypt.iv=123456;
 hive> select * from region_aes_column;
 OK
 0 AFRICA
 1 AMERICA
 2 ASIA
 3 EUROPE
 4 MIDDLE EAST
 Time taken: 0.714 seconds, Fetched: 5 row(s)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-09-01 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang resolved HIVE-7730.
--
Resolution: Fixed

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Fix For: 0.14.0

 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, 
 HIVE-7730.003.patch, HIVE-7730.004.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a HiveSemanticAnalyzerHook, we may want to get more things from the 
 hookContext (e.g. the columns needed by the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store accessed columns in ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true.
 Then an external authorization model can get the accessed columns when doing 
 authorization at compile time, before execution. Later we may remove 
 columnAccessInfo from BaseSemanticAnalyzer, so the old authorization and 
 AuthorizationModeV2 can get accessed columns from ReadEntity too.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
 }
 compiler.compile(pCtx, rootTasks, inputs, outputs);
 // TODO: after compile, we can put the accessed column list into ReadEntity,
 // taken from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
 {code}
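To illustrate why per-column information on the read entities is useful, here is a minimal stand-alone sketch of how an external authorizer could consume it at compile time. The ReadEntity and authorize names below are simplified stand-ins for illustration only, not the real org.apache.hadoop.hive.ql classes.

```java
import java.util.*;

public class ColumnAuthSketch {

    // Simplified stand-in for Hive's ReadEntity with the accessed-column list
    // that this patch proposes to populate during compilation.
    static class ReadEntity {
        final String tableName;
        final List<String> accessedColumns = new ArrayList<>();
        ReadEntity(String tableName) { this.tableName = tableName; }
    }

    // An external authorizer can now check per-column access before execution,
    // instead of only per-table access.
    static void authorize(String user, Set<String> allowedColumns, ReadEntity input) {
        for (String col : input.accessedColumns) {
            if (!allowedColumns.contains(col)) {
                throw new SecurityException(
                    user + " may not read " + input.tableName + "." + col);
            }
        }
    }

    public static void main(String[] args) {
        ReadEntity region = new ReadEntity("default.region");
        region.accessedColumns.addAll(Arrays.asList("r_regionkey", "r_name"));

        // user1 is allowed both columns; user2 only r_regionkey.
        authorize("user1", new HashSet<>(Arrays.asList("r_regionkey", "r_name")), region);
        try {
            authorize("user2", Collections.singleton("r_regionkey"), region);
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```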



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-09-01 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-
Attachment: (was: HIVE-7730-fix-NP-issue.patch)

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Fix For: 0.14.0

 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, 
 HIVE-7730.003.patch, HIVE-7730.004.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a HiveSemanticAnalyzerHook, we may want to get more things from the 
 hookContext (e.g. the columns needed by the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store accessed columns in ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true.
 Then an external authorization model can get the accessed columns when doing 
 authorization at compile time, before execution. Later we may remove 
 columnAccessInfo from BaseSemanticAnalyzer, so the old authorization and 
 AuthorizationModeV2 can get accessed columns from ReadEntity too.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
 }
 compiler.compile(pCtx, rootTasks, inputs, outputs);
 // TODO: after compile, we can put the accessed column list into ReadEntity,
 // taken from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity

2014-09-01 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-7932:


 Summary: It may cause NP exception when add accessed columns to 
ReadEntity
 Key: HIVE-7932
 URL: https://issues.apache.org/jira/browse/HIVE-7932
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity

2014-09-01 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7932:
-
Description: 
{code}
case TABLE:
  entity.getAccessedColumns().addAll(
      tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
{code}
If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns null, 
addAll(null) will throw a NullPointerException.
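A minimal null-safe version of this call site can be sketched as follows; the surrounding field and method names are simplified stand-ins for Hive's actual ReadEntity and column-access map, used here only to demonstrate the guard:

```java
import java.util.*;

public class NullSafeAddAll {
    // Hypothetical stand-ins for Hive's ReadEntity column list and the
    // table-to-columns map built by ColumnAccessAnalyzer.
    static List<String> accessedColumns = new ArrayList<>();
    static Map<String, List<String>> tableToColumnAccessMap = new HashMap<>();

    static void addAccessedColumns(String tableName) {
        List<String> cols = tableToColumnAccessMap.get(tableName);
        if (cols != null) {          // guard: addAll(null) throws NullPointerException
            accessedColumns.addAll(cols);
        }
    }

    public static void main(String[] args) {
        tableToColumnAccessMap.put("default.region", Arrays.asList("r_regionkey", "r_name"));
        addAccessedColumns("default.region");   // adds two columns
        addAccessedColumns("default.missing");  // no map entry: skipped, no NPE
        System.out.println(accessedColumns);
    }
}
```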

 It may cause NP exception when add accessed columns to ReadEntity
 -

 Key: HIVE-7932
 URL: https://issues.apache.org/jira/browse/HIVE-7932
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang

 {code}
 case TABLE:
   entity.getAccessedColumns().addAll(
       tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
 {code}
 If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns 
 null, addAll(null) will throw a NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity

2014-09-01 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7932:
-
Attachment: HIVE-7932.001.patch

 It may cause NP exception when add accessed columns to ReadEntity
 -

 Key: HIVE-7932
 URL: https://issues.apache.org/jira/browse/HIVE-7932
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Attachments: HIVE-7932.001.patch


 {code}
 case TABLE:
   entity.getAccessedColumns().addAll(
       tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
 {code}
 If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns 
 null, addAll(null) will throw a NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-7934) Improve column level encryption with key management

2014-09-01 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-7934:


 Summary: Improve column level encryption with key management
 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7934) Improve column level encryption with key management

2014-09-01 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7934:
-
Description: Now 

 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 Now 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7934) Improve column level encryption with key management

2014-09-01 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7934:
-
Description: 
Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
the implementation in HIVE-6329 just uses Base64, which is not safe and has 
some problems:
With Base64WriteOnly any user can only get the ciphertext from the client, and 
with Base64Rewriter any user can get the plaintext from the client.
I have an improvement based on HIVE-6329 using key management.
{code}
-- region-aes-column.q
set hive.encrypt.key=123456789;
set hive.encrypt.iv=123456; 
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 
'column.encode.key'='123456789', 'column.encode.iv'='123456') 
  STORED AS TEXTFILE;
insert overwrite table region_aes_column 
select 
  r_regionkey, r_name
from region;

hive> select * from region_aes_column;
OK
0   /q5RTO1X
1   /qVGV+dV3g==
2   /rtKRA==
3   +r1RSv5T
4   8qFHQeJTvxWUadw=
Time taken: 0.666 seconds, Fetched: 5 row(s)

hive> set hive.encrypt.key=123456789;
hive> set hive.encrypt.iv=123456;
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.714 seconds, Fetched: 5 row(s)
{code}

  was:Now 


 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
 the implementation in HIVE-6329 just uses Base64, which is not safe and has 
 some problems:
 With Base64WriteOnly any user can only get the ciphertext from the client, and 
 with Base64Rewriter any user can get the plaintext from the client.
 I have an improvement based on HIVE-6329 using key management.
 {code}
 -- region-aes-column.q
 set hive.encrypt.key=123456789;
 set hive.encrypt.iv=123456; 
 drop table region_aes_column;
 create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
 SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 
 'column.encode.key'='123456789', 'column.encode.iv'='123456') 
   STORED AS TEXTFILE;
 insert overwrite table region_aes_column 
 select 
   r_regionkey, r_name
 from region;
 hive> select * from region_aes_column;
 OK
 0 /q5RTO1X
 1 /qVGV+dV3g==
 2 /rtKRA==
 3 +r1RSv5T
 4 8qFHQeJTvxWUadw=
 Time taken: 0.666 seconds, Fetched: 5 row(s)
 hive> set hive.encrypt.key=123456789;
 hive> set hive.encrypt.iv=123456;
 hive> select * from region_aes_column;
 OK
 0 AFRICA
 1 AMERICA
 2 ASIA
 3 EUROPE
 4 MIDDLE EAST
 Time taken: 0.714 seconds, Fetched: 5 row(s)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7934) Improve column level encryption with key management

2014-09-01 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7934:
-
Description: 
Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
the implementation in HIVE-6329 just uses Base64, which is not safe and has 
some problems:
With Base64WriteOnly any user can only get the ciphertext from the client, and 
with Base64Rewriter any user can get the plaintext from the client.
I have an improvement based on HIVE-6329 using key management.
{code}
-- region-aes-column.q
set hive.encrypt.key=123456789;
set hive.encrypt.iv=123456; 
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 
'column.encode.key'='123456789', 'column.encode.iv'='123456') 
  STORED AS TEXTFILE;
insert overwrite table region_aes_column 
select 
  r_regionkey, r_name
from region;

hive> select * from region_aes_column;
OK
0   /q5RTO1X
1   /qVGV+dV3g==
2   /rtKRA==
3   +r1RSv5T
4   8qFHQeJTvxWUadw=
Time taken: 0.666 seconds, Fetched: 5 row(s)

hive> set hive.encrypt.key=123456789;
hive> set hive.encrypt.iv=123456;
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.714 seconds, Fetched: 5 row(s)
{code}

  was:
Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
the implementation in HIVE-6329 just uses Base64, which is not safe and has 
some problems:
With Base64WriteOnly any user can only get the ciphertext from the client, and 
with Base64Rewriter any user can get the plaintext from the client.
I have an improvement based on HIVE-6329 using key management.
{code}
-- region-aes-column.q
set hive.encrypt.key=123456789;
set hive.encrypt.iv=123456; 
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 
'column.encode.key'='123456789', 'column.encode.iv'='123456') 
  STORED AS TEXTFILE;
insert overwrite table region_aes_column 
select 
  r_regionkey, r_name
from region;

hive> select * from region_aes_column;
OK
0   /q5RTO1X
1   /qVGV+dV3g==
2   /rtKRA==
3   +r1RSv5T
4   8qFHQeJTvxWUadw=
Time taken: 0.666 seconds, Fetched: 5 row(s)

hive> set hive.encrypt.key=123456789;
hive> set hive.encrypt.iv=123456;
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.714 seconds, Fetched: 5 row(s)
{code}


 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
 the implementation in HIVE-6329 just uses Base64, which is not safe and has 
 some problems:
 With Base64WriteOnly any user can only get the ciphertext from the client, and 
 with Base64Rewriter any user can get the plaintext from the client.
 I have an improvement based on HIVE-6329 using key management.
 {code}
 -- region-aes-column.q
 set hive.encrypt.key=123456789;
 set hive.encrypt.iv=123456; 
 drop table region_aes_column;
 create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
 SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 
 'column.encode.key'='123456789', 'column.encode.iv'='123456') 
   STORED AS TEXTFILE;
 insert overwrite table region_aes_column 
 select 
   r_regionkey, r_name
 from region;
 hive> select * from region_aes_column;
 OK
 0 /q5RTO1X
 1 /qVGV+dV3g==
 2 /rtKRA==
 3 +r1RSv5T
 4 8qFHQeJTvxWUadw=
 Time taken: 0.666 seconds, Fetched: 5 row(s)
 hive> set hive.encrypt.key=123456789;
 hive> set hive.encrypt.iv=123456;
 hive> select * from region_aes_column;
 OK
 0 AFRICA
 1 AMERICA
 2 ASIA
 3 EUROPE
 4 MIDDLE EAST
 Time taken: 0.714 seconds, Fetched: 5 row(s)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7934) Improve column level encryption with key management

2014-09-01 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7934:
-
Description: 
Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
the implementation in HIVE-6329 just uses Base64, which is not safe and has 
some problems:
With Base64WriteOnly any user can only get the ciphertext from the client, and 
with Base64Rewriter any user can get the plaintext from the client.
I have an improvement based on HIVE-6329 using key management.
{code}
-- region-aes-column.q
set hive.encrypt.key=123456789;
set hive.encrypt.iv=123456; 
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') 
  STORED AS TEXTFILE;
insert overwrite table region_aes_column 
select 
  r_regionkey, r_name
from region;

hive> select * from region_aes_column;
OK
0   /q5RTO1X
1   /qVGV+dV3g==
2   /rtKRA==
3   +r1RSv5T
4   8qFHQeJTvxWUadw=
Time taken: 0.666 seconds, Fetched: 5 row(s)

hive> set hive.encrypt.key=123456789;
hive> set hive.encrypt.iv=123456;
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.714 seconds, Fetched: 5 row(s)
{code}

  was:
Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
the implementation in HIVE-6329 just uses Base64, which is not safe and has 
some problems:
With Base64WriteOnly any user can only get the ciphertext from the client, and 
with Base64Rewriter any user can get the plaintext from the client.
I have an improvement based on HIVE-6329 using key management.
{code}
-- region-aes-column.q
set hive.encrypt.key=123456789;
set hive.encrypt.iv=123456; 
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 
'column.encode.key'='123456789', 'column.encode.iv'='123456') 
  STORED AS TEXTFILE;
insert overwrite table region_aes_column 
select 
  r_regionkey, r_name
from region;

hive> select * from region_aes_column;
OK
0   /q5RTO1X
1   /qVGV+dV3g==
2   /rtKRA==
3   +r1RSv5T
4   8qFHQeJTvxWUadw=
Time taken: 0.666 seconds, Fetched: 5 row(s)

hive> set hive.encrypt.key=123456789;
hive> set hive.encrypt.iv=123456;
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.714 seconds, Fetched: 5 row(s)
{code}


 Improve column level encryption with key management
 ---

 Key: HIVE-7934
 URL: https://issues.apache.org/jira/browse/HIVE-7934
 Project: Hive
  Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

 Now HIVE-6329 provides a framework for column-level encryption/decryption, but 
 the implementation in HIVE-6329 just uses Base64, which is not safe and has 
 some problems:
 With Base64WriteOnly any user can only get the ciphertext from the client, and 
 with Base64Rewriter any user can get the plaintext from the client.
 I have an improvement based on HIVE-6329 using key management.
 {code}
 -- region-aes-column.q
 set hive.encrypt.key=123456789;
 set hive.encrypt.iv=123456; 
 drop table region_aes_column;
 create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT 
 SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') 
   STORED AS TEXTFILE;
 insert overwrite table region_aes_column 
 select 
   r_regionkey, r_name
 from region;
 hive> select * from region_aes_column;
 OK
 0 /q5RTO1X
 1 /qVGV+dV3g==
 2 /rtKRA==
 3 +r1RSv5T
 4 8qFHQeJTvxWUadw=
 Time taken: 0.666 seconds, Fetched: 5 row(s)
 hive> set hive.encrypt.key=123456789;
 hive> set hive.encrypt.iv=123456;
 hive> select * from region_aes_column;
 OK
 0 AFRICA
 1 AMERICA
 2 ASIA
 3 EUROPE
 4 MIDDLE EAST
 Time taken: 0.714 seconds, Fetched: 5 row(s)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-31 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang reopened HIVE-7730:
--

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Fix For: 0.14.0

 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, 
 HIVE-7730.003.patch, HIVE-7730.004.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a HiveSemanticAnalyzerHook, we may want to get more things from the 
 hookContext (e.g. the columns needed by the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store accessed columns in ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true.
 Then an external authorization model can get the accessed columns when doing 
 authorization at compile time, before execution. Later we may remove 
 columnAccessInfo from BaseSemanticAnalyzer, so the old authorization and 
 AuthorizationModeV2 can get accessed columns from ReadEntity too.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
 }
 compiler.compile(pCtx, rootTasks, inputs, outputs);
 // TODO: after compile, we can put the accessed column list into ReadEntity,
 // taken from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-31 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-
Attachment: HIVE-7730-fix-NP-issue.patch

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Fix For: 0.14.0

 Attachments: HIVE-7730-fix-NP-issue.patch, HIVE-7730.001.patch, 
 HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a HiveSemanticAnalyzerHook, we may want to get more things from the 
 hookContext (e.g. the columns needed by the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store accessed columns in ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true.
 Then an external authorization model can get the accessed columns when doing 
 authorization at compile time, before execution. Later we may remove 
 columnAccessInfo from BaseSemanticAnalyzer, so the old authorization and 
 AuthorizationModeV2 can get accessed columns from ReadEntity too.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
 }
 compiler.compile(pCtx, rootTasks, inputs, outputs);
 // TODO: after compile, we can put the accessed column list into ReadEntity,
 // taken from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-31 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117038#comment-14117038
 ] 

Xiaomeng Huang commented on HIVE-7730:
--

Hi [~szehon],
There is a null-pointer issue in the latest patch:
entity.getAccessedColumns().addAll(tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns null, 
addAll(null) will throw a NullPointerException.
I attached a patch to fix it; could you help review it? Thanks!
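The fix can be sketched as follows; this is a minimal stand-alone illustration with plain collections (the class and variable names here are ours, not Hive's): guard the map lookup and skip the addAll when the entry is absent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullSafeColumns {
    // Merge the columns recorded for tableName into accessed,
    // tolerating a missing map entry instead of throwing.
    public static List<String> addAccessedColumns(List<String> accessed,
                                                  Map<String, List<String>> byTable,
                                                  String tableName) {
        List<String> cols = byTable.get(tableName);
        if (cols != null) {          // addAll(null) would throw NullPointerException
            accessed.addAll(cols);
        }
        return accessed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byTable = new HashMap<>();
        byTable.put("default.abc", Arrays.asList("id", "name"));
        // Known table: columns are merged.
        System.out.println(addAccessedColumns(new ArrayList<>(), byTable, "default.abc")); // [id, name]
        // Unknown table: no NPE, list stays empty.
        System.out.println(addAccessedColumns(new ArrayList<>(), byTable, "default.xyz")); // []
    }
}
```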

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
 Fix For: 0.14.0

 Attachments: HIVE-7730-fix-NP-issue.patch, HIVE-7730.001.patch, 
 HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch







[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-08-27 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111999#comment-14111999
 ] 

Xiaomeng Huang commented on HIVE-6329:
--

Hi Navis,
I agree that your patch provides a framework for column-level 
encryption/decryption. I am curious: if you use Base64WriteOnly to encode your 
values, how do you get the plaintext back? And currently Base64Rewriter just 
gets the plaintext instead of the ciphertext from the client, right?
I have an idea to improve it: use key management to encode/decode in the 
Rewriter, and set the local path of the key in the configuration instead of in 
SERDEPROPERTIES. User1 uses key1 to encode values when inserting data, and the 
values of those columns are stored encoded in HDFS. When User2 wants to scan 
the table: if he has key1, he can decode the values successfully and get the 
plaintext; otherwise, with no key or a wrong key, decoding fails and he just 
gets the ciphertext.
If this approach makes sense to you, I want to create a JIRA to improve this 
via key management, based on this one.
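As a toy illustration of that behavior (a stand-in XOR cipher, not the proposed key-management implementation; all names here are hypothetical): decoding with the right key recovers the plaintext, while a wrong key does not.

```java
public class KeyedCodec {
    // Symmetric toy cipher: XOR each byte with the key, so encode == decode.
    public static byte[] apply(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key1 = "key1".getBytes();
        byte[] cipher = apply("Seoul, Seocho".getBytes(), key1);
        // Right key: plaintext comes back.
        System.out.println(new String(apply(cipher, key1))); // Seoul, Seocho
        // Wrong key: the result does not match the plaintext.
        System.out.println(new String(apply(cipher, "key2".getBytes())).equals("Seoul, Seocho")); // false
    }
}
```

A real implementation would of course use a proper cipher from a key-management service rather than XOR; the point is only that the reader without the correct key sees ciphertext.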


 Support column level encryption/decryption
 --

 Key: HIVE-6329
 URL: https://issues.apache.org/jira/browse/HIVE-6329
 Project: Hive
  Issue Type: New Feature
  Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, 
 HIVE-6329.11.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, 
 HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, 
 HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt


 We have been receiving some requirements for encryption recently, but Hive 
 does not support it. Before the full implementation via HIVE-5207, this might 
 be useful for some cases.
 {noformat}
 hive> create table encode_test(id int, name STRING, phone STRING, address 
 STRING) 
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
  WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') 
 STORED AS TEXTFILE;
 OK
 Time taken: 0.584 seconds
 hive> insert into table encode_test select 
 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows);
 ..
 OK
 Time taken: 5.121 seconds
 hive> select * from encode_test;
 OK
 100   navis MDEwLTAwMDAtMDAwMA==  U2VvdWwsIFNlb2Nobw==
 Time taken: 0.078 seconds, Fetched: 1 row(s)
 hive> 
 {noformat}
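The encoded column values shown in the select output above are plain Base64 of the inserted strings, which can be reproduced with the JDK's encoder (this check is ours, not part of the patch):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Check {
    public static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Matches the encoded phone and address columns in the transcript.
        System.out.println(encode("010-0000-0000")); // MDEwLTAwMDAtMDAwMA==
        System.out.println(encode("Seoul, Seocho")); // U2VvdWwsIFNlb2Nobw==
    }
}
```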



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-27 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113376#comment-14113376
 ] 

Xiaomeng Huang commented on HIVE-7730:
--

Thanks [~szehon]!

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Fix For: 0.14.0

 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, 
 HIVE-7730.003.patch, HIVE-7730.004.patch







[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-26 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Status: Patch Available  (was: Open)

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, 
 HIVE-7730.003.patch







[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-26 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Attachment: HIVE-7730.004.patch

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, 
 HIVE-7730.003.patch, HIVE-7730.004.patch







[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-25 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Attachment: HIVE-7730.003.patch

Addressed review comments from [~szehon]

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, 
 HIVE-7730.003.patch







[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-08-25 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110217#comment-14110217
 ] 

Xiaomeng Huang commented on HIVE-6329:
--

Hi Navis,
I am very interested in this feature! But there are some build failures with 
your latest patch; could you help rebase it?

 Support column level encryption/decryption
 --

 Key: HIVE-6329
 URL: https://issues.apache.org/jira/browse/HIVE-6329
 Project: Hive
  Issue Type: New Feature
  Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, 
 HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, 
 HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, 
 HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt







[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-22 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106515#comment-14106515
 ] 

Xiaomeng Huang commented on HIVE-7730:
--

Thanks [~szehon], I have linked it to the review board.

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch







[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-21 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Attachment: (was: HIVE-7730.002.patch)

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch







[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-21 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Attachment: HIVE-7730.002.patch

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch







[jira] [Updated] (HIVE-7730) Extend ReadEntity to add column access information from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Summary: Extend ReadEntity to add column access information from query  
(was: Get instance of HiveSemanticAnalyzerHookContext from configuration)

 Extend ReadEntity to add column access information from query
 -

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 hookContext (e.g. the needed columns from the query).
 So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.





[jira] [Updated] (HIVE-7730) Extend ReadEntity to add column access information from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-


  was:
Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have 
a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext (e.g. the needed columns from the query).
So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.


 Extend ReadEntity to add column access information from query
 -

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch







[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Summary: Extend ReadEntity to add accessed columns from query  (was: Extend 
ReadEntity to add column access information from query)

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch







[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true.
Then we can get the accessed columns when doing authorization at compile time, 
before execution.
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);

if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-



 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch







[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true.
Then we can get the accessed columns when doing authorization at compile time, 
before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store accessed columns to ReadEntity when we set 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS to true.
Then we can get the accessed columns when doing authorization at compile time, 
before execution.
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);

if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns to ReadEntity from columnAccessInfo
}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns the query needs).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is set to true.
 Then we can get the accessed columns when doing authorization at compile 
 time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put the accessed columns into ReadEntity from columnAccessInfo
 }
 {code}





[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed columns into ReadEntity from columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns the query needs).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is set to true.
 Then we can get the accessed columns when doing authorization at compile 
 time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
 }
 {code}





[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns the query needs).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS is set to true.
 Then we can get the accessed columns when doing authorization at compile 
 time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
 }
 {code}





[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS (or another conf var we could add for this) is 
set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns the query needs).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS is set to true.
 Then we can get the accessed columns when doing authorization at compile 
 time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
 }
 {code}





[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS (or another conf var we could add for this) is 
set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns the query needs).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS (or another conf var we could add for this) is 
 set to true.
 Then we can get the accessed columns when doing authorization at compile 
 time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
 }
 {code}





[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed-column map, or the ColumnAccessInfo, into 
ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns the query needs).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store the accessed-column map, or the ColumnAccessInfo, into 
 ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true.
 Then we can get the accessed columns when doing authorization at compile 
 time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
 }
 {code}





[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103568#comment-14103568
 ] 

Xiaomeng Huang commented on HIVE-7730:
--

Hi [~ashutoshc]
Currently Hive has a new interface for external authorization plugins, and 
the semantic hook may be replaced in the future. So I will try to put the 
accessed columns into ReadEntity instead of enhancing the semantic hook. This 
way they will be available to hooks as well as to the authorization 
interfaces. I have updated the description; waiting for your feedback. Thanks!
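The ReadEntity-based approach described in this comment can be sketched in plain Java. This is a hypothetical, simplified model (class and method names are illustrative, not Hive's actual API): the entity representing a scanned table carries the list of columns the query touches, so any consumer reading entities at compile time sees them without a custom hook context.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a ReadEntity extended with accessed columns.
// Not Hive's real ReadEntity; names are illustrative only.
public class ReadEntitySketch {
    private final String tableName;
    private final List<String> accessedColumns = new ArrayList<>();

    public ReadEntitySketch(String tableName) {
        this.tableName = tableName;
    }

    // The semantic analyzer would call this with columns from ColumnAccessInfo.
    public void setAccessedColumns(List<String> columns) {
        accessedColumns.clear();
        accessedColumns.addAll(columns);
    }

    // A hook or authorizer reads the columns at compile time, before execution.
    public List<String> getAccessedColumns() {
        return Collections.unmodifiableList(accessedColumns);
    }

    public String getTableName() {
        return tableName;
    }

    public static void main(String[] args) {
        ReadEntitySketch entity = new ReadEntitySketch("abc_test_old");
        entity.setAccessedColumns(List.of("did"));
        System.out.println(entity.getTableName() + " -> " + entity.getAccessedColumns());
    }
}
```

The design point is that the column list lives on the entity itself, so both hooks and authorization interfaces consume one shared structure instead of each re-deriving it.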

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns the query needs).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store the accessed-column map, or the ColumnAccessInfo, into 
 ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true.
 Then we can get the accessed columns when doing authorization at compile 
 time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
 }
 {code}





[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column list, obtained from columnAccessInfo, into ReadEntity
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed-column map, or the ColumnAccessInfo, into 
ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns when doing authorization at compile 
time, before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map, obtained from columnAccessInfo, into ReadEntity
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns the query needs).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 what you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS is set to true.
 Then we can get the accessed columns when doing authorization at compile 
 time, before execution.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put the accessed column list, obtained from columnAccessInfo, into ReadEntity
 }
 {code}





[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns the query needs).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
what you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then an external authorization model can get the accessed columns when doing 
authorization at compile time, before execution. Maybe we will remove 
columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
AuthorizationModeV2 could then get the accessed columns from ReadEntity too.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column list, obtained from columnAccessInfo, into ReadEntity
}
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from the 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
whatever you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then we can get the accessed columns during authorization at compile time, 
before execution.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column list into ReadEntity, obtained from columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns needed by the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from the 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 whatever you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS is set to true.
 Then an external authorization model can get the accessed columns during 
 authorization at compile time, before execution. Maybe we will remove 
 columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
 AuthorizationModeV2 can get the accessed columns from ReadEntity too.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
   // TODO: we can put the accessed column list into ReadEntity, obtained from columnAccessInfo
 }
 {code}
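To make the TODO in the snippet above concrete, here is a minimal, self-contained sketch of what "putting the accessed column list into ReadEntity" could look like. The `ColumnAccessInfo` and `ReadEntity` classes below are simplified stand-ins for the real Hive types, not the actual Hive API, and `attachAccessedColumns` is a hypothetical helper:

```java
import java.util.*;

// Stand-in for org.apache.hadoop.hive.ql.parse.ColumnAccessInfo: maps a table
// name to the list of columns the query touched. Simplified for illustration.
class ColumnAccessInfo {
    private final Map<String, List<String>> tableToColumns = new HashMap<>();
    void add(String table, String column) {
        tableToColumns.computeIfAbsent(table, t -> new ArrayList<>()).add(column);
    }
    Map<String, List<String>> getTableToColumnAccessMap() { return tableToColumns; }
}

// Stand-in for org.apache.hadoop.hive.ql.hooks.ReadEntity, extended (as this
// issue proposes) with the list of accessed columns.
class ReadEntity {
    private final String tableName;
    private List<String> accessedColumns = Collections.emptyList();
    ReadEntity(String tableName) { this.tableName = tableName; }
    String getTableName() { return tableName; }
    void setAccessedColumns(List<String> cols) { accessedColumns = cols; }
    List<String> getAccessedColumns() { return accessedColumns; }
}

public class ReadEntityColumnsSketch {
    // The TODO from the snippet above: copy the per-table accessed-column
    // lists out of ColumnAccessInfo into the matching ReadEntity inputs,
    // so an external authorizer can check them at compile time.
    static void attachAccessedColumns(Set<ReadEntity> inputs, ColumnAccessInfo info) {
        Map<String, List<String>> map = info.getTableToColumnAccessMap();
        for (ReadEntity input : inputs) {
            List<String> cols = map.get(input.getTableName());
            if (cols != null) {
                input.setAccessedColumns(cols);
            }
        }
    }

    public static void main(String[] args) {
        ColumnAccessInfo info = new ColumnAccessInfo();
        info.add("abc_test_old", "did");
        ReadEntity input = new ReadEntity("abc_test_old");
        attachAccessedColumns(Collections.singleton(input), info);
        System.out.println(input.getAccessedColumns()); // prints [did]
    }
}
```

The key design point is that the column lists are attached before query execution, so a compile-time authorizer only needs the inputs set and never has to re-walk the plan.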





[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from the 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
whatever you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then an external authorization model can get the accessed columns during 
authorization at compile time, before execution. Maybe we will remove 
columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
AuthorizationModeV2 can get the accessed columns from ReadEntity too.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into ReadEntity,
// obtained from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from the 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
whatever you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then an external authorization model can get the accessed columns during 
authorization at compile time, before execution. Maybe we will remove 
columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
AuthorizationModeV2 can get the accessed columns from ReadEntity too.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column list into ReadEntity, obtained from columnAccessInfo
}
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns needed by the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from the 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 whatever you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS is set to true.
 Then an external authorization model can get the accessed columns during 
 authorization at compile time, before execution. Maybe we will remove 
 columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
 AuthorizationModeV2 can get the accessed columns from ReadEntity too.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
 }
 compiler.compile(pCtx, rootTasks, inputs, outputs);
 // TODO: after compile, we can put the accessed column list into ReadEntity,
 // obtained from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
 {code}

[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Description: 
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from the 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
whatever you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS (or a new ConfVar we could add) is set to true.
Then an external authorization model can get the accessed columns during 
authorization at compile time, before execution. Maybe we will remove 
columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
AuthorizationModeV2 can get the accessed columns from ReadEntity too.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into ReadEntity,
// obtained from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
{code}

  was:
-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns needed by the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from the 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
whatever you want into the class.-
Hive should store the accessed columns into ReadEntity when 
HIVE_STATS_COLLECT_SCANCOLS is set to true.
Then an external authorization model can get the accessed columns during 
authorization at compile time, before execution. Maybe we will remove 
columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
AuthorizationModeV2 can get the accessed columns from ReadEntity too.
Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into ReadEntity,
// obtained from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
{code}


 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns needed by the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from the 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 whatever you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS (or a new ConfVar we could add) is set to true.
 Then an external authorization model can get the accessed columns during 
 authorization at compile time, before execution. Maybe we will remove 
 columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
 AuthorizationModeV2 can get the accessed columns from ReadEntity too.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
 }
 compiler.compile(pCtx, rootTasks, inputs, outputs);
 // TODO: after compile, we can put the accessed column list into ReadEntity,
 // obtained from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
 {code}

[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query

2014-08-20 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Attachment: HIVE-7730.002.patch

This patch extends ReadEntity with the list of accessed columns.

 Extend ReadEntity to add accessed columns from query
 

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch


 -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns needed by the query).-
 -So we should get the instance of HiveSemanticAnalyzerHookContext from the 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 whatever you want into the class.-
 Hive should store the accessed columns into ReadEntity when 
 HIVE_STATS_COLLECT_SCANCOLS (or a new ConfVar we could add) is set to true.
 Then an external authorization model can get the accessed columns during 
 authorization at compile time, before execution. Maybe we will remove 
 columnAccessInfo from BaseSemanticAnalyzer; the old authorization and 
 AuthorizationModeV2 can get the accessed columns from ReadEntity too.
 Here is a quick implementation in SemanticAnalyzer.analyzeInternal():
 {code}
 boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
     && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
 if (isColumnInfoNeedForAuth
     || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
   setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
 }
 compiler.compile(pCtx, rootTasks, inputs, outputs);
 // TODO: after compile, we can put the accessed column list into ReadEntity,
 // obtained from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
 {code}





[jira] [Created] (HIVE-7730) Get instance of HiveSemanticAnalyzerHookContext from configuration

2014-08-14 Thread Xiaomeng Huang (JIRA)
Xiaomeng Huang created HIVE-7730:


 Summary: Get instance of HiveSemanticAnalyzerHookContext from 
configuration
 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang


Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have 
a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
the hookContext (e.g. the columns needed by the query).
So we should get the instance of HiveSemanticAnalyzerHookContext from the 
configuration, extend HiveSemanticAnalyzerHookContext with a new 
implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
whatever you want into the class.
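A minimal sketch of the extension pattern described above. The base class here is a simplified, self-contained stand-in for Hive's HiveSemanticAnalyzerHookContextImpl (the real update() takes a BaseSemanticAnalyzer, not a Map), and ColumnAwareHookContext and its field names are hypothetical:

```java
import java.util.*;

// Stand-in for Hive's HiveSemanticAnalyzerHookContextImpl; the real class also
// carries the Hive handle, configuration, and input/output entities.
class HiveSemanticAnalyzerHookContextImpl {
    // In Hive, update() is called after semantic analysis so the context can
    // copy what it needs out of the analyzer; the Map parameter here is a
    // simplified stand-in for the real BaseSemanticAnalyzer argument.
    public void update(Map<String, List<String>> tableToNeededColumns) { }
}

// The pattern from the description: subclass the context, override update(),
// and stash extra analysis results (here, the needed columns per table) where
// a HiveSemanticAnalyzerHook can read them at compile time.
public class ColumnAwareHookContext extends HiveSemanticAnalyzerHookContextImpl {
    private final Map<String, List<String>> neededColumns = new HashMap<>();

    @Override
    public void update(Map<String, List<String>> tableToNeededColumns) {
        super.update(tableToNeededColumns);
        neededColumns.putAll(tableToNeededColumns);
    }

    public List<String> getNeededColumns(String table) {
        return neededColumns.getOrDefault(table, Collections.emptyList());
    }

    public static void main(String[] args) {
        ColumnAwareHookContext ctx = new ColumnAwareHookContext();
        ctx.update(Map.of("abc_test_old", List.of("did")));
        System.out.println(ctx.getNeededColumns("abc_test_old")); // prints [did]
    }
}
```

The point of instantiating the context class from configuration is that a plugin (e.g. an external authorizer) can substitute its own subclass without patching Hive itself.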





[jira] [Updated] (HIVE-7730) Get instance of HiveSemanticAnalyzerHookContext from configuration

2014-08-14 Thread Xiaomeng Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaomeng Huang updated HIVE-7730:
-

Attachment: HIVE-7730.001.patch

Here is the patch; this feature blocks SENTRY-392.

 Get instance of HiveSemanticAnalyzerHookContext from configuration
 --

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns needed by the query).
 So we should get the instance of HiveSemanticAnalyzerHookContext from the 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 whatever you want into the class.





[jira] [Commented] (HIVE-7730) Get instance of HiveSemanticAnalyzerHookContext from configuration

2014-08-14 Thread Xiaomeng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098046#comment-14098046
 ] 

Xiaomeng Huang commented on HIVE-7730:
--

Thanks [~ashutoshc] for the valuable comment!
I have seen that CheckColumnAccessHook implements ExecuteWithHookContext, so it 
runs at execution time.
But authorization should happen at compile time, before execution.
So I am afraid this approach cannot meet my requirement.

 Get instance of HiveSemanticAnalyzerHookContext from configuration
 --

 Key: HIVE-7730
 URL: https://issues.apache.org/jira/browse/HIVE-7730
 Project: Hive
  Issue Type: Bug
Reporter: Xiaomeng Huang
 Attachments: HIVE-7730.001.patch


 Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we 
 have a hook of HiveSemanticAnalyzerHook, we may want to get more things from 
 the hookContext (e.g. the columns needed by the query).
 So we should get the instance of HiveSemanticAnalyzerHookContext from the 
 configuration, extend HiveSemanticAnalyzerHookContext with a new 
 implementation, override HiveSemanticAnalyzerHookContext.update(), and put 
 whatever you want into the class.


