[jira] [Created] (HIVE-16936) wrong result with CTAS(create table as select)
Xiaomeng Huang created HIVE-16936:
-
Summary: wrong result with CTAS (create table as select)
Key: HIVE-16936
URL: https://issues.apache.org/jira/browse/HIVE-16936
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Xiaomeng Huang
Priority: Critical

1.
{code}
hive> select 'test' as did from abc_test_old
    > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
OK
test
{code}
The result is 'test'.

2.
{code}
hive> create table abc_test_12345 as
    > select 'test' as did from abc_test_old
    > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
hive> select did from abc_test_12345 limit 1;
OK
5FCAFD34-C124-4E13-AF65-27B675C945CC
{code}
The result is '5FCAFD34-C124-4E13-AF65-27B675C945CC'. Why is the result not 'test'?

3.
{code}
hive> explain
    > create table abc_test_12345 as
    > select 'test' as did from abc_test_old
    > where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-6 depends on stages: Stage-1, consists of Stage-3, Stage-2, Stage-4
  Stage-3
  Stage-0 depends on stages: Stage-3, Stage-2, Stage-5
  Stage-7 depends on stages: Stage-0
  Stage-2
  Stage-4
  Stage-5 depends on stages: Stage-4

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abc_test_old
            Statistics: Num rows: 32 Data size: 1152 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (did = '5FCAFD34-C124-4E13-AF65-27B675C945CC') (type: boolean)
              Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                Statistics: Num rows: 16 Data size: 576 Basic stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 1
                  Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    sort order:
                    Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Select Operator
          expressions: '5FCAFD34-C124-4E13-AF65-27B675C945CC' (type: string)
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Limit
            Number of rows: 1
            Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: true
              Statistics: Num rows: 1 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                  output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                  serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                  name: default.abc_test_12345
..
{code}
Why is the expression '5FCAFD34-C124-4E13-AF65-27B675C945CC' instead of 'test'?
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
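[Editorial note] The plan shows the select-list constant 'test' replaced by the filter constant, which suggests the constant propagation optimizer folding across the shared name did. A hedged sketch of two ways a reader might probe this; the hive.optimize.constant.propagation flag is standard Hive configuration, but its effect on this particular version is an assumption, and the table names below are illustrative:
{code}
-- Sketch, not from the report: rule out the constant propagation optimizer.
set hive.optimize.constant.propagation=false;
create table abc_test_probe1 as
select 'test' as did from abc_test_old
where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;

-- Or avoid the alias/column name collision altogether.
create table abc_test_probe2 as
select 'test' as did2 from abc_test_old
where did = '5FCAFD34-C124-4E13-AF65-27B675C945CC' limit 1;
{code}
If either variant returns 'test', that would point at the folding step rather than at the CTAS file write.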
[jira] [Created] (HIVE-15836) CTAS failed when the table is stored as ORC and the select clause has null
Xiaomeng Huang created HIVE-15836:
-
Summary: CTAS failed when the table is stored as ORC and the select clause has null
Key: HIVE-15836
URL: https://issues.apache.org/jira/browse/HIVE-15836
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Xiaomeng Huang

Based on the stable version 1.2.1 with the patch from https://issues.apache.org/jira/browse/HIVE-11217 applied, I still get an error.
CASE:
{quote}
CREATE TABLE empty (x int);
CREATE TABLE orc_table_with_null STORED AS ORC AS SELECT x, null FROM empty;
{quote}
ERROR:
{quote}
FAILED: SemanticException [Error 10305]: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: _c1
{quote}
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
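[Editorial note] As the SemanticException itself suggests, giving the NULL an explicit type avoids the VOID column. A minimal sketch; the column alias c1 is illustrative, not from the report:
{code}
CREATE TABLE empty (x int);
CREATE TABLE orc_table_with_null STORED AS ORC AS
SELECT x, CAST(null AS int) AS c1 FROM empty;
{code}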
[jira] [Created] (HIVE-14369) Submitting a Hive on Spark task to another YARN cluster fails
Xiaomeng Huang created HIVE-14369:
-
Summary: Submitting a Hive on Spark task to another YARN cluster fails
Key: HIVE-14369
URL: https://issues.apache.org/jira/browse/HIVE-14369
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang

In our environment we have two HA Hadoop clusters, named hivecluster and sparkcluster. hivecluster is the HA Hadoop cluster for Hive, with large disks; sparkcluster is the HA Hadoop cluster for Spark, with large memory. E.g. below is the hdfs-site.xml of hivecluster:
{code}
<property>
  <name>dfs.ha.namenodes.hivecluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hivecluster.nn1</name>
  <value>10.17.21.32:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hivecluster.nn2</name>
  <value>10.17.21.77:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.hivecluster.nn1</name>
  <value>10.17.21.32:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.hivecluster.nn2</name>
  <value>10.17.21.77:50070</value>
</property>
{code}
First, I created a Hive table located at hdfs://hivecluster/hive/warehouse/xxx. With Hive on MR it runs successfully, but when I use Hive on Spark to submit a task to the YARN cluster of sparkcluster, it says:
{code}
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
{code}
The YARN log shows:
{code}
Diagnostics: java.lang.IllegalArgumentException: java.net.UnknownHostException: hivecluster
Failing this attempt. Failing the application.
{code}
I didn't set the hivecluster nameservice in the hdfs-site.xml of sparkcluster.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
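[Editorial note] A hedged sketch of the missing piece, using the standard HDFS HA client settings: the sparkcluster side (including the configuration shipped to the Spark containers on YARN) would also need to know about the remote hivecluster nameservice so that hdfs://hivecluster/... paths can be resolved. The addresses are copied from the report; whether this alone fixes the job is an assumption:
{code}
<property>
  <name>dfs.nameservices</name>
  <value>sparkcluster,hivecluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.hivecluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hivecluster.nn1</name>
  <value>10.17.21.32:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hivecluster.nn2</name>
  <value>10.17.21.77:9000</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.hivecluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
{code}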
[jira] [Commented] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240863#comment-14240863 ]

Xiaomeng Huang commented on HIVE-7934:
--

Hi [~chirag.aggarwal], I have updated my patch on HIVE-8049 to use the crypto codec and KMS in hadoop-common.

Improve column level encryption with key management
---
Key: HIVE-7934
URL: https://issues.apache.org/jira/browse/HIVE-7934
Project: Hive
Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems:
- Base64WriteOnly only returns the ciphertext to the client, for any user.
- Base64Rewriter only returns the plaintext to the client, for any user.
I have an improvement based on HIVE-6329 using key management via KMS. This patch implements transparent column level encryption. Users don't need to set anything when they query tables.
# Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}
# Set hive-site.xml:
{code}
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
{code}
# Create an encrypted table:
{code}
drop table student_column_encrypt;
create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'column.encode.columns'='s_country,s_age',
  'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
STORED AS TEXTFILE
TBLPROPERTIES ('hive.encrypt.keynames'='hive.k1');
insert overwrite table student_column_encrypt select s_key, s_name, s_country, s_age from student;
select * from student_column_encrypt;
{code}
# Query the table as different users; decryption is transparent and users don't need to set anything:
{code}
[root@huang1 hive_data]# hive
hive> select * from student_column_encrypt;
OK
0   Armon   China     20
1   Jack    USA       21
2   Lucy    England   22
3   Lily    France    23
4   Yom     Spain     24
Time taken: 0.759 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from student_column_encrypt;
OK
0   Armon   dqyb188=       NULL
1   Jack    YJez           NULL
2   Lucy    cKqV1c8MTw==   NULL
3   Lily    c7aT180H       NULL
4   Yom     ZrST0MA=       NULL
Time taken: 0.77 seconds, Fetched: 5 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
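[Editorial note] Following the kms-acls.xml semantics described above, letting user2 see plaintext as well would just mean adding user2 to the GET ACL. A sketch under that assumption (the Hadoop KMS documents that kms-acls.xml is reloaded when modified, so no restart should be needed):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 user2 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}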
[jira] [Resolved] (HIVE-8969) getChildPrivileges should be in one transaction with revoke
[ https://issues.apache.org/jira/browse/HIVE-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang resolved HIVE-8969. -- Resolution: Invalid Sorry, I meant to create this jira in the Sentry project but created it in Hive by mistake; closing it as Invalid. getChildPrivileges should be in one transaction with revoke Key: HIVE-8969 URL: https://issues.apache.org/jira/browse/HIVE-8969 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8049) Transparent column level encryption using kms
[ https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-8049:
-
Status: Patch Available (was: In Progress)

Transparent column level encryption using kms
-
Key: HIVE-8049
URL: https://issues.apache.org/jira/browse/HIVE-8049
Project: Hive
Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Attachments: HIVE-8049.001.patch, HIVE-8049.002.patch

This patch implements transparent column level encryption. Users don't need to set anything when they query tables.
# Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}
# Set hive-site.xml:
{code}
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
{code}
# Create an encrypted table:
{code}
drop table student_column_encrypt;
create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'column.encode.columns'='s_country,s_age',
  'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
STORED AS TEXTFILE
TBLPROPERTIES ('hive.encrypt.keynames'='hive.k1');
insert overwrite table student_column_encrypt select s_key, s_name, s_country, s_age from student;
select * from student_column_encrypt;
{code}
# Query the table as different users; decryption is transparent and users don't need to set anything:
{code}
[root@huang1 hive_data]# hive
hive> select * from student_column_encrypt;
OK
0   Armon   China     20
1   Jack    USA       21
2   Lucy    England   22
3   Lily    France    23
4   Yom     Spain     24
Time taken: 0.759 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from student_column_encrypt;
OK
0   Armon   dqyb188=       NULL
1   Jack    YJez           NULL
2   Lucy    cKqV1c8MTw==   NULL
3   Lily    c7aT180H       NULL
4   Yom     ZrST0MA=       NULL
Time taken: 0.77 seconds, Fetched: 5 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8049) Transparent column level encryption using kms
[ https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-8049:
-
Description:
This patch implements transparent column level encryption. Users don't need to set anything when they query tables.
# Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}
# Set hive-site.xml:
{code}
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
{code}
# Create an encrypted table:
{code}
drop table student_column_encrypt;
create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'column.encode.columns'='s_country,s_age',
  'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
STORED AS TEXTFILE
TBLPROPERTIES ('hive.encrypt.keynames'='hive.k1');
insert overwrite table student_column_encrypt select s_key, s_name, s_country, s_age from student;
select * from student_column_encrypt;
{code}
# Query the table as different users; decryption is transparent and users don't need to set anything:
{code}
[root@huang1 hive_data]# hive
hive> select * from student_column_encrypt;
OK
0   Armon   China     20
1   Jack    USA       21
2   Lucy    England   22
3   Lily    France    23
4   Yom     Spain     24
Time taken: 0.759 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from student_column_encrypt;
OK
0   Armon   dqyb188=       NULL
1   Jack    YJez           NULL
2   Lucy    cKqV1c8MTw==   NULL
3   Lily    c7aT180H       NULL
4   Yom     ZrST0MA=       NULL
Time taken: 0.77 seconds, Fetched: 5 row(s)
{code}

was:
This patch implements transparent column level encryption. Users don't need to set anything when they query tables.
# Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}
# Set hive-site.xml:
{code}
<property>
  <name>hadoop.security.kms.uri</name>
  <value>http://localhost:16000/kms</value>
</property>
{code}
# Create an encrypted table:
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'column.encode.columns'='r_name',
  'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
STORED AS TEXTFILE
TBLPROPERTIES ('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column select r_regionkey, r_name from region;
{code}
# Query the table as different users; decryption is transparent and users don't need to set anything:
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}

Transparent column level encryption using kms
-
Key: HIVE-8049
URL: https://issues.apache.org/jira/browse/HIVE-8049
Project: Hive
Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Attachments: HIVE-8049.001.patch

This patch implements transparent column level encryption. Users don't need to set anything when they query tables.
# Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}
# Set hive-site.xml:
{code}
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
{code}
# Create an encrypted table:
{code}
drop table student_column_encrypt;
create table
[jira] [Updated] (HIVE-8049) Transparent column level encryption using kms
[ https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-8049:
-
Attachment: HIVE-8049.002.patch

Refactored the patch to use the crypto codec in hadoop-common.

Transparent column level encryption using kms
-
Key: HIVE-8049
URL: https://issues.apache.org/jira/browse/HIVE-8049
Project: Hive
Issue Type: Sub-task
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Attachments: HIVE-8049.001.patch, HIVE-8049.002.patch

This patch implements transparent column level encryption. Users don't need to set anything when they query tables.
# Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}
# Set hive-site.xml:
{code}
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
{code}
# Create an encrypted table:
{code}
drop table student_column_encrypt;
create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'column.encode.columns'='s_country,s_age',
  'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
STORED AS TEXTFILE
TBLPROPERTIES ('hive.encrypt.keynames'='hive.k1');
insert overwrite table student_column_encrypt select s_key, s_name, s_country, s_age from student;
select * from student_column_encrypt;
{code}
# Query the table as different users; decryption is transparent and users don't need to set anything:
{code}
[root@huang1 hive_data]# hive
hive> select * from student_column_encrypt;
OK
0   Armon   China     20
1   Jack    USA       21
2   Lucy    England   22
3   Lily    France    23
4   Yom     Spain     24
Time taken: 0.759 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from student_column_encrypt;
OK
0   Armon   dqyb188=       NULL
1   Jack    YJez           NULL
2   Lucy    cKqV1c8MTw==   NULL
3   Lily    c7aT180H       NULL
4   Yom     ZrST0MA=       NULL
Time taken: 0.77 seconds, Fetched: 5 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8416) Generic key management framework
[ https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8416: - Attachment: HIVE-8416.001.patch Generic key management framework Key: HIVE-8416 URL: https://issues.apache.org/jira/browse/HIVE-8416 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8416.001.patch This patch includes the KeyProvider interfaces and a default implementation using the Java KeyStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
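[Editorial note] The actual code is in the attached patch; as a rough illustration of what a Java KeyStore backed provider does, the sketch below (all names are hypothetical, not from the patch) round-trips a symmetric key through a JCEKS store:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class KeyStoreProviderSketch {
    // Store an AES key under an alias and read it back, as a
    // KeyStore-backed key provider would. Returns the recovered algorithm.
    public static String roundTrip(String alias) throws Exception {
        char[] pw = "storepass".toCharArray();
        KeyStore ks = KeyStore.getInstance("JCEKS"); // JKS cannot hold secret keys
        ks.load(null, pw);                           // start with an empty store
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        ks.setKeyEntry(alias, key, pw, null);        // no certificate chain for a secret key
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ks.store(out, pw);                           // serialize, as a file-backed store would
        KeyStore ks2 = KeyStore.getInstance("JCEKS");
        ks2.load(new ByteArrayInputStream(out.toByteArray()), pw);
        return ks2.getKey(alias, pw).getAlgorithm();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hive.k1"));
    }
}
```
The JCEKS store type matters here: the default JKS type only stores private keys and certificates, so a symmetric table key needs JCEKS (or, in later JDKs, PKCS12).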
[jira] [Updated] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7934:
-
Description:
HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems:
- Base64WriteOnly only returns the ciphertext to the client, for any user.
- Base64Rewriter only returns the plaintext to the client, for any user.
I have an improvement based on HIVE-6329 using key management via KMS. This patch implements transparent column level encryption. Users don't need to set anything when they query tables.
# Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}
# Set hive-site.xml:
{code}
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
{code}
# Create an encrypted table:
{code}
drop table student_column_encrypt;
create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'column.encode.columns'='s_country,s_age',
  'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter')
STORED AS TEXTFILE
TBLPROPERTIES ('hive.encrypt.keynames'='hive.k1');
insert overwrite table student_column_encrypt select s_key, s_name, s_country, s_age from student;
select * from student_column_encrypt;
{code}
# Query the table as different users; decryption is transparent and users don't need to set anything:
{code}
[root@huang1 hive_data]# hive
hive> select * from student_column_encrypt;
OK
0   Armon   China     20
1   Jack    USA       21
2   Lucy    England   22
3   Lily    France    23
4   Yom     Spain     24
Time taken: 0.759 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from student_column_encrypt;
OK
0   Armon   dqyb188=       NULL
1   Jack    YJez           NULL
2   Lucy    cKqV1c8MTw==   NULL
3   Lily    c7aT180H       NULL
4   Yom     ZrST0MA=       NULL
Time taken: 0.77 seconds, Fetched: 5 row(s)
{code}

was:
HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems:
- Base64WriteOnly only returns the ciphertext to the client, for any user.
- Base64Rewriter only returns the plaintext to the client, for any user.
I have an improvement based on HIVE-6329 using key management via KMS.
# Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}
# Set hive-site.xml:
{code}
<property>
  <name>hadoop.security.kms.uri</name>
  <value>http://localhost:16000/kms</value>
</property>
{code}
# Create an encrypted table:
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'column.encode.columns'='r_name',
  'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
STORED AS TEXTFILE
TBLPROPERTIES ('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column select r_regionkey, r_name from region;
{code}
# Query the table as different users; decryption is transparent and users don't need to set anything:
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}

Improve column level encryption with key management
---
Key: HIVE-7934
URL: https://issues.apache.org/jira/browse/HIVE-7934
Project: Hive
Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor
[jira] [Updated] (HIVE-8252) Generic cryptographic codec
[ https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8252: - Resolution: Duplicate Status: Resolved (was: Patch Available) This jira largely duplicates the crypto codec in hadoop-common; since hadoop 2.6.0 has been released, it is marked as a duplicate. Generic cryptographic codec --- Key: HIVE-8252 URL: https://issues.apache.org/jira/browse/HIVE-8252 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8252.001.patch, HIVE-8252.002.patch This patch includes interfaces and abstract classes for a generic Key, CryptoCodec, Encryptor, and Decryptor, plus a JCE AES implementation of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
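[Editorial note] For readers unfamiliar with JCE, a minimal sketch of the kind of AES encrypt/decrypt pair the Encryptor/Decryptor interfaces describe; the class and method names are illustrative, not from the patch, and the fixed IV is only acceptable in a sketch:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class JceAesSketch {
    // One method serves as both encryptor and decryptor, selected by mode,
    // mirroring how a JCE-backed codec wraps javax.crypto.Cipher.
    public static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(mode, key, new IvParameterSpec(iv));
        return c.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[16]; // zero IV for the sketch only; use a random IV in practice
        byte[] plain = "AFRICA".getBytes(StandardCharsets.UTF_8);
        byte[] cipherText = crypt(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] back = crypt(Cipher.DECRYPT_MODE, key, iv, cipherText);
        System.out.println(Arrays.equals(plain, back));
    }
}
```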
[jira] [Resolved] (HIVE-8416) Generic key management framework
[ https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang resolved HIVE-8416. -- Resolution: Duplicate As the KMS feature was released in hadoop 2.6.0, this jira is marked as a duplicate. Generic key management framework Key: HIVE-8416 URL: https://issues.apache.org/jira/browse/HIVE-8416 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8416.001.patch This patch includes the KeyProvider interfaces and a default implementation using the Java KeyStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237779#comment-14237779 ]

Xiaomeng Huang commented on HIVE-7934:
--

Hi [~chirag.aggarwal], when I worked on this feature KMS was not yet released in hadoop 2.5.x, so I decided to write a generic crypto codec and key management in Hive. But after talking with committers familiar with security in hadoop-common, it turned out to largely duplicate the crypto codec there. Looking at the hadoop release notes, hadoop 2.6.0 includes the KMS feature. HIVE-8049 is the initial patch to implement Hive column level encryption based on KMS in hadoop-common, and HIVE-8252 and HIVE-8416 will be closed as duplicates. I will update the HIVE-8049 patch in the coming days. Thanks for watching!

Improve column level encryption with key management
---
Key: HIVE-7934
URL: https://issues.apache.org/jira/browse/HIVE-7934
Project: Hive
Issue Type: Improvement
Reporter: Xiaomeng Huang
Assignee: Xiaomeng Huang
Priority: Minor

HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems:
- Base64WriteOnly only returns the ciphertext to the client, for any user.
- Base64Rewriter only returns the plaintext to the client, for any user.
I have an improvement based on HIVE-6329 using key management via KMS.
# Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
{code}
<property>
  <name>hadoop.kms.acl.GET</name>
  <value>user1 root</value>
  <description>
    ACL for get-key-version and get-current-key operations.
  </description>
</property>
{code}
# Set hive-site.xml:
{code}
<property>
  <name>hadoop.security.kms.uri</name>
  <value>http://localhost:16000/kms</value>
</property>
{code}
# Create an encrypted table:
{code}
-- region-aes-column.q
drop table region_aes_column;
create table region_aes_column (r_regionkey int, r_name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'column.encode.columns'='r_name',
  'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter')
STORED AS TEXTFILE
TBLPROPERTIES ('hive.encrypt.keynames'='hive.k1');
insert overwrite table region_aes_column select r_regionkey, r_name from region;
{code}
# Query the table as different users; decryption is transparent and users don't need to set anything:
{code}
[root@huang1 hive_data]# hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.9 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user1
[user1@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   AFRICA
1   AMERICA
2   ASIA
3   EUROPE
4   MIDDLE EAST
Time taken: 0.899 seconds, Fetched: 5 row(s)
[root@huang1 hive_data]# su user2
[user2@huang1 hive_data]$ hive
hive> select * from region_aes_column;
OK
0   RcQycWVD
1   Rc8lam9Bxg==
2   RdEpeQ==
3   Qdcyd3ZH
4   ScskfGpHp8KIIuY=
Time taken: 0.749 seconds, Fetched: 5 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work stopped] (HIVE-8416) Generic key management framework
[ https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8416 stopped by Xiaomeng Huang. Generic key management framework Key: HIVE-8416 URL: https://issues.apache.org/jira/browse/HIVE-8416 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang This patch includes the KeyProvider interfaces and a default implementation using the Java KeyStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8969) getChildPrivileges should be in one transaction with revoke
Xiaomeng Huang created HIVE-8969: Summary: getChildPrivileges should be in one transaction with revoke Key: HIVE-8969 URL: https://issues.apache.org/jira/browse/HIVE-8969 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6667) Need support for show tables authorization
[ https://issues.apache.org/jira/browse/HIVE-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176727#comment-14176727 ] Xiaomeng Huang commented on HIVE-6667: -- I don't see HiveDriverFilterHook in Apache Hive; did I miss anything? Need support for show tables authorization Key: HIVE-6667 URL: https://issues.apache.org/jira/browse/HIVE-6667 Project: Hive Issue Type: Improvement Components: Authorization Reporter: Alex Nastetsky Attachments: HIVE-6667.patch Need the ability to restrict access to show tables on a per-database basis or globally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-8416) Generic key management framework
[ https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8416 started by Xiaomeng Huang. Generic key management framework Key: HIVE-8416 URL: https://issues.apache.org/jira/browse/HIVE-8416 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang This patch includes the KeyProvider interfaces and a default implementation using the Java KeyStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8252) Generic cryptographic codec
[ https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8252: - Summary: Generic cryptographic codec (was: Generic cryptographic codec and key management framework) Generic cryptographic codec --- Key: HIVE-8252 URL: https://issues.apache.org/jira/browse/HIVE-8252 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8252) Generic cryptographic codec
[ https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8252: - Description: This patch includes interfaces and abstract classes for a generic Key, CryptoCodec, Encryptor, and Decryptor, plus a JCE AES implementation of them. Generic cryptographic codec --- Key: HIVE-8252 URL: https://issues.apache.org/jira/browse/HIVE-8252 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang This patch includes interfaces and abstract classes for a generic Key, CryptoCodec, Encryptor, and Decryptor, plus a JCE AES implementation of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8416) Generic key management framework
Xiaomeng Huang created HIVE-8416: Summary: Generic key management framework Key: HIVE-8416 URL: https://issues.apache.org/jira/browse/HIVE-8416 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8252) Generic cryptographic codec
[ https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8252: - Attachment: HIVE-8252.001.patch Generic cryptographic codec --- Key: HIVE-8252 URL: https://issues.apache.org/jira/browse/HIVE-8252 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8252.001.patch This patch includes interfaces and abstract classes for a generic Key, CryptoCodec, Encryptor, and Decryptor, plus a JCE AES implementation of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8252) Generic cryptographic codec
[ https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8252: - Status: Patch Available (was: Open) Generic cryptographic codec --- Key: HIVE-8252 URL: https://issues.apache.org/jira/browse/HIVE-8252 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8252.001.patch This patch will include interfaces or abstract classes for a generic Key, CryptoCodec, Encryptor and Decryptor, plus the JCE AES implementation of those interfaces and abstract classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8416) Generic key management framework
[ https://issues.apache.org/jira/browse/HIVE-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8416: - Description: This patch will include the KeyProvider interfaces and a default implementation backed by the Java KeyStore. Generic key management framework Key: HIVE-8416 URL: https://issues.apache.org/jira/browse/HIVE-8416 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang This patch will include the KeyProvider interfaces and a default implementation backed by the Java KeyStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
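The default implementation described above is backed by the Java KeyStore. A minimal sketch of what a KeyStore-backed provider might look like; the class name and methods are assumptions, not the patch's actual API (JCEKS is used here because, unlike JKS, it can hold SecretKey entries):

```java
import java.security.KeyStore;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical KeyProvider backed by an in-memory JCEKS keystore.
public class KeyStoreProviderSketch {
    private final KeyStore ks;
    private final char[] pw;

    KeyStoreProviderSketch(char[] password) throws Exception {
        ks = KeyStore.getInstance("JCEKS");  // JCEKS supports SecretKeyEntry
        ks.load(null, password);             // null stream = empty keystore
        pw = password;
    }

    // Store a raw AES key under a name such as "hive.k1".
    void setKey(String name, byte[] raw) throws Exception {
        ks.setEntry(name,
                new KeyStore.SecretKeyEntry(new SecretKeySpec(raw, "AES")),
                new KeyStore.PasswordProtection(pw));
    }

    // Return the raw key bytes, or null if the name is unknown.
    byte[] getKey(String name) throws Exception {
        SecretKey k = (SecretKey) ks.getKey(name, pw);
        return k == null ? null : k.getEncoded();
    }

    public static void main(String[] args) throws Exception {
        KeyStoreProviderSketch p = new KeyStoreProviderSketch("secret".toCharArray());
        p.setKey("hive.k1", "0123456789abcdef".getBytes());
        System.out.println(p.getKey("hive.k1").length); // 16
    }
}
```

A production provider would load the keystore from a file or delegate to Hadoop KMS rather than hold keys in memory.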
[jira] [Updated] (HIVE-8252) Generic cryptographic codec
[ https://issues.apache.org/jira/browse/HIVE-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8252: - Attachment: HIVE-8252.002.patch Generic cryptographic codec --- Key: HIVE-8252 URL: https://issues.apache.org/jira/browse/HIVE-8252 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8252.001.patch, HIVE-8252.002.patch This patch will include interfaces or abstract classes for a generic Key, CryptoCodec, Encryptor and Decryptor, plus the JCE AES implementation of those interfaces and abstract classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8252) Generic cryptographic codec and key management framework
Xiaomeng Huang created HIVE-8252: Summary: Generic cryptographic codec and key management framework Key: HIVE-8252 URL: https://issues.apache.org/jira/browse/HIVE-8252 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8049) Transparent column level encryption using kms
[ https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8049: - Summary: Transparent column level encryption using kms (was: Transparent column level encryption using key management) Transparent column level encryption using kms - Key: HIVE-8049 URL: https://issues.apache.org/jira/browse/HIVE-8049 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8049.001.patch This patch implements transparent column level encryption. Users don't need to set anything when they query tables. # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get the key) {code} <property> <name>hadoop.kms.acl.GET</name> <value>user1 root</value> <description> ACL for get-key-version and get-current-key operations. </description> </property> {code} # set hive-site.xml {code} <property> <name>hadoop.security.kms.uri</name> <value>http://localhost:16000/kms</value> </property> {code} # create an encrypted table {code} -- region-aes-column.q drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1'); insert overwrite table region_aes_column select r_regionkey, r_name from region; {code} # query the table as different users; this is transparent to users, very convenient, and requires no per-user setup.
{code} [root@huang1 hive_data]# hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.9 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user1 [user1@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.899 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user2 [user2@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 RcQycWVD 1 Rc8lam9Bxg== 2 RdEpeQ== 3 Qdcyd3ZH 4 ScskfGpHp8KIIuY= Time taken: 0.749 seconds, Fetched: 5 row(s) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
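In the session above, root and user1 (who have KMS GET permission) see plaintext while user2 sees Base64-looking strings. A minimal sketch of why: without key access the stored column bytes stay encrypted and are only Base64-encoded for display. The key, IV, cipher mode, and access-check flag here are illustrative assumptions, not Hive's actual mechanism:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.Base64;

// Hypothetical two-path rendering of an encrypted column value.
public class ColumnViewSketch {
    static final byte[] KEY = "0123456789abcdef".getBytes(); // illustrative key
    static final byte[] IV  = "fedcba9876543210".getBytes(); // illustrative IV

    static byte[] aes(int mode, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(mode, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
        return c.doFinal(data);
    }

    // With key access: decrypt. Without: Base64-encode the raw ciphertext,
    // which is what produces strings like "RcQycWVD" for user2.
    static String render(byte[] storedCiphertext, boolean hasKeyAccess) throws Exception {
        return hasKeyAccess
                ? new String(aes(Cipher.DECRYPT_MODE, storedCiphertext))
                : Base64.getEncoder().encodeToString(storedCiphertext);
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = aes(Cipher.ENCRYPT_MODE, "AFRICA".getBytes());
        System.out.println(render(stored, true));   // AFRICA
        System.out.println(render(stored, false));  // Base64 of the ciphertext
    }
}
```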
[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7932: - Attachment: HIVE-7932.002.patch Add a testcase for accessed columns from ReadEntity. It may cause NP exception when add accessed columns to ReadEntity - Key: HIVE-7932 URL: https://issues.apache.org/jira/browse/HIVE-7932 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-7932.001.patch, HIVE-7932.002.patch {code} case TABLE: entity.getAccessedColumns().addAll( tableToColumnAccessMap.get(entity.getTable().getCompleteName())); {code} If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns null, addAll(null) throws a NullPointerException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
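The quoted snippet throws because Map.get returns null for an absent table and Collection.addAll(null) raises a NullPointerException. A minimal standalone mirror of the guard the fix needs; the method and class names are illustrative, not the patch's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the ReadEntity accessed-columns update path.
public class SafeAddAll {
    // Guard the map lookup before addAll; passing null straight to
    // addAll is exactly the NPE described in HIVE-7932.
    public static List<String> addAccessedColumns(List<String> accessed,
                                                  List<String> fromMap) {
        if (fromMap != null) {   // the missing null check
            accessed.addAll(fromMap);
        }
        return accessed;
    }

    public static void main(String[] args) {
        List<String> cols = new ArrayList<>();
        addAccessedColumns(cols, null);               // no NullPointerException
        addAccessedColumns(cols, List.of("r_name"));
        System.out.println(cols);                     // [r_name]
    }
}
```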
[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7932: - Attachment: HIVE-7932.002.patch It may cause NP exception when add accessed columns to ReadEntity - Key: HIVE-7932 URL: https://issues.apache.org/jira/browse/HIVE-7932 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-7932.001.patch, HIVE-7932.002.patch {code} case TABLE: entity.getAccessedColumns().addAll( tableToColumnAccessMap.get(entity.getTable().getCompleteName())); {code} If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns null, addAll(null) throws a NullPointerException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7932: - Attachment: (was: HIVE-7932.002.patch) It may cause NP exception when add accessed columns to ReadEntity - Key: HIVE-7932 URL: https://issues.apache.org/jira/browse/HIVE-7932 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-7932.001.patch, HIVE-7932.002.patch {code} case TABLE: entity.getAccessedColumns().addAll( tableToColumnAccessMap.get(entity.getTable().getCompleteName())); {code} If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns null, addAll(null) throws a NullPointerException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134817#comment-14134817 ] Xiaomeng Huang commented on HIVE-7932: -- This test failure is not caused by my patch; it passes on my local machine. It may cause NP exception when add accessed columns to ReadEntity - Key: HIVE-7932 URL: https://issues.apache.org/jira/browse/HIVE-7932 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-7932.001.patch, HIVE-7932.002.patch {code} case TABLE: entity.getAccessedColumns().addAll( tableToColumnAccessMap.get(entity.getTable().getCompleteName())); {code} If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) returns null, addAll(null) throws a NullPointerException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7934: - Description: HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: - Base64WriteOnly returns only the ciphertext to any user on the client. - Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-6329 using key management via kms. # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get the key) {code} <property> <name>hadoop.kms.acl.GET</name> <value>user1 root</value> <description> ACL for get-key-version and get-current-key operations. </description> </property> {code} # create an encrypted table {code} -- region-aes-column.q drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1'); insert overwrite table region_aes_column select r_regionkey, r_name from region; {code} # query the table as different users; this is transparent to users, very convenient, and requires no per-user setup.
{code} [root@huang1 hive_data]# hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.9 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user1 [user1@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.899 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user2 [user2@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 RcQycWVD 1 Rc8lam9Bxg== 2 RdEpeQ== 3 Qdcyd3ZH 4 ScskfGpHp8KIIuY= Time taken: 0.749 seconds, Fetched: 5 row(s) {code} was: HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: Base64WriteOnly returns only the ciphertext to any user on the client, and Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-6329 using key management via kms. # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get the key) {code} <property> <name>hadoop.kms.acl.GET</name> <value>user1 root</value> <description> ACL for get-key-version and get-current-key operations. </description> </property> {code} # create an encrypted table {code} -- region-aes-column.q drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1'); insert overwrite table region_aes_column select r_regionkey, r_name from region; {code} # query the table as different users; this is transparent to users, very convenient, and requires no per-user setup.
{code} [root@huang1 hive_data]# hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.9 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user1 [user1@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.899 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user2 [user2@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 RcQycWVD 1 Rc8lam9Bxg== 2 RdEpeQ== 3 Qdcyd3ZH 4 ScskfGpHp8KIIuY= Time taken: 0.749 seconds, Fetched: 5 row(s) {code} Improve column level encryption with key management --- Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Minor HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: - Base64WriteOnly returns only the ciphertext to any user on the client. - Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-6329 using key management via kms. # setup kms and set kms-acls.xml (e.g. user1 and root
[jira] [Updated] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7934: - Description: HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: - Base64WriteOnly returns only the ciphertext to any user on the client. - Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-6329 using key management via kms. # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get the key) {code} <property> <name>hadoop.kms.acl.GET</name> <value>user1 root</value> <description> ACL for get-key-version and get-current-key operations. </description> </property> {code} # set hive-site.xml {code} <property> <name>hadoop.security.kms.uri</name> <value>http://localhost:16000/kms</value> </property> {code} # create an encrypted table {code} -- region-aes-column.q drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1'); insert overwrite table region_aes_column select r_regionkey, r_name from region; {code} # query the table as different users; this is transparent to users, very convenient, and requires no per-user setup.
{code} [root@huang1 hive_data]# hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.9 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user1 [user1@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.899 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user2 [user2@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 RcQycWVD 1 Rc8lam9Bxg== 2 RdEpeQ== 3 Qdcyd3ZH 4 ScskfGpHp8KIIuY= Time taken: 0.749 seconds, Fetched: 5 row(s) {code} was: HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: - Base64WriteOnly returns only the ciphertext to any user on the client. - Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-6329 using key management via kms. # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get the key) {code} <property> <name>hadoop.kms.acl.GET</name> <value>user1 root</value> <description> ACL for get-key-version and get-current-key operations. </description> </property> {code} # create an encrypted table {code} -- region-aes-column.q drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1'); insert overwrite table region_aes_column select r_regionkey, r_name from region; {code} # query the table as different users; this is transparent to users, very convenient, and requires no per-user setup.
{code} [root@huang1 hive_data]# hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.9 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user1 [user1@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.899 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user2 [user2@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 RcQycWVD 1 Rc8lam9Bxg== 2 RdEpeQ== 3 Qdcyd3ZH 4 ScskfGpHp8KIIuY= Time taken: 0.749 seconds, Fetched: 5 row(s) {code} Improve column level encryption with key management --- Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Minor HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: - Base64WriteOnly returns only the ciphertext to any user on the client. - Base64Rewriter just be able to get
[jira] [Commented] (HIVE-8049) Transparent column level encryption using key management
[ https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129787#comment-14129787 ] Xiaomeng Huang commented on HIVE-8049: -- Initial patch based on kms simple mode Transparent column level encryption using key management Key: HIVE-8049 URL: https://issues.apache.org/jira/browse/HIVE-8049 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8049.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8049) Transparent column level encryption using key management
[ https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8049: - Description: This patch implements transparent column level encryption. Users don't need to set anything when they query tables. # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get the key) {code} <property> <name>hadoop.kms.acl.GET</name> <value>user1 root</value> <description> ACL for get-key-version and get-current-key operations. </description> </property> {code} # set hive-site.xml {code} <property> <name>hadoop.security.kms.uri</name> <value>http://localhost:16000/kms</value> </property> {code} # create an encrypted table {code} -- region-aes-column.q drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1'); insert overwrite table region_aes_column select r_regionkey, r_name from region; {code} # query the table as different users; this is transparent to users, very convenient, and requires no per-user setup.
{code} [root@huang1 hive_data]# hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.9 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user1 [user1@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.899 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user2 [user2@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 RcQycWVD 1 Rc8lam9Bxg== 2 RdEpeQ== 3 Qdcyd3ZH 4 ScskfGpHp8KIIuY= Time taken: 0.749 seconds, Fetched: 5 row(s) {code} Transparent column level encryption using key management Key: HIVE-8049 URL: https://issues.apache.org/jira/browse/HIVE-8049 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8049.001.patch This patch implements transparent column level encryption. Users don't need to set anything when they query tables. # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get the key) {code} <property> <name>hadoop.kms.acl.GET</name> <value>user1 root</value> <description> ACL for get-key-version and get-current-key operations. </description> </property> {code} # set hive-site.xml {code} <property> <name>hadoop.security.kms.uri</name> <value>http://localhost:16000/kms</value> </property> {code} # create an encrypted table {code} -- region-aes-column.q drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1'); insert overwrite table region_aes_column select r_regionkey, r_name from region; {code} # query the table as different users; this is transparent to users, very convenient, and requires no per-user setup.
{code} [root@huang1 hive_data]# hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.9 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user1 [user1@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.899 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user2 [user2@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 RcQycWVD 1 Rc8lam9Bxg== 2 RdEpeQ== 3 Qdcyd3ZH 4 ScskfGpHp8KIIuY= Time taken: 0.749 seconds, Fetched: 5 row(s) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8049) Transparent column level encryption using key management
Xiaomeng Huang created HIVE-8049: Summary: Transparent column level encryption using key management Key: HIVE-8049 URL: https://issues.apache.org/jira/browse/HIVE-8049 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8050) Using master key to protect data key
Xiaomeng Huang created HIVE-8050: Summary: Using master key to protect data key Key: HIVE-8050 URL: https://issues.apache.org/jira/browse/HIVE-8050 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8049) Transparent column level encryption using key management
[ https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-8049: - Attachment: HIVE-8049.001.patch Transparent column level encryption using key management Key: HIVE-8049 URL: https://issues.apache.org/jira/browse/HIVE-8049 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8049.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-8049) Transparent column level encryption using key management
[ https://issues.apache.org/jira/browse/HIVE-8049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8049 started by Xiaomeng Huang. Transparent column level encryption using key management Key: HIVE-8049 URL: https://issues.apache.org/jira/browse/HIVE-8049 Project: Hive Issue Type: Sub-task Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-8049.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7934: - Description: HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: Base64WriteOnly returns only the ciphertext to any user on the client, and Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-6329 using key management via kms. # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get the key) {code} <property> <name>hadoop.kms.acl.GET</name> <value>user1 root</value> <description> ACL for get-key-version and get-current-key operations. </description> </property> {code} # create an encrypted table {code} -- region-aes-column.q drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1'); insert overwrite table region_aes_column select r_regionkey, r_name from region; {code} # query the table as different users; this is transparent to users, very convenient, and requires no per-user setup.
{code} [root@huang1 hive_data]# hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.9 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user1 [user1@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.899 seconds, Fetched: 5 row(s) [root@huang1 hive_data]# su user2 [user2@huang1 hive_data]$ hive hive> select * from region_aes_column; OK 0 RcQycWVD 1 Rc8lam9Bxg== 2 RdEpeQ== 3 Qdcyd3ZH 4 ScskfGpHp8KIIuY= Time taken: 0.749 seconds, Fetched: 5 row(s) {code} was: HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: Base64WriteOnly returns only the ciphertext to any user on the client, and Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-7934 using key management. {code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive> select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive> set hive.encrypt.key=123456789; hive> set hive.encrypt.iv=123456; hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} Improve column level encryption with key management --- Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng 
Huang Assignee: Xiaomeng Huang Priority: Minor HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: Base64WriteOnly returns only the ciphertext to any user on the client, and Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-6329 using key management via kms. # setup kms and set kms-acls.xml (e.g. user1 and root have permission to get the key) {code} <property> <name>hadoop.kms.acl.GET</name> <value>user1 root</value> <description> ACL for get-key-version and get-current-key operations. </description> </property> {code} # create an encrypted table {code} -- region-aes-column.q drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1'); insert overwrite table region_aes_column
[jira] [Work started] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-7934 started by Xiaomeng Huang. Improve column level encryption with key management --- Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Minor HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: Base64WriteOnly returns only the ciphertext to any user on the client, and Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-7934 using key management. {code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive> select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive> set hive.encrypt.key=123456789; hive> set hive.encrypt.iv=123456; hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7934: - Description: HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: Base64WriteOnly returns only the ciphertext to any user on the client, and Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-7934 using key management. {code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive> select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive> set hive.encrypt.key=123456789; hive> set hive.encrypt.iv=123456; hive> select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} was: HIVE-6329 provides a framework for column level encryption/decryption, but its implementation just uses Base64, which is not safe and has some problems: Base64WriteOnly returns only the ciphertext to any user on the client, and Base64Rewriter returns the plaintext to any user on the client. I have an improvement based on HIVE-7934 using key management. 
{code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} Improve column level encryption with key management --- Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Minor Now HIVE-6329 is a framework of column level encryption/decryption. But the implementation in HIVE-6329 is just use Base64, it is not safe and have some problems: Base64WriteOnly can just get the ciphertext from client for any users. And Base64Rewriter can just get plaintext from client for any users. I have an improvement based on HIVE-7934 using key management. 
{code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang resolved HIVE-7730. -- Resolution: Fixed

Extend ReadEntity to add accessed columns from query
Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Fix For: 0.14.0 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext. (e.g. the needed colums from query).- -So we should get instance of HiveSemanticAnalyzerHookContext from configuration, extends HiveSemanticAnalyzerHookContext with a new implementation, overide the HiveSemanticAnalyzerHookContext.update() and put what you want to the class.-

Hive should store accessed columns in ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get the accessed columns from ReadEntity too.
Here is the quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: (was: HIVE-7730-fix-NP-issue.patch) Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Fix For: 0.14.0 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext. (e.g. the needed colums from query).- -So we should get instance of HiveSemanticAnalyzerHookContext from configuration, extends HiveSemanticAnalyzerHookContext with a new implementation, overide the HiveSemanticAnalyzerHookContext.update() and put what you want to the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS(or we can add a confVar) is true. Then external authorization model can get accessed columns when do authorization in compile before execute. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer, old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. 
Here is the quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity
Xiaomeng Huang created HIVE-7932: Summary: It may cause NP exception when add accessed columns to ReadEntity Key: HIVE-7932 URL: https://issues.apache.org/jira/browse/HIVE-7932 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7932: - Description:

{code}
case TABLE:
  entity.getAccessedColumns().addAll(
      tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
{code}

If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) is null, addAll(null) will throw a NullPointerException.

It may cause NP exception when add accessed columns to ReadEntity
-
Key: HIVE-7932 URL: https://issues.apache.org/jira/browse/HIVE-7932 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang

{code}
case TABLE:
  entity.getAccessedColumns().addAll(
      tableToColumnAccessMap.get(entity.getTable().getCompleteName()));
{code}

If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) is null, addAll(null) will throw a NullPointerException.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
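The fix that HIVE-7932 describes amounts to a null guard before the addAll() call. A minimal self-contained sketch (class and method names here are illustrative, not the actual patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demonstrates the NPE scenario: Map.get() returns null for a table with no
// entry, and Collection.addAll(null) throws NullPointerException. The guard
// substitutes an empty list so the caller can always addAll() safely.
public class NullGuardDemo {
    static List<String> safeColumns(Map<String, List<String>> tableToColumnAccessMap,
                                    String completeName) {
        List<String> cols = tableToColumnAccessMap.get(completeName);
        return (cols == null) ? Collections.emptyList() : cols;
    }

    public static void main(String[] args) {
        Map<String, List<String>> map = new HashMap<>();
        map.put("default@region", Arrays.asList("r_regionkey", "r_name"));

        List<String> accessed = new ArrayList<>();
        // Known table: its columns are added as before.
        accessed.addAll(safeColumns(map, "default@region"));
        // Unknown table: no map entry, but no NullPointerException either.
        accessed.addAll(safeColumns(map, "default@abc_test_old"));
        System.out.println(accessed);
    }
}
```

An equivalent alternative is keeping the original call sites and wrapping each addAll() in an `if (cols != null)` check, as the snippet in the description implies.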
[jira] [Updated] (HIVE-7932) It may cause NP exception when add accessed columns to ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7932: - Attachment: HIVE-7932.001.patch It may cause NP exception when add accessed columns to ReadEntity - Key: HIVE-7932 URL: https://issues.apache.org/jira/browse/HIVE-7932 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Attachments: HIVE-7932.001.patch {code} case TABLE: entity.getAccessedColumns().addAll( tableToColumnAccessMap.get(entity.getTable().getCompleteName())); {code} if tableToColumnAccessMap.get(entity.getTable().getCompleteName()) is null, addAll(null) will throw null pointer exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-7934) Improve column level encryption with key management
Xiaomeng Huang created HIVE-7934: Summary: Improve column level encryption with key management Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7934: - Description: Now Improve column level encryption with key management --- Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Minor Now -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7934: - Description: Now HIVE-6329 is a framework of column level encryption/decryption. But the implementation in HIVE-6329 is just use Base64, it is not safe and have some problems. Base64WriteOnly can just get the ciphertext from client for any users. And Base64Rewriter can just get plaintext from client for any users. I have an improvement based HIVE-7934 using key management. {code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 'column.encode.key'='123456789', 'column.encode.iv'='123456') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} was:Now Improve column level encryption with key management --- Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Minor Now HIVE-6329 is a framework of column level encryption/decryption. But the implementation in HIVE-6329 is just use Base64, it is not safe and have some problems. Base64WriteOnly can just get the ciphertext from client for any users. And Base64Rewriter can just get plaintext from client for any users. 
I have an improvement based HIVE-7934 using key management. {code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 'column.encode.key'='123456789', 'column.encode.iv'='123456') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7934: - Description: Now HIVE-6329 is a framework of column level encryption/decryption. But the implementation in HIVE-6329 is just use Base64, it is not safe and have some problems: Base64WriteOnly can just get the ciphertext from client for any users. And Base64Rewriter can just get plaintext from client for any users. I have an improvement based HIVE-7934 using key management. {code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 'column.encode.key'='123456789', 'column.encode.iv'='123456') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} was: Now HIVE-6329 is a framework of column level encryption/decryption. But the implementation in HIVE-6329 is just use Base64, it is not safe and have some problems. Base64WriteOnly can just get the ciphertext from client for any users. And Base64Rewriter can just get plaintext from client for any users. I have an improvement based HIVE-7934 using key management. 
{code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 'column.encode.key'='123456789', 'column.encode.iv'='123456') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} Improve column level encryption with key management --- Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Minor Now HIVE-6329 is a framework of column level encryption/decryption. But the implementation in HIVE-6329 is just use Base64, it is not safe and have some problems: Base64WriteOnly can just get the ciphertext from client for any users. And Base64Rewriter can just get plaintext from client for any users. I have an improvement based HIVE-7934 using key management. 
{code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 'column.encode.key'='123456789', 'column.encode.iv'='123456') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7934) Improve column level encryption with key management
[ https://issues.apache.org/jira/browse/HIVE-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7934: - Description: Now HIVE-6329 is a framework of column level encryption/decryption. But the implementation in HIVE-6329 is just use Base64, it is not safe and have some problems: Base64WriteOnly can just get the ciphertext from client for any users. And Base64Rewriter can just get plaintext from client for any users. I have an improvement based HIVE-7934 using key management. {code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} was: Now HIVE-6329 is a framework of column level encryption/decryption. But the implementation in HIVE-6329 is just use Base64, it is not safe and have some problems: Base64WriteOnly can just get the ciphertext from client for any users. And Base64Rewriter can just get plaintext from client for any users. I have an improvement based HIVE-7934 using key management. 
{code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter', 'column.encode.key'='123456789', 'column.encode.iv'='123456') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} Improve column level encryption with key management --- Key: HIVE-7934 URL: https://issues.apache.org/jira/browse/HIVE-7934 Project: Hive Issue Type: Improvement Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Priority: Minor Now HIVE-6329 is a framework of column level encryption/decryption. But the implementation in HIVE-6329 is just use Base64, it is not safe and have some problems: Base64WriteOnly can just get the ciphertext from client for any users. And Base64Rewriter can just get plaintext from client for any users. I have an improvement based HIVE-7934 using key management. 
{code} -- region-aes-column.q set hive.encrypt.key=123456789; set hive.encrypt.iv=123456; drop table region_aes_column; create table region_aes_column (r_regionkey int, r_name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='r_name', 'column.encode.classname'='org.apache.hadoop.hive.serde2.aes.AESRewriter') STORED AS TEXTFILE; insert overwrite table region_aes_column select r_regionkey, r_name from region; hive select * from region_aes_column; OK 0 /q5RTO1X 1 /qVGV+dV3g== 2 /rtKRA== 3 +r1RSv5T 4 8qFHQeJTvxWUadw= Time taken: 0.666 seconds, Fetched: 5 row(s) hive set hive.encrypt.key=123456789; hive set hive.encrypt.iv=123456; hive select * from region_aes_column; OK 0 AFRICA 1 AMERICA 2 ASIA 3 EUROPE 4 MIDDLE EAST Time taken: 0.714 seconds, Fetched: 5 row(s) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang reopened HIVE-7730: -- Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Fix For: 0.14.0 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext. (e.g. the needed colums from query).- -So we should get instance of HiveSemanticAnalyzerHookContext from configuration, extends HiveSemanticAnalyzerHookContext with a new implementation, overide the HiveSemanticAnalyzerHookContext.update() and put what you want to the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS(or we can add a confVar) is true. Then external authorization model can get accessed columns when do authorization in compile before execute. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer, old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. 
Here is the quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: HIVE-7730-fix-NP-issue.patch Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Fix For: 0.14.0 Attachments: HIVE-7730-fix-NP-issue.patch, HIVE-7730.001.patch, HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext. (e.g. the needed colums from query).- -So we should get instance of HiveSemanticAnalyzerHookContext from configuration, extends HiveSemanticAnalyzerHookContext with a new implementation, overide the HiveSemanticAnalyzerHookContext.update() and put what you want to the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS(or we can add a confVar) is true. Then external authorization model can get accessed columns when do authorization in compile before execute. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer, old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. 
Here is the quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117038#comment-14117038 ] Xiaomeng Huang commented on HIVE-7730: -- Hi [~szehon], there is a null pointer issue in the latest patch: entity.getAccessedColumns().addAll(tableToColumnAccessMap.get(entity.getTable().getCompleteName())); if tableToColumnAccessMap.get(entity.getTable().getCompleteName()) is null, addAll(null) will throw a NullPointerException. I attached a patch to fix it; could you help review it? Thanks!

Extend ReadEntity to add accessed columns from query
Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Fix For: 0.14.0 Attachments: HIVE-7730-fix-NP-issue.patch, HIVE-7730.001.patch, HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext. (e.g. the needed colums from query).- -So we should get instance of HiveSemanticAnalyzerHookContext from configuration, extends HiveSemanticAnalyzerHookContext with a new implementation, overide the HiveSemanticAnalyzerHookContext.update() and put what you want to the class.-

Hive should store accessed columns in ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get the accessed columns from ReadEntity too.
Here is the quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111999#comment-14111999 ] Xiaomeng Huang commented on HIVE-6329: -- Hi, Navis. I agree that your patch is a framework for column level encryption/decryption. I am curious: if you use Base64WriteOnly to encode your values, how do you get the plaintext back? And right now Base64Rewriter just returns the plaintext instead of the ciphertext to the client, right? I have an idea to improve this: use key management to do the encode/decode in the Rewriter, with the local path of the key set in the configuration instead of in SERDEPROPERTIES. User1 uses key1 to encode values when inserting data, and the values of those columns are stored encoded on HDFS. When User2 wants to scan the table, if he has key1 he can decode the values successfully and gets the plaintext; otherwise, with no key or a wrong key, decoding fails and he only gets the ciphertext. If this approach makes sense to you, I would like to create a jira to improve this via key management, based on this jira.

Support column level encryption/decryption
--
Key: HIVE-6329 URL: https://issues.apache.org/jira/browse/HIVE-6329 Project: Hive Issue Type: New Feature Components: Security, Serializers/Deserializers Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, HIVE-6329.11.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt

Receiving some requirements on encryption recently, but Hive does not support it. Before the full implementation via HIVE-5207, this might be useful for some cases.
{noformat}
hive> create table encode_test(id int, name STRING, phone STRING, address STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('column.encode.columns'='phone,address',
'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly')
STORED AS TEXTFILE;
OK
Time taken: 0.584 seconds
hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows);
..
OK
Time taken: 5.121 seconds
hive> select * from encode_test;
OK
100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw==
Time taken: 0.078 seconds, Fetched: 1 row(s)
hive>
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
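The session above also shows concretely why Base64 encoding is not encryption: the stored values for the phone and address columns are plain Base64 text, so anyone who can read the HDFS file recovers the plaintext with no key at all. A short sketch (the class name `Base64Weakness` is illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Decodes a value exactly as it is stored on HDFS by the Base64 rewriters;
// no key or secret of any kind is needed.
public class Base64Weakness {
    static String decode(String stored) {
        return new String(Base64.getDecoder().decode(stored), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The phone column value from the encode_test row above.
        System.out.println(decode("MDEwLTAwMDAtMDAwMA==")); // prints 010-0000-0000
        // The address column value.
        System.out.println(decode("U2VvdWwsIFNlb2Nobw==")); // prints Seoul, Seocho
    }
}
```

This is the motivation for the key-management improvement proposed in HIVE-7934: an obfuscation any reader can reverse provides no real confidentiality.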
[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113376#comment-14113376 ] Xiaomeng Huang commented on HIVE-7730: -- Thanks [~szehon]! Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Fix For: 0.14.0 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext. (e.g. the needed colums from query).- -So we should get instance of HiveSemanticAnalyzerHookContext from configuration, extends HiveSemanticAnalyzerHookContext with a new implementation, overide the HiveSemanticAnalyzerHookContext.update() and put what you want to the class.- Hive should store accessed columns to ReadEntity when we set HIVE_STATS_COLLECT_SCANCOLS(or we can add a confVar) is true. Then external authorization model can get accessed columns when do authorization in compile before execute. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer, old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. 
Here is the quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
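For readers outside the Hive codebase: the ColumnAccessInfo produced above is essentially a map from table name to the set of columns the query touched, which an authorization hook can then check. A minimal self-contained sketch of that shape (the class and method names here are illustrative, not Hive's actual API):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative stand-in for Hive's ColumnAccessInfo: records, per table,
// which columns a query accessed, so authorization can be done at compile time.
class ColumnAccess {
    private final Map<String, Set<String>> tableToColumns = new LinkedHashMap<>();

    // Record that `column` of `table` was referenced somewhere in the query.
    void add(String table, String column) {
        tableToColumns.computeIfAbsent(table, t -> new TreeSet<>()).add(column);
    }

    // Columns accessed for one table (empty set if the table was not touched).
    Set<String> columnsFor(String table) {
        return tableToColumns.getOrDefault(table, Collections.emptySet());
    }
}

public class ColumnAccessDemo {
    public static void main(String[] args) {
        // e.g. for "select did from abc_test_old where ts > 0"
        ColumnAccess info = new ColumnAccess();
        info.add("default@abc_test_old", "did");
        info.add("default@abc_test_old", "ts");
        System.out.println(info.columnsFor("default@abc_test_old")); // [did, ts]
    }
}
```

Extending ReadEntity as proposed would let hooks read this per-table column set directly from the query inputs instead of from BaseSemanticAnalyzer.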
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: HIVE-7730.004.patch
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: HIVE-7730.003.patch Fixed something from [~szehon]
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110217#comment-14110217 ] Xiaomeng Huang commented on HIVE-6329: -- Hi Navis, I am very interested in this feature! But there are some build failures with your latest patch; could you help rebase it? Support column level encryption/decryption -- Key: HIVE-6329 URL: https://issues.apache.org/jira/browse/HIVE-6329 Project: Hive Issue Type: New Feature Components: Security, Serializers/Deserializers Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt We have been receiving some requirements for encryption recently, but Hive does not support it yet. Before the full implementation via HIVE-5207, this might be useful for some cases.
{noformat}
hive> create table encode_test(id int, name STRING, phone STRING, address STRING)
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    > WITH SERDEPROPERTIES ('column.encode.columns'='phone,address',
    >   'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly')
    > STORED AS TEXTFILE;
OK
Time taken: 0.584 seconds
hive> insert into table encode_test select 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows);
..
OK
Time taken: 5.121 seconds
hive> select * from encode_test;
OK
100	navis	MDEwLTAwMDAtMDAwMA==	U2VvdWwsIFNlb2Nobw==
Time taken: 0.078 seconds, Fetched: 1 row(s)
hive>
{noformat}
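The encoded cell values in the transcript above are plain Base64 of the inserted column text, which can be checked with the JDK's own codec and no Hive classes at all:

```java
import java.util.Base64;

// Verify that the stored cell values from the encode_test transcript are
// simply the Base64 encoding of the inserted phone and address strings.
public class Base64ColumnCheck {
    public static void main(String[] args) {
        String phone = "010-0000-0000";
        String address = "Seoul, Seocho";

        String encPhone = Base64.getEncoder().encodeToString(phone.getBytes());
        String encAddress = Base64.getEncoder().encodeToString(address.getBytes());
        System.out.println(encPhone);    // MDEwLTAwMDAtMDAwMA==
        System.out.println(encAddress);  // U2VvdWwsIFNlb2Nobw==

        // Base64WriteOnly is write-only by design (Hive reads back the encoded
        // text), but any external reader could decode it like this:
        String decoded = new String(Base64.getDecoder().decode(encAddress));
        System.out.println(decoded);     // Seoul, Seocho
    }
}
```

This also makes the security trade-off visible: Base64 is an encoding, not encryption, which is why the patch is positioned as a stopgap before the full HIVE-5207 work.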
[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106515#comment-14106515 ] Xiaomeng Huang commented on HIVE-7730: -- Thanks [~szehon], I have linked it to the review board.
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: (was: HIVE-7730.002.patch)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: HIVE-7730.002.patch
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add column access information from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Summary: Extend ReadEntity to add column access information from query (was: Get instance of HiveSemanticAnalyzerHookContext from configuration)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add column access information from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Description: -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).- -So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put whatever is needed into the class.-
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Summary: Extend ReadEntity to add accessed columns from query (was: Extend ReadEntity to add column access information from query)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Description: Hive should store accessed columns into ReadEntity when HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is set to true. Then we can get the accessed columns when doing authorization at compile time, before execution.
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns into ReadEntity from columnAccessInfo
}
{code}
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Description: Hive should store accessed columns into ReadEntity when HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is set to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put accessed columns into ReadEntity from columnAccessInfo
}
{code}
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Description: Hive should store accessed columns into ReadEntity when HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is set to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is the quick implementation in SemanticAnalyzer.analyzeInternal():
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map into the ReadEntity obtained from columnAccessInfo
}
{code}
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
----------------------------------
Description:

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store accessed columns to ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is a quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map into ReadEntity, obtained from columnAccessInfo
}
{code}

was: the same description, except that the trigger read "when we set HIVE_AUTHORIZATION_ENABLED or HIVE_STATS_COLLECT_SCANCOLS is true".

Extend ReadEntity to add accessed columns from query

Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
----------------------------------
Description:

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store accessed columns to ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is a quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map into ReadEntity, obtained from columnAccessInfo
}
{code}

was: the same description, except that the trigger read "when we set HIVE_STATS_COLLECT_SCANCOLS (or we can set another confvar for it) is true".

Extend ReadEntity to add accessed columns from query

Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
----------------------------------
Description:

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store accessed columns to ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or we can set another confvar for it) is set to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is a quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map into ReadEntity, obtained from columnAccessInfo
}
{code}

was: the same description without the "(or we can set another confvar for it)" note.

Extend ReadEntity to add accessed columns from query

Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
----------------------------------
Description:

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store the accessed columns map or ColumnAccessInfo to ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is a quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map into ReadEntity, obtained from columnAccessInfo
}
{code}

was: the same description, except that it read "store accessed columns to ReadEntity".

Extend ReadEntity to add accessed columns from query

Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch
[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103568#comment-14103568 ]

Xiaomeng Huang commented on HIVE-7730:
--------------------------------------

Hi [~ashutoshc], currently Hive has a new interface for external authorization plugins, and the semantic hook may be replaced in the future. So I will try to put the accessed columns into ReadEntity instead of enhancing the semantic hook. This way they will be available to hooks as well as to the authorization interfaces. I have updated the description; awaiting your feedback. Thanks!

Extend ReadEntity to add accessed columns from query

Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store the accessed columns map or ColumnAccessInfo to ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is a quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column map into ReadEntity, obtained from columnAccessInfo
}
{code}
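The comment above proposes carrying accessed columns on ReadEntity itself so that both hooks and the V2 authorization interface can read them at compile time. The shape of that idea can be sketched in a self-contained way; note the class and method names below are hypothetical illustrations, not Hive's actual ReadEntity API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Hive's ReadEntity, extended to carry the
// columns a query actually reads from the table it references.
class ReadEntitySketch {
    private final String tableName;
    private final List<String> accessedColumns = new ArrayList<>();

    ReadEntitySketch(String tableName) {
        this.tableName = tableName;
    }

    // Would be called by the compiler after column-access analysis.
    void setAccessedColumns(List<String> columns) {
        accessedColumns.clear();
        accessedColumns.addAll(columns);
    }

    // Authorization code (hooks or a V2 authorizer) would read this
    // at compile time, before any task executes.
    List<String> getAccessedColumns() {
        return Collections.unmodifiableList(accessedColumns);
    }

    String getTableName() {
        return tableName;
    }
}

public class Main {
    public static void main(String[] args) {
        ReadEntitySketch entity = new ReadEntitySketch("abc_test_old");
        entity.setAccessedColumns(List.of("did"));
        // prints: abc_test_old -> [did]
        System.out.println(entity.getTableName() + " -> " + entity.getAccessedColumns());
    }
}
```

Because the column list lives on the entity rather than in a hook-specific context, any consumer that already receives the read entities gets column information for free.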
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
----------------------------------
Description:

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store accessed columns to ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true. Then we can get the accessed columns when doing authorization at compile time, before execution. Here is a quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column list into ReadEntity, obtained from columnAccessInfo
}
{code}

was: the same description, except that it read "store accessed columns map or ColumnAccessInfo to ReadEntity" and the TODO said "accessed column map".

Extend ReadEntity to add accessed columns from query

Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
----------------------------------
Description:

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store accessed columns to ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can then get the accessed columns from ReadEntity too. Here is a quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
  // TODO: we can put the accessed column list into ReadEntity, obtained from columnAccessInfo
}
{code}

was: the same description without the sentences about the external authorization model and removing columnAccessInfo from BaseSemanticAnalyzer.

Extend ReadEntity to add accessed columns from query

Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch
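To illustrate the consuming side described above, that is, an authorizer checking column-level access at compile time before any task runs, here is a hedged sketch. The class and the allowed-columns map are invented for illustration and are not part of Hive's authorizer API:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical compile-time column authorizer: given the per-table
// accessed-column information produced by column-access analysis,
// verify the user may read every accessed column before execution.
class ColumnAuthorizerSketch {
    // table -> columns the current user is allowed to read
    private final Map<String, Set<String>> allowed;

    ColumnAuthorizerSketch(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    // Returns true only if every accessed column is permitted.
    boolean check(String table, List<String> accessedColumns) {
        Set<String> cols = allowed.getOrDefault(table, Set.of());
        return cols.containsAll(accessedColumns);
    }
}

public class Main {
    public static void main(String[] args) {
        ColumnAuthorizerSketch auth = new ColumnAuthorizerSketch(
                Map.of("abc_test_old", Set.of("did")));
        System.out.println(auth.check("abc_test_old", List.of("did")));    // true
        System.out.println(auth.check("abc_test_old", List.of("secret"))); // false
    }
}
```

The point of doing this at compile time is that a denied query fails fast, before any MapReduce task is launched.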
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
----------------------------------
Description:

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).-
-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store accessed columns to ReadEntity when HIVE_STATS_COLLECT_SCANCOLS is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can then get the accessed columns from ReadEntity too. Here is a quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into ReadEntity,
// obtained from columnAccessInfo, if HIVE_AUTHORIZATION_ENABLED is set to true
{code}

was: the same description, but with the TODO inside the if block ("accessed column list") and without the compiler.compile() call.

Extend ReadEntity to add accessed columns from query

Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
---------------------------------
    Description: 

-Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query).-

-So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.-

Hive should store the accessed columns in ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. We may eventually remove columnAccessInfo from BaseSemanticAnalyzer, since the old authorization and AuthorizationModeV2 can get the accessed columns from ReadEntity too. Here is a quick implementation in SemanticAnalyzer.analyzeInternal():

{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, put the accessed-column list into the ReadEntity
// (taken from columnAccessInfo) if HIVE_AUTHORIZATION_ENABLED is true
{code}

Extend ReadEntity to add accessed columns from query
----------------------------------------------------
Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch
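The proposed ReadEntity extension can be pictured with a small self-contained sketch. The class below is only a stand-in for Hive's real org.apache.hadoop.hive.ql.hooks.ReadEntity (the field and accessor names here are hypothetical, not Hive's API); it shows the shape of the change: each read entity carries the list of columns the query actually touches, so a compile-time authorizer can check column-level privileges.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Hive's ReadEntity, shown only to illustrate the
// proposed change; the real class lives in org.apache.hadoop.hive.ql.hooks
// and has many more fields.
class ReadEntity {
    private final String tableName;
    // Proposed addition: the columns this query actually reads,
    // filled in after compilation from ColumnAccessInfo.
    private final List<String> accessedColumns = new ArrayList<>();

    ReadEntity(String tableName) {
        this.tableName = tableName;
    }

    List<String> getAccessedColumns() {
        return accessedColumns;
    }

    String getTableName() {
        return tableName;
    }
}

public class ReadEntitySketch {
    public static void main(String[] args) {
        // After compile, the column list gathered by ColumnAccessAnalyzer
        // would be copied into each ReadEntity, so an external authorizer
        // can do column-level checks before execution.
        ReadEntity entity = new ReadEntity("abc_test_old");
        entity.getAccessedColumns().add("did");
        System.out.println(entity.getTableName() + ": " + entity.getAccessedColumns());
    }
}
```

With this shape, an authorizer iterating the query's inputs gets both the table and its accessed columns from one object, instead of reaching into BaseSemanticAnalyzer's columnAccessInfo.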
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
---------------------------------
    Attachment: HIVE-7730.002.patch

This is the patch that extends ReadEntity with the accessed-columns list.

Extend ReadEntity to add accessed columns from query
----------------------------------------------------
Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7730) Get instance of HiveSemanticAnalyzerHookContext from configuration
Xiaomeng Huang created HIVE-7730:
---------------------------------
Summary: Get instance of HiveSemanticAnalyzerHookContext from configuration
Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang

Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from the hookContext (e.g. the columns needed by the query). So we should get the instance of HiveSemanticAnalyzerHookContext from configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update(), and put what you want into the class.
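The pattern described above, a context subclass that overrides update() to capture extra analysis state, can be sketched with simplified stand-ins. The classes below are hypothetical and self-contained, not Hive's real org.apache.hadoop.hive.ql.parse types; they only illustrate the override-update() idea the description proposes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for Hive's HiveSemanticAnalyzerHookContextImpl;
// the default implementation records the analyzed inputs.
class HiveSemanticAnalyzerHookContextImpl {
    protected final Set<String> inputs = new HashSet<>();

    // Called after semantic analysis completes.
    public void update(Set<String> analyzedInputs) {
        inputs.addAll(analyzedInputs);
    }
}

// Hypothetical subclass a user could name in the configuration: it hooks
// update() to also capture the columns the query needs.
class ColumnAwareHookContext extends HiveSemanticAnalyzerHookContextImpl {
    private final Set<String> neededColumns = new HashSet<>();

    @Override
    public void update(Set<String> analyzedInputs) {
        super.update(analyzedInputs);
        // A real implementation would pull these from the analyzer's
        // ColumnAccessInfo; hard-coded here purely for illustration.
        neededColumns.add("did");
    }

    public Set<String> getNeededColumns() {
        return neededColumns;
    }
}

public class HookContextSketch {
    public static void main(String[] args) {
        ColumnAwareHookContext ctx = new ColumnAwareHookContext();
        ctx.update(new HashSet<>(Arrays.asList("abc_test_old")));
        System.out.println(ctx.getNeededColumns());
    }
}
```

The point of loading the context class from configuration is that external tools (such as Sentry) can substitute their own subclass without patching Hive itself.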
[jira] [Updated] (HIVE-7730) Get instance of HiveSemanticAnalyzerHookContext from configuration
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaomeng Huang updated HIVE-7730:
---------------------------------
    Attachment: HIVE-7730.001.patch

This is the patch; this feature blocks SENTRY-392.

Get instance of HiveSemanticAnalyzerHookContext from configuration
------------------------------------------------------------------
Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch
[jira] [Commented] (HIVE-7730) Get instance of HiveSemanticAnalyzerHookContext from configuration
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098046#comment-14098046 ]

Xiaomeng Huang commented on HIVE-7730:
--------------------------------------
Thanks [~ashutoshc] for the valuable comment! I have seen that CheckColumnAccessHook implements ExecuteWithHookContext, so it runs at execute time. But authorization should happen at compile time, before execution, so I am afraid this approach cannot meet my requirement.

Get instance of HiveSemanticAnalyzerHookContext from configuration
------------------------------------------------------------------
Key: HIVE-7730
URL: https://issues.apache.org/jira/browse/HIVE-7730
Project: Hive
Issue Type: Bug
Reporter: Xiaomeng Huang
Attachments: HIVE-7730.001.patch