[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
[ https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973760#comment-14973760 ]

Chengbing Liu commented on HIVE-11901:

Thanks [~thejas] for review and committing!

> StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
>
> Key: HIVE-11901
> URL: https://issues.apache.org/jira/browse/HIVE-11901
> Project: Hive
> Issue Type: Bug
> Components: Authorization
> Affects Versions: 1.2.1
> Reporter: Chengbing Liu
> Assignee: Chengbing Liu
> Fix For: 1.3.0, 2.0.0
> Attachments: HIVE-11901.01.patch, HIVE-11901.02.patch, HIVE-11901.03.patch
>
> Since HIVE-7895, write permission on the table directory is required even for a SELECT statement.
> Looking at the stacktrace, it seems the method {{StorageBasedAuthorizationProvider#authorize(Table table, Partition part, Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)}} always treats a null partition as a CREATE statement, although a null partition can also occur for a SELECT.
> We may have to check {{readRequiredPriv}} and {{writeRequiredPriv}} first in order to tell which statement it is.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
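For illustration only, the check proposed in the description above might look roughly like the following Java sketch. It is not the committed HIVE-11901 patch. The class `StorageAuthSketch`, the enum `FsAccess`, and the method `requiredAccess` are hypothetical names, and plain strings stand in for Hive's `Privilege` objects. The sketch also assumes that a SELECT arrives with only read privileges requested.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch, not the HIVE-11901 patch: derive the filesystem
// access to check from the requested privileges, instead of assuming
// that a null partition always means CREATE (and therefore WRITE).
final class StorageAuthSketch {

    // Stand-in for the filesystem actions checked on the table directory.
    enum FsAccess { READ, WRITE }

    // Strings stand in for Hive's Privilege objects in this sketch.
    static Set<FsAccess> requiredAccess(String[] readRequiredPriv,
                                        String[] writeRequiredPriv) {
        Set<FsAccess> needed = EnumSet.noneOf(FsAccess.class);
        // Any requested read privilege means the directory must be readable.
        if (readRequiredPriv != null && readRequiredPriv.length > 0) {
            needed.add(FsAccess.READ);
        }
        // Only ask for write access when a write privilege was requested.
        if (writeRequiredPriv != null && writeRequiredPriv.length > 0) {
            needed.add(FsAccess.WRITE);
        }
        return needed;
    }
}
```

Under these assumptions, a call such as `StorageAuthSketch.requiredAccess(new String[] {"SELECT"}, null)` asks only for read access, so a user without write permission on the table directory could still run the SELECT.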
[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
[ https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968361#comment-14968361 ]

Chengbing Liu commented on HIVE-11901:

Failed tests are not related.
[jira] [Updated] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
[ https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu updated HIVE-11901:

Attachment: HIVE-11901.03.patch

Patch updated.
[jira] [Updated] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
[ https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu updated HIVE-11901:

Attachment: HIVE-11901.02.patch
[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
[ https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966486#comment-14966486 ]

Chengbing Liu commented on HIVE-11901:

[~thejas], thanks for the hint. I wasn't aware of itests back then... Uploaded the fix with tests updated.
[jira] [Commented] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs
[ https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949728#comment-14949728 ]

Chengbing Liu commented on HIVE-11149:

Thanks [~sershe] for review and committing!

> Fix issue with sometimes HashMap in PerfLogger.java hangs
>
> Key: HIVE-11149
> URL: https://issues.apache.org/jira/browse/HIVE-11149
> Project: Hive
> Issue Type: Bug
> Components: Logging
> Affects Versions: 1.2.1
> Reporter: WangMeng
> Assignee: WangMeng
> Fix For: 2.0.0
> Attachments: HIVE-11149.01.patch, HIVE-11149.02.patch, HIVE-11149.03.patch, HIVE-11149.04.patch
>
> In a multi-threaded environment, the HashMap in PerfLogger.java sometimes causes many Java processes to hang and consume large amounts of unnecessary CPU and memory.
[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
[ https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948018#comment-14948018 ]

Chengbing Liu commented on HIVE-11901:

[~thejas], I think we can add test cases for the authorization part in another JIRA and check this in first, if you think the patch is ok.
[jira] [Commented] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs
[ https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948011#comment-14948011 ]

Chengbing Liu commented on HIVE-11149:

[~sershe], would you commit this?
[jira] [Commented] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs
[ https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933184#comment-14933184 ]

Chengbing Liu commented on HIVE-11149:

[~sershe], could you please take a look at the latest patch? Thanks.
[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
[ https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933182#comment-14933182 ]

Chengbing Liu commented on HIVE-11901:

[~thejas], I find it difficult to add a test case for it from scratch. Do we need to mock {{Table}} and even {{Path}}? And we have to consider HDFS ACL for {{StorageBasedAuthorizationProvider}}...
[jira] [Updated] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs
[ https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu updated HIVE-11149:

Attachment: HIVE-11149.04.patch

Attached HIVE-11149.04.patch. Since multiple threads may share a session, we should not cache {{perfLogger}} as a member of {{SessionState}}. I also make the {{PerfLogger}} constructor private so that the only way to get an instance is by calling {{getPerfLogger}}.
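As a rough illustration of the idea in the comment above, here is a minimal Java sketch, not the actual Hive class or the attached patch. It assumes a thread-local holder, which the message does not spell out. The class name `PerfLoggerSketch` and the `begin`/`end` methods are hypothetical. The point is only that a private constructor plus a single accessor keeps each plain `HashMap` confined to one thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Hive's PerfLogger: one instance per thread,
// so the unsynchronized HashMap inside is never shared between threads.
final class PerfLoggerSketch {

    // Assumption for this sketch: instances are kept per thread.
    private static final ThreadLocal<PerfLoggerSketch> PER_THREAD =
        ThreadLocal.withInitial(PerfLoggerSketch::new);

    private final Map<String, Long> startTimes = new HashMap<>();

    // Private constructor: getPerfLogger() is the only way to get an instance.
    private PerfLoggerSketch() {}

    static PerfLoggerSketch getPerfLogger() {
        return PER_THREAD.get();
    }

    // Records the start time of a named section for the calling thread.
    void begin(String method) {
        startTimes.put(method, System.currentTimeMillis());
    }

    // Returns elapsed milliseconds, or -1 if begin() was not called
    // for this name on the calling thread.
    long end(String method) {
        Long start = startTimes.remove(method);
        return start == null ? -1L : System.currentTimeMillis() - start;
    }
}
```

Callers would always go through `PerfLoggerSketch.getPerfLogger()` rather than holding a reference in a shared object such as a session, which is the caching the comment advises against.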
[jira] [Commented] (HIVE-7261) Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously
[ https://issues.apache.org/jira/browse/HIVE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589193#comment-14589193 ]

Chengbing Liu commented on HIVE-7261:

HIVE-10971 solved the same problem; marking this as a duplicate.

Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously

Key: HIVE-7261
URL: https://issues.apache.org/jira/browse/HIVE-7261
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.12.0
Environment: hive0.12 hadoop1.0.4
Reporter: Chris Chen

【Phenomenon】
The query results differ depending on whether hive.groupby.skewindata is set to true or false.

【my question】
I want to calculate count(*) and count(distinct) in the same query; otherwise it costs 2 MR jobs to calculate them. But when I set hive.groupby.skewindata to true, the count(*) result should not be the same as the count(distinct) result, but the actual result is the same, so it's wrong.
I also found a difference in the query plans: the Reduce Operator Tree's Group By Operator mode is mergepartial when skew is set to false and complete when skew is set to true. So I'm confused about the root cause of the error.

【sql】
select ds,appid,eventname,active,count(distinct(guid)), count(*) from eventinfo_tmp where ds='20140612' and length(eventname)1000 and eventname like '%alibaba%' group by ds,appid,eventname,active;

【other Hive configuration, excluding hive.groupby.skewindata】
hive.exec.compress.output=true
hive.exec.compress.intermediate=true
io.seqfile.compression.type=BLOCK
mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
hive.map.aggr=true
hive.stats.autogather=false
hive.exec.scratchdir=/user/complat/tmp
mapred.job.queue.name=complat
hive.exec.mode.local.auto=false
hive.exec.mode.local.auto.inputbytes.max=500
hive.exec.mode.local.auto.tasks.max=10
hive.exec.mode.local.auto.input.files.max=1000
hive.exec.dynamic.partition=true
hive.exec.dynamic.partition.mode=nonstrict
hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
mapred.max.split.size=1
mapred.min.split.size.per.node=1
mapred.min.split.size.per.rack=1

【result】
When hive.groupby.skewindata=true the result is: 20140612 8 alibaba 1 87 147
When hive.groupby.skewindata=false the result is: 20140612 8 alibaba 1 87 87

【query plan】
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME eventinfo_tmp))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL ds)) (TOK_SELEXPR (TOK_TABLE_OR_COL appid)) (TOK_SELEXPR (TOK_TABLE_OR_COL eventname)) (TOK_SELEXPR (TOK_TABLE_OR_COL active)) (TOK_SELEXPR (TOK_FUNCTIONDI count (TOK_TABLE_OR_COL guid))) (TOK_SELEXPR (TOK_FUNCTIONSTAR count))) (TOK_WHERE (and (and (= (TOK_TABLE_OR_COL ds) '20140612') ( (TOK_FUNCTION length (TOK_TABLE_OR_COL eventname)) 1000)) (like (TOK_TABLE_OR_COL eventname) '%tvvideo_setting%'))) (TOK_GROUPBY (TOK_TABLE_OR_COL ds) (TOK_TABLE_OR_COL appid) (TOK_TABLE_OR_COL eventname) (TOK_TABLE_OR_COL active

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        eventinfo_tmp
          TableScan
            alias: eventinfo_tmp
            Filter Operator
              predicate:
                  expr: ((length(eventname) 1000) and (eventname like '%tvvideo_setting%'))
                  type: boolean
              Select Operator
                expressions:
                      expr: ds
                      type: string
                      expr: appid
                      type: string
                      expr: eventname
                      type: string
                      expr: active
                      type: int
                      expr: guid
                      type: string
                outputColumnNames: ds, appid, eventname, active, guid
                Group By Operator
                  aggregations:
                        expr: count(DISTINCT guid)
                        expr: count()
                  bucketGroup: false
                  keys:
                        expr: ds
                        type: string
                        expr: appid
                        type: string
                        expr: eventname
                        type: string
                        expr: active
                        type: int
                        expr: guid
                        type: string
                  mode: hash
[jira] [Resolved] (HIVE-7261) Calculation works wrong when hive.groupby.skewindata is true and count(*) count(distinct) group by work simultaneously
[ https://issues.apache.org/jira/browse/HIVE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu resolved HIVE-7261.

Resolution: Duplicate