[jira] [Updated] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8186: Attachment: HIVE-8186.2.patch.txt Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8186: Attachment: (was: HIVE-8186.2.patch.txt) Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151389#comment-14151389 ] Navis commented on HIVE-8186: - [~sershe] If you are busy, can I take this? Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8283) Missing break in FilterSelectivityEstimator#visitCall()
[ https://issues.apache.org/jira/browse/HIVE-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8283: Attachment: HIVE-8283.1.patch.txt Missing break in FilterSelectivityEstimator#visitCall() --- Key: HIVE-8283 URL: https://issues.apache.org/jira/browse/HIVE-8283 Project: Hive Issue Type: Bug Reporter: Ted Yu Attachments: HIVE-8283.1.patch.txt
{code}
case NOT_EQUALS: {
  selectivity = computeNotEqualitySelectivity(call);
}
case LESS_THAN_OR_EQUAL:
case GREATER_THAN_OR_EQUAL:
case LESS_THAN:
case GREATER_THAN: {
  selectivity = ((double) 1 / (double) 3);
  break;
}
{code}
A break is missing for the NOT_EQUALS case, so control falls through and selectivity is overwritten with 1/3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8283) Missing break in FilterSelectivityEstimator#visitCall()
[ https://issues.apache.org/jira/browse/HIVE-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8283: Status: Patch Available (was: Open) Seems to need updates to some query results. Missing break in FilterSelectivityEstimator#visitCall() --- Key: HIVE-8283 URL: https://issues.apache.org/jira/browse/HIVE-8283 Project: Hive Issue Type: Bug Reporter: Ted Yu Attachments: HIVE-8283.1.patch.txt
{code}
case NOT_EQUALS: {
  selectivity = computeNotEqualitySelectivity(call);
}
case LESS_THAN_OR_EQUAL:
case GREATER_THAN_OR_EQUAL:
case LESS_THAN:
case GREATER_THAN: {
  selectivity = ((double) 1 / (double) 3);
  break;
}
{code}
A break is missing for the NOT_EQUALS case, so control falls through and selectivity is overwritten with 1/3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
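A corrected version of the switch quoted above can be sketched as follows. The enum and the computeNotEqualitySelectivity stub are simplified stand-ins for illustration, not Hive's actual classes:

```java
// Sketch of the corrected switch from FilterSelectivityEstimator#visitCall().
// Kind and computeNotEqualitySelectivity() are simplified stand-ins.
public class SelectivityFix {
    enum Kind { NOT_EQUALS, LESS_THAN_OR_EQUAL, GREATER_THAN_OR_EQUAL, LESS_THAN, GREATER_THAN }

    // Stand-in for computeNotEqualitySelectivity(call); the real method uses column NDVs.
    static double computeNotEqualitySelectivity() {
        return 0.9;
    }

    static double selectivity(Kind kind) {
        double selectivity = 1.0;
        switch (kind) {
            case NOT_EQUALS: {
                selectivity = computeNotEqualitySelectivity();
                break; // without this break, control falls through and 1/3 overwrites the value
            }
            case LESS_THAN_OR_EQUAL:
            case GREATER_THAN_OR_EQUAL:
            case LESS_THAN:
            case GREATER_THAN: {
                selectivity = ((double) 1 / (double) 3);
                break;
            }
        }
        return selectivity;
    }
}
```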
[jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151395#comment-14151395 ] Hive QA commented on HIVE-8196: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671731/HIVE-8196.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6362 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1031/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1031/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1031/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12671731 Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch To make the best of dynamic partition pruning, joins should be on the partitioning columns, which dynamically prunes partitions from the fact table based on the qualifying column keys from the dimension table; however, this type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns, and as a result NDV is set to the row count, which skews the estimated join selectivity. A workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used. 
In StatsUtils.getColStatisticsFromExpression is where count distincts gets set to row count {code} if (encd.getIsPartitionColOrVirtualCol()) { // vitual columns colType = encd.getTypeInfo().getTypeName(); countDistincts = numRows; oi = encd.getWritableObjectInspector(); {code} Query used to repro the issue : {code} set hive.stats.fetch.column.stats=true; set hive.tez.dynamic.partition.pruning=true; explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998; {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 - Map 2 (BROADCAST_EDGE) DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_sold_date_sk is not null (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} 1 {d_date_sk} {d_date} keys: 0 ss_sold_date_sk (type: int) 1 d_date_sk (type: int) outputColumnNames: _col22, _col26, _col28 input vertices: 1 Map 2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col22 = _col26) (type: boolean) Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col28 (type: string) outputColumnNames: _col0 Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator
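To see why NDV = row count hurts here, note that a common inner-join cardinality estimate divides the cross-product by the larger NDV of the join keys. The sketch below uses that textbook formula with made-up table sizes; it illustrates the effect and is not Hive's exact estimator:

```java
// Illustrates how inflating a join key's NDV to the row count collapses the
// estimated join cardinality. Formula: |R join S| ~= |R| * |S| / max(ndvR, ndvS).
public class JoinCardinality {
    static long estimate(long rowsR, long rowsS, long ndvR, long ndvS) {
        return (rowsR * rowsS) / Math.max(ndvR, ndvS);
    }
}
```

With a hypothetical 1M-row fact table joined to a 365-row dimension, setting the fact key's NDV to the row count collapses the estimate to roughly the dimension size, while a realistic NDV (say, the number of distinct partition keys) yields a far larger estimate.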
[jira] [Commented] (HIVE-8267) Exposing hbase cell latest timestamp through hbase columns mappings to hive columns.
[ https://issues.apache.org/jira/browse/HIVE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151403#comment-14151403 ] Navis commented on HIVE-8267: - [~ehsan] HIVE-2781 was applied long ago and HIVE-2828 is not broken (it can have a restricted feature). Could I ask why you stated it like that? Exposing hbase cell latest timestamp through hbase columns mappings to hive columns. Key: HIVE-8267 URL: https://issues.apache.org/jira/browse/HIVE-8267 Project: Hive Issue Type: New Feature Components: HBase Handler Affects Versions: 0.14.0 Reporter: Muhammad Ehsan ul Haque Priority: Minor Fix For: 0.14.0 Attachments: HIVE-8267.0.patch Previous attempts: HIVE-2781 (not accepted), HIVE-2828 (broken and proposed with a restricted feature). The feature is to have the hbase cell latest timestamp accessible in a hive query, by mapping the cell timestamp to a hive column, using a mapping format like {code}:timestamp:cf:[optional qualifier or qualifier prefix]{code} The hive create table statement would be like:
h4. For mapping a cell latest timestamp.
{code}
CREATE TABLE hive_hbase_table (key STRING, col1 STRING, col1_ts BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:qualifier,:timestamp:cf:qualifier")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
{code}
h4. For mapping a column family latest timestamp.
{code}
CREATE TABLE hive_hbase_table (key STRING, valuemap MAP<STRING, STRING>, timestampmap MAP<STRING, BIGINT>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:,:timestamp:cf:")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
{code}
h4. Providing default cell value
{code}
CREATE TABLE hive_hbase_table (key int, value string, value_timestamp bigint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:qualifier,:timestamp:cf:qualifier", "hbase.put.default.cell.value" = "default value")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8288) HiveServer2 dynamic discovery should create znodes organized by version number
Vaibhav Gumashta created HIVE-8288: -- Summary: HiveServer2 dynamic discovery should create znodes organized by version number Key: HIVE-8288 URL: https://issues.apache.org/jira/browse/HIVE-8288 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Something like: /hiveserver2/version_no/znode_name would be better to support admin actions like removing all znodes for a particular version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
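A minimal sketch of the proposed version-scoped layout (names are illustrative; the final path scheme may differ):

```java
// Sketch of version-scoped znode paths for HiveServer2 dynamic discovery.
// Deleting a version's subtree removes every server instance registered
// under that version, enabling the admin action described above.
public class ZnodePaths {
    static String versionRoot(String version) {
        return "/hiveserver2/" + version;
    }

    static String instancePath(String version, String instanceId) {
        return versionRoot(version) + "/" + instanceId;
    }
}
```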
[jira] [Commented] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151412#comment-14151412 ] Chinna Rao Lalam commented on HIVE-8180: RB link : https://reviews.apache.org/r/26130/ Update SparkReduceRecordHandler for processing the vectors [spark branch] - Key: HIVE-8180 URL: https://issues.apache.org/jira/browse/HIVE-8180 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Labels: Spark-M1 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, HIVE-8180.2-spark.patch Update SparkReduceRecordHandler for processing the vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8265) Build failure on hadoop-1
[ https://issues.apache.org/jira/browse/HIVE-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151413#comment-14151413 ] Navis commented on HIVE-8265: - The test failure seems unrelated to this. Build failure on hadoop-1 -- Key: HIVE-8265 URL: https://issues.apache.org/jira/browse/HIVE-8265 Project: Hive Issue Type: Task Components: Tests Reporter: Navis Assignee: Navis Priority: Blocker Attachments: HIVE-8265.1.patch.txt no pre-commit-tests Fails from CustomPartitionVertex and TestHive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8196: - Attachment: HIVE-8196.4.patch Fixes the parallel.q test. Rebased the patch to the latest trunk. Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, HIVE-8196.4.patch To make the best of dynamic partition pruning, joins should be on the partitioning columns, which dynamically prunes partitions from the fact table based on the qualifying column keys from the dimension table; however, this type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns, and as a result NDV is set to the row count, which skews the estimated join selectivity. A workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used. 
In StatsUtils.getColStatisticsFromExpression is where count distincts gets set to row count {code} if (encd.getIsPartitionColOrVirtualCol()) { // vitual columns colType = encd.getTypeInfo().getTypeName(); countDistincts = numRows; oi = encd.getWritableObjectInspector(); {code} Query used to repro the issue : {code} set hive.stats.fetch.column.stats=true; set hive.tez.dynamic.partition.pruning=true; explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998; {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 - Map 2 (BROADCAST_EDGE) DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_sold_date_sk is not null (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} 1 {d_date_sk} {d_date} keys: 0 ss_sold_date_sk (type: int) 1 d_date_sk (type: int) outputColumnNames: _col22, _col26, _col28 input vertices: 1 Map 2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col22 = _col26) (type: boolean) Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col28 (type: string) outputColumnNames: _col0 Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Execution mode: vectorized Map 2 Map 
Operator Tree: TableScan alias: date_dim filterExpr: (d_date_sk is not null and (d_year = 1998)) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (d_date_sk is not null and (d_year = 1998)) (type: boolean)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151450#comment-14151450 ] Prasanth J commented on HIVE-8226: -- [~mmccline] Can you rebase the patch against the current trunk? I saw a failure when I tried to commit this patch: there is a diff in the golden file when I ran the dynpart_sort_opt_vectorization.q test, and the patch did not apply cleanly on trunk. Also, is this going into branch-0.14 as well? If so, please check with [~vikram.dixit] and update the Affects and Fix versions accordingly. Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8287: Status: Patch Available (was: Open) StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch Example of an error message that doesn't give enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8287: Attachment: HIVE-8287.1.patch StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch Example of an error message that doesn't give enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Status: In Progress (was: Patch Available) Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Attachment: HIVE-8226.03.patch Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Status: Patch Available (was: In Progress) Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151468#comment-14151468 ] Matt McCline commented on HIVE-8226: Yes, I rebased and re-ran the dynpart_sort_opt_vectorization.q and found a few stages now vectorize... Perhaps I didn't create patch #2 correctly. Anyway, submitted patch #3. Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151469#comment-14151469 ] Hive QA commented on HIVE-7723: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671737/HIVE-7723.8.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6364 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_escape1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_escape2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1032/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1032/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1032/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671737 Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7723.1.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch Explain on TPC-DS query 64 took 11 seconds, when the CLI was profiled it showed that ReadEntity.equals is taking ~40% of the CPU. 
ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match; a HashMap is a better option for this case, as Set doesn't have a get method. Also, for ReadEntity, equals is case-insensitive while hash is not, which is an undesired behavior.
{code}
public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) {
  // If the input is already present, make sure the new parent is added to the input.
  if (inputs.contains(newInput)) {
    for (ReadEntity input : inputs) {
      if (input.equals(newInput)) {
        if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
          input.getParents().addAll(newInput.getParents());
          input.setDirect(input.isDirect() || newInput.isDirect());
        }
        return input;
      }
    }
    assert false;
  } else {
    inputs.add(newInput);
    return newInput;
  }
  // make compile happy
  return null;
}
{code}
This is the query used : {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON 
customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN
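A HashMap-keyed variant of addInput along the lines the report suggests might look like the sketch below. Entity is a minimal stand-in for ReadEntity (with equals and hashCode made consistently case-insensitive), not Hive's actual class:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the HashMap-based lookup the issue proposes. Entity is a minimal
// stand-in for Hive's ReadEntity: equals and hashCode must agree (both are
// case-insensitive here, avoiding the equals/hashCode mismatch noted above).
public class InputRegistry {
    public static final class Entity {
        final String name;
        final Set<String> parents = new HashSet<>();

        public Entity(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof Entity && ((Entity) o).name.equalsIgnoreCase(name);
        }

        @Override public int hashCode() { return name.toLowerCase().hashCode(); }
    }

    private final Map<Entity, Entity> inputs = new HashMap<>();

    // O(1) expected time: look up the existing entity directly
    // instead of iterating the whole Set on every call.
    public Entity addInput(Entity newInput) {
        Entity existing = inputs.get(newInput);
        if (existing != null) {
            existing.parents.addAll(newInput.parents);
            return existing;
        }
        inputs.put(newInput, newInput);
        return newInput;
    }
}
```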
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Affects Version/s: 0.14.0 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Fix Version/s: 0.14.0 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151472#comment-14151472 ] Matt McCline commented on HIVE-8226: [~pjayachandran] I added you to the e-mail I sent to Gunther about branch-0.14 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8287: Attachment: HIVE-8287.2.patch StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch Example of an error message that doesn't give enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151488#comment-14151488 ] Thejas M Nair commented on HIVE-8287: - HIVE-8287.2.patch - also includes changes to the webhcat e2e tests for the new error messages, and for changes in HIVE-8221. StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch Example of an error message that doesn't give enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7685) Parquet memory manager
[ https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151496#comment-14151496 ] Dong Chen commented on HIVE-7685: - Hi Brock, I think a brief design for this memory manager is: Every new writer registers itself to the manager. The manager has an overall view of all the writers. When a condition is up (such as every 1000 rows), it will notify the writers to check memory usage and flush if necessary. However, a problem for Parquet specifically is: Hive only has a wrapper for the ParquetRecordWriter, and even ParquetRecordWriter also wrap the real writer (InternalParquetRecordWriter) in Parquet project. Since the behaviors of measuring dynamic buffer size and flushing are private in the real writer, I think we also have to add code in InternalParquetRecordWriter to implement the memory manager functionality. It seems only changing Hive code cannot fix this Jira. Not sure whether we should put this problem in Parquet project and fix it there, if it is generic enough and not Hive specific? Any other ideas? Best Regards, Dong Parquet memory manager -- Key: HIVE-7685 URL: https://issues.apache.org/jira/browse/HIVE-7685 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Brock Noland Similar to HIVE-4248, Parquet tries to write large very large row groups. This causes Hive to run out of memory during dynamic partitions when a reducer may have many Parquet files open at a given time. As such, we should implement a memory manager which ensures that we don't run out of memory due to writing too many row groups within a single JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
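The registration-and-callback design Dong describes above could be sketched as follows. MemoryCheckable and the flush-everyone policy are hypothetical illustrations, not Parquet's or Hive's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed memory manager: writers register themselves, and the
// manager periodically asks every writer to report buffered bytes and flush
// when the total exceeds a budget. MemoryCheckable is a hypothetical interface.
public class ParquetMemoryManager {
    public interface MemoryCheckable {
        long bufferedBytes();
        void flush();
    }

    private final List<MemoryCheckable> writers = new ArrayList<>();
    private final long totalBudgetBytes;
    private long rowsSinceCheck = 0;
    private static final long CHECK_EVERY_ROWS = 1000; // "such as every 1000 rows"

    public ParquetMemoryManager(long totalBudgetBytes) {
        this.totalBudgetBytes = totalBudgetBytes;
    }

    public synchronized void register(MemoryCheckable w) { writers.add(w); }

    // Called by writers on each row; triggers a check every CHECK_EVERY_ROWS rows.
    public synchronized void rowWritten() {
        if (++rowsSinceCheck < CHECK_EVERY_ROWS) return;
        rowsSinceCheck = 0;
        long total = 0;
        for (MemoryCheckable w : writers) total += w.bufferedBytes();
        if (total > totalBudgetBytes) {
            for (MemoryCheckable w : writers) w.flush(); // naive policy: flush everyone
        }
    }
}
```

As the comment notes, the real flush hook lives inside Parquet's InternalParquetRecordWriter, so a production version would need changes in the Parquet project rather than only in Hive.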
[jira] [Commented] (HIVE-8222) CBO Trunk Merge: Fix Check Style issues
[ https://issues.apache.org/jira/browse/HIVE-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151498#comment-14151498 ] Lars Francke commented on HIVE-8222: Would anyone mind taking a look? Shall I open a review? This one will probably go stale very fast so I'd appreciate a quick turnaround to avoid a lot of extra work. CBO Trunk Merge: Fix Check Style issues --- Key: HIVE-8222 URL: https://issues.apache.org/jira/browse/HIVE-8222 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-8222.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7776: Attachment: HIVE-7776.2-spark.patch Add several legacy MR configurations to enable Hive features that are based on mapred.task.id/mapreduce.task.attempt.id/mapred.task.partition. enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch sample10.q contains a dynamic partition operation; this qtest should be enabled after Hive on Spark supports dynamic partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25495: HIVE-7776, enable sample10.q
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25495/ --- (Updated September 29, 2014, 9:10 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7776 https://issues.apache.org/jira/browse/HIVE-7776 Repository: hive-git Description (updated) --- Hive gets the task id in two ways in Utilities::getTaskId: 1. get the value of the mapred.task.id parameter from the configuration; 2. generate a random value when #1 returns null. This patch sets mapred.task.id on the executor side, as we can build it through TaskContext now. Diffs - itests/src/test/resources/testconfiguration.properties 155abad ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 3ff0782 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 02f9d99 ql/src/test/results/clientpositive/spark/sample10.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25495/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 25495: HIVE-7776, enable sample10.q
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25495/ --- (Updated September 29, 2014, 9:11 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7776 https://issues.apache.org/jira/browse/HIVE-7776 Repository: hive-git Description --- Hive gets the task id in two ways in Utilities::getTaskId: 1. get the value of the mapred.task.id parameter from the configuration; 2. generate a random value when #1 returns null. This patch sets mapred.task.id on the executor side, as we can build it through TaskContext now. Diffs (updated) - itests/src/test/resources/testconfiguration.properties 89243fc ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 1674d4b ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 0b8b7c9 ql/src/test/results/clientpositive/spark/sample10.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25495/diff/ Testing --- Thanks, chengxiang li
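[Editor's sketch] The two-step lookup the review description refers to can be illustrated with a minimal standalone stand-in (TaskIdSketch and its method are invented names for this sketch, not the actual Utilities code):

```java
import java.util.Map;
import java.util.Random;

class TaskIdSketch {
    // 1. prefer the configured mapred.task.id;
    // 2. fall back to a random value when it is unset (e.g. on Spark
    //    executors before HIVE-7776 sets it from TaskContext).
    static String getTaskId(Map<String, String> conf) {
        String taskId = conf.get("mapred.task.id");
        if (taskId == null || taskId.isEmpty()) {
            return "task_" + Math.abs(new Random().nextInt());
        }
        return taskId;
    }
}
```

The random fallback is why features keyed on the task id (such as sample10.q's dynamic partitions) misbehave when the configuration value is missing; setting mapred.task.id on the executor side makes branch 1 win.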
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7776: Attachment: HIVE-7776.3-spark.patch enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch sample10.q contain dynamic partition operation, should enable this qtest after hive on spark support dynamic partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7776: Status: Patch Available (was: Open) enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch sample10.q contain dynamic partition operation, should enable this qtest after hive on spark support dynamic partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151533#comment-14151533 ] Hive QA commented on HIVE-7776: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671768/HIVE-7776.3-spark.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/171/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/171/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-171/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd 
/data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-171/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-spark-source ]] + [[ ! -d apache-svn-spark-source/.svn ]] + [[ ! -d apache-svn-spark-source ]] + cd apache-svn-spark-source + svn revert -R . Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java' ++ svn status --no-ignore ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' + rm -rf target datanucleus.log ant/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target metastore/target common/target common/src/gen serde/target ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1628143. At revision 1628143. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12671768 enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch sample10.q contain dynamic partition operation, should enable this qtest after hive on spark support dynamic partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151539#comment-14151539 ] Chengxiang Li commented on HIVE-7776: - This patch depends on HIVE-7627; I should re-upload it after HIVE-7627 has been committed. enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch sample10.q contains a dynamic partition operation; this qtest should be enabled after Hive on Spark supports dynamic partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2573) Create per-session function registry
[ https://issues.apache.org/jira/browse/HIVE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151555#comment-14151555 ] Hive QA commented on HIVE-2573: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671743/HIVE-2573.4.patch.txt {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 6365 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_func1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_collect_set org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_corr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_pop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_samp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_avg org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_count org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_min org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_percentile org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_std org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_stddev org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_stddev_samp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sum org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_var_pop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_var_samp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_variance org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_max org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_min org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testNegativeCliDriver_invalid_row_sequence 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.service.TestHiveServerSessions.testSessionFuncs org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1033/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1033/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1033/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671743 Create per-session function registry - Key: HIVE-2573 URL: https://issues.apache.org/jira/browse/HIVE-2573 Project: Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Navis Assignee: Navis Priority: Minor Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2573.D3231.1.patch, HIVE-2573.1.patch.txt, HIVE-2573.2.patch.txt, HIVE-2573.3.patch.txt, HIVE-2573.4.patch.txt Currently the function registry is a shared resource and could be overridden by other users when using HiveServer. Providing a per-session function registry would prevent this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151597#comment-14151597 ] Hive QA commented on HIVE-8186: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671748/HIVE-8186.2.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6362 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1034/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1034/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1034/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671748 Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8267) Exposing hbase cell latest timestamp through hbase columns mappings to hive columns.
[ https://issues.apache.org/jira/browse/HIVE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151623#comment-14151623 ] Muhammad Ehsan ul Haque commented on HIVE-8267: --- My bad, I just copy-pasted from the HIVE-2828 description (originated from HIVE-2781 and not accepted, but I think this could be helpful to someone). However, there is one more, HIVE-2306, still open with no patch available. HIVE-2828 has failing tests after a 2.5-year rebase. It also exposes the timestamp by picking the timestamp of the first cell only: {code} long timestamp = result.rawCells()[0].getTimestamp(); {code} It does not allow exposing the timestamp of all or particular cells in some column families. Exposing hbase cell latest timestamp through hbase columns mappings to hive columns. Key: HIVE-8267 URL: https://issues.apache.org/jira/browse/HIVE-8267 Project: Hive Issue Type: New Feature Components: HBase Handler Affects Versions: 0.14.0 Reporter: Muhammad Ehsan ul Haque Priority: Minor Fix For: 0.14.0 Attachments: HIVE-8267.0.patch Previous attempts: HIVE-2781 (not accepted), HIVE-2828 (broken and proposed with a restricted feature). The feature is to make the latest hbase cell timestamp accessible in a hive query, by mapping the cell timestamp to a hive column, using a mapping format like {code}:timestamp:cf:[optional qualifier or qualifier prefix]{code} The hive create table statement would look like: h4. For mapping a cell latest timestamp. {code} CREATE TABLE hive_hbase_table (key STRING, col1 STRING, col1_ts BIGINT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:qualifier,:timestamp:cf:qualifier") TBLPROPERTIES ("hbase.table.name" = "hbase_table"); {code} h4. For mapping a column family latest timestamp.
{code} CREATE TABLE hive_hbase_table (key STRING, valuemap MAP<STRING, STRING>, timestampmap MAP<STRING, BIGINT>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:,:timestamp:cf:") TBLPROPERTIES ("hbase.table.name" = "hbase_table"); {code} h4. Providing default cell value {code} CREATE TABLE hive_hbase_table (key int, value string, value_timestamp bigint) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:qualifier,:timestamp:cf:qualifier", "hbase.put.default.cell.value" = "default value") TBLPROPERTIES ("hbase.table.name" = "hbase_table"); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8283) Missing break in FilterSelectivityEstimator#visitCall()
[ https://issues.apache.org/jira/browse/HIVE-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151657#comment-14151657 ] Hive QA commented on HIVE-8283: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671749/HIVE-8283.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6364 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1035/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1035/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1035/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671749 Missing break in FilterSelectivityEstimator#visitCall() --- Key: HIVE-8283 URL: https://issues.apache.org/jira/browse/HIVE-8283 Project: Hive Issue Type: Bug Reporter: Ted Yu Attachments: HIVE-8283.1.patch.txt {code} case NOT_EQUALS: { selectivity = computeNotEqualitySelectivity(call); } case LESS_THAN_OR_EQUAL: case GREATER_THAN_OR_EQUAL: case LESS_THAN: case GREATER_THAN: { selectivity = ((double) 1 / (double) 3); break; } {code} break is missing for NOT_EQUALS case. selectivity would be overwritten with 1/3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
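[Editor's sketch] The effect of the missing break, and of the fix, can be shown with a standalone stand-in for the switch (the enum, class, and notEqualsSelectivity parameter are invented for this sketch; the real code lives in FilterSelectivityEstimator#visitCall):

```java
enum Kind { NOT_EQUALS, LESS_THAN_OR_EQUAL, GREATER_THAN_OR_EQUAL, LESS_THAN, GREATER_THAN }

class SelectivitySketch {
    static double selectivity(Kind kind, double notEqualsSelectivity) {
        double selectivity = 1.0;
        switch (kind) {
            case NOT_EQUALS:
                // stands in for computeNotEqualitySelectivity(call)
                selectivity = notEqualsSelectivity;
                break; // the break missing in HIVE-8283; without it,
                       // control falls through and 1/3 overwrites the value
            case LESS_THAN_OR_EQUAL:
            case GREATER_THAN_OR_EQUAL:
            case LESS_THAN:
            case GREATER_THAN:
                selectivity = 1.0 / 3.0;
                break;
        }
        return selectivity;
    }
}
```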
[jira] [Commented] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151701#comment-14151701 ] Yongzhi Chen commented on HIVE-8182: Is trimming every line a good idea? I think that to be consistent with the single-line case, trimming only the last line may be the better choice. My suggestion is to add {code}line = line.trim();{code} before {code}if (line.endsWith(";")) { line = line.substring(0, line.length() - 1); }{code} beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch As the title indicates, when executing a multi-line query with trailing spaces, beeline reports a syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If the query is put on a single line, beeline executes it successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
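[Editor's sketch] Yongzhi's suggestion amounts to something like the following standalone helper (BeelineTrimSketch is a made-up name; the real logic lives in beeline's command handling):

```java
class BeelineTrimSketch {
    // Trim the accumulated command before testing for the trailing
    // semicolon, so a multi-line query with trailing spaces is treated
    // the same as a single-line one.
    static String stripTerminator(String line) {
        line = line.trim(); // the proposed addition
        if (line.endsWith(";")) {
            line = line.substring(0, line.length() - 1);
        }
        return line;
    }
}
```

Without the trim, "select 1;   " keeps its trailing spaces, endsWith(";") is false, and the stray semicolon reaches the parser, producing the ParseException quoted in the report.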
[jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151736#comment-14151736 ] Hive QA commented on HIVE-8196: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671753/HIVE-8196.4.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6364 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1036/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1036/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1036/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12671753 Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, HIVE-8196.4.patch To make the best out of dynamic partition pruning, joins should be on the partitioning columns, which results in dynamically pruning the partitions from the fact table based on the qualifying column keys from the dimension table; this type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns, and as a result NDV is set to the row count; doing that negatively affects the estimated join selectivity. The workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used.
In StatsUtils.getColStatisticsFromExpression is where count distincts gets set to row count {code} if (encd.getIsPartitionColOrVirtualCol()) { // vitual columns colType = encd.getTypeInfo().getTypeName(); countDistincts = numRows; oi = encd.getWritableObjectInspector(); {code} Query used to repro the issue : {code} set hive.stats.fetch.column.stats=true; set hive.tez.dynamic.partition.pruning=true; explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998; {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 - Map 2 (BROADCAST_EDGE) DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_sold_date_sk is not null (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} 1 {d_date_sk} {d_date} keys: 0 ss_sold_date_sk (type: int) 1 d_date_sk (type: int) outputColumnNames: _col22, _col26, _col28 input vertices: 1 Map 2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col22 = _col26) (type: boolean) Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col28 (type: string) outputColumnNames: _col0 Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE
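[Editor's sketch] The workaround the description suggests — bounding the NDV of a partition column by the partition count instead of defaulting it to the row count — could look roughly like this (NdvSketch and its parameters are invented for illustration, not the actual StatsUtils signature):

```java
class NdvSketch {
    // Estimate the distinct-value count used for join selectivity.
    static long countDistincts(boolean isPartitionCol, long numRows, long numPartitions) {
        if (isPartitionCol) {
            // Assumption: a partition column cannot have more distinct
            // values than there are partitions (or rows).
            return Math.min(numPartitions, numRows);
        }
        // Current fallback quoted above: NDV defaults to the row count.
        return numRows;
    }
}
```

With ss_sold_date_sk as the join key, using the partition count (e.g. ~1.8K date partitions) instead of ~550M rows keeps the join selectivity from collapsing to the tiny 652-row estimate shown in the plan above.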
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151757#comment-14151757 ] Damien Carol commented on HIVE-8231: [~alangates] I ran a lot of tests over the weekend. It seems that INSERT/DELETE/UPDATE doesn't work at all with concurrency enabled. If I deactivate ACID with: {noformat} <!-- concurrency --> <property> <name>hive.support.concurrency</name> <value>false</value> </property> <!-- compactor org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager / org.apache.hadoop.hive.ql.lockmgr.DbTxnManager --> <property> <name>hive.txn.manager</name> <value>org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager</value> </property> {noformat} then everything is ok. Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Fix For: 0.14.0 Steps to show the bug : 1. create table {code} create table encaissement_1b_64m like encaissement_1b; {code} 2. check table {code} desc encaissement_1b_64m; dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m; {code} everything is ok: {noformat} 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m; +++--+--+ | col_name | data_type | comment | +++--+--+ | id | int| | | idmagasin | int| | | zibzin | string | | | cheque | int| | | montant| double | | | date | timestamp | | | col_6 | string | | | col_7 | string | | | col_8 | string | | +++--+--+ 9 rows selected (0.158 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ +-+--+ No rows selected (0.01 seconds) {noformat} 3. Insert values into the new table {noformat} insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','',''); {noformat} 4.
Check {noformat} 0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m; +-+--+ | id | +-+--+ +-+--+ No rows selected (0.091 seconds) {noformat} There is already a problem: I don't see the inserted row. 5. When I check the HDFS directory, I see the {{delta_421_421}} folder {noformat} 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 | +-+--+ 2 rows selected (0.014 seconds) {noformat} 6. Doing a major compaction solves the bug {noformat} 0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major'; No rows affected (0.046 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; ++--+ | DFS Output | ++--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151758#comment-14151758 ] Damien Carol commented on HIVE-8231: To be more precise, these commands work: {code} drop table if exists foo6; create table foo6 (id int) clustered by (id) into 1 buckets; insert into table foo6 VALUES(1); select * from foo6; drop table if exists foo7; create table foo7 (id int) STORED AS ORC; insert into table foo7 VALUES(1); select * from foo7; {code} Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Fix For: 0.14.0 Steps to show the bug : 1. create table {code} create table encaissement_1b_64m like encaissement_1b; {code} 2. check table {code} desc encaissement_1b_64m; dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m; {code} everything is ok: {noformat} 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m; +++--+--+ | col_name | data_type | comment | +++--+--+ | id | int| | | idmagasin | int| | | zibzin | string | | | cheque | int| | | montant| double | | | date | timestamp | | | col_6 | string | | | col_7 | string | | | col_8 | string | | +++--+--+ 9 rows selected (0.158 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ +-+--+ No rows selected (0.01 seconds) {noformat} 3. Insert values into the new table {noformat} insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','',''); {noformat} 4. Check {noformat} 0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m; +-+--+ | id | +-+--+ +-+--+ No rows selected (0.091 seconds) {noformat} There is already a problem: I don't see the inserted row. 5.
When I'm checking HDFS directory, I see {{delta_421_421}} folder {noformat} 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 | +-+--+ 2 rows selected (0.014 seconds) {noformat} 6. Doing a major compaction solves the bug {noformat} 0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major'; No rows affected (0.046 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; ++--+ | DFS Output | ++--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421 | ++--+ 2 rows selected (0.02
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151759#comment-14151759 ] Damien Carol commented on HIVE-8231: This bug is still here even with HIVE-8203 committed. Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8289) Exclude temp tables in compactor threads
Damien Carol created HIVE-8289: -- Summary: Exclude temp tables in compactor threads Key: HIVE-8289 URL: https://issues.apache.org/jira/browse/HIVE-8289 Project: Hive Issue Type: Improvement Reporter: Damien Carol Priority: Minor Currently, the compactor thread tries to compact temp tables. This throws errors like this one:
{noformat}
2014-09-26 15:32:18,483 ERROR [Thread-8]: compactor.Initiator (Initiator.java:run(111)) - Caught exception while trying to determine if we should compact testsimon.values__tmp__table__11. Marking clean to avoid repeated failures, java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:88)
2014-09-26 15:32:18,484 ERROR [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:markCleaned(355)) - Unable to delete compaction record
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
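The improvement in the summary could look roughly like the following. This is a hypothetical sketch, not the real Initiator code: the temp tables generated for VALUES clauses carry a recognizable name prefix (the log above shows {{values__tmp__table__11}}), so the compactor loop can skip them before attempting compaction. An actual fix would more likely consult metastore metadata about the table type.

```python
# Hypothetical sketch of skipping temp tables in a compactor loop.
# The "values__tmp__table__" prefix matches the table name seen in the
# error log; the real check would likely use metastore metadata instead.

TMP_TABLE_PREFIX = "values__tmp__table__"

def tables_to_compact(candidates):
    """Filter out temporary tables before deciding on compaction."""
    return [t for t in candidates
            if not t.split(".")[-1].startswith(TMP_TABLE_PREFIX)]

print(tables_to_compact(["testsimon.values__tmp__table__11",
                         "casino.encaissement_1b"]))
```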
[jira] [Commented] (HIVE-7685) Parquet memory manager
[ https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151792#comment-14151792 ] Brock Noland commented on HIVE-7685: Hi Dong, Ok, thank you for the investigation. I think we can either put the parquet memory manager in Parquet or add APIs to expose the information required to implement the memory manager in Hive. Either approach is fine by me, we can take this work up in PARQUET-108. Brock Parquet memory manager -- Key: HIVE-7685 URL: https://issues.apache.org/jira/browse/HIVE-7685 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Brock Noland Similar to HIVE-4248, Parquet tries to write very large row groups. This causes Hive to run out of memory during dynamic partitions when a reducer may have many Parquet files open at a given time. As such, we should implement a memory manager which ensures that we don't run out of memory due to writing too many row groups within a single JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
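The kind of memory manager being discussed (similar to the ORC one from HIVE-4248) can be sketched abstractly. This is an illustrative Python model with made-up names, not the PARQUET-108 design: when the combined row-group buffer demand of all open writers exceeds a shared pool, every writer is scaled down proportionally so a reducer with many open Parquet files does not exhaust the JVM heap.

```python
# Illustrative model of a writer memory manager (made-up names; the real
# design was taken up in PARQUET-108). Each writer requests a row-group
# buffer; once total demand exceeds the pool, all writers are scaled
# down proportionally so the JVM does not run out of memory.

def allocate(pool_bytes, requested):
    """Return per-writer buffer allocations given a shared memory pool."""
    total = sum(requested)
    if total <= pool_bytes:
        return list(requested)
    scale = pool_bytes / total
    return [int(r * scale) for r in requested]

# Two writers fit in the pool; ten writers get scaled down to share it.
print(allocate(1024, [256, 256]))
print(allocate(1024, [256] * 10))
```

The proportional-scaling policy here is one plausible choice; the point is only that the total stays bounded no matter how many partitions a reducer has open.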
[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8182: -- Status: Patch Available (was: Open) Trim the line once instead of doing it on every line. beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.12.0 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As title indicates, when executing a multi-line query with trailing spaces, beeline reports syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If put this query in one single line, beeline succeeds to execute it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7685) Parquet memory manager
[ https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151792#comment-14151792 ] Brock Noland edited comment on HIVE-7685 at 9/29/14 3:31 PM: - Hi Dong, Ok, thank you for the investigation. I think we can either put the parquet memory manager in Parquet or add API's to expose the information required to implement the memory manager in Hive. Either approach is fine by me, we can take this work up in PARQUET-108. Brock was (Author: brocknoland): Hi Dong, Ok, thank you for the investigation. I think we can either put the parquet memory manager in Parquet or add API's to expose the information required to implement the memory manager in HIve. Either approach is fine by me, we can take this work up in PARQUET-108. Brock Parquet memory manager -- Key: HIVE-7685 URL: https://issues.apache.org/jira/browse/HIVE-7685 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Brock Noland Similar to HIVE-4248, Parquet tries to write large very large row groups. This causes Hive to run out of memory during dynamic partitions when a reducer may have many Parquet files open at a given time. As such, we should implement a memory manager which ensures that we don't run out of memory due to writing too many row groups within a single JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8182: -- Attachment: HIVE-8182.2.patch beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As title indicates, when executing a multi-line query with trailing spaces, beeline reports syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If put this query in one single line, beeline succeeds to execute it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8182: -- Status: Open (was: Patch Available) beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.12.0 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As title indicates, when executing a multi-line query with trailing spaces, beeline reports syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If put this query in one single line, beeline succeeds to execute it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151789#comment-14151789 ] Sergio Peña commented on HIVE-8182: --- Thanks [~ychena] I agree with having only one trimming instead of doing it every line. We can reduce extra work on Hive by using your suggestion. I did the test and it worked. I'll upload another patch. beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As title indicates, when executing a multi-line query with trailing spaces, beeline reports syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If put this query in one single line, beeline succeeds to execute it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
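The one-trim-per-query idea agreed on in this thread can be illustrated with a small sketch. This is hypothetical Python logic, not Beeline's actual source: a trailing space after the final ';' defeats a naive endswith check on the raw buffer, while trimming the accumulated buffer once before the check handles multi-line queries with trailing spaces.

```python
# Sketch of why trailing spaces break multi-line statement detection,
# and the fix discussed above: trim the accumulated buffer once rather
# than checking (or trimming) every raw line. Hypothetical logic only.

def is_complete_naive(lines):
    # Buggy: a trailing space after ';' makes endswith(";") fail.
    return "\n".join(lines).endswith(";")

def is_complete_fixed(lines):
    # Trim the whole buffer once before testing for the terminator.
    return "\n".join(lines).strip().endswith(";")

query = ["select id ", "from foo6;  "]
print(is_complete_naive(query), is_complete_fixed(query))
```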
[jira] [Commented] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151844#comment-14151844 ] Hive QA commented on HIVE-8287: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671758/HIVE-8287.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6364 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_partition_with_whitelist org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_rename_partition_failure2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_disallow_incompatible_type_change_on1 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_disallow_incompatible_type_change_on2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_temp_table_rename {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1037/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1037/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1037/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12671758 StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch Example of error message that doesn't given enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6148) Support arbitrary structs stored in HBase
[ https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151846#comment-14151846 ] Swarnim Kulkarni commented on HIVE-6148: The failed test seems flaky and unrelated to my changes here. Support arbitrary structs stored in HBase - Key: HIVE-6148 URL: https://issues.apache.org/jira/browse/HIVE-6148 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-6148.1.patch.txt, HIVE-6148.2.patch.txt, HIVE-6148.3.patch.txt We should add support to be able to query arbitrary structs stored in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7627: --- Attachment: HIVE-7627.5-spark.patch Re-uploading the same patch to test the precommit infra. FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side:
{noformat}
14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception
java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
	at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
	at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
[jira] [Commented] (HIVE-8245) Collect table read entities at same time as view read entities
[ https://issues.apache.org/jira/browse/HIVE-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151883#comment-14151883 ] Ashutosh Chauhan commented on HIVE-8245: Committed to 0.14 branch. Collect table read entities at same time as view read entities --- Key: HIVE-8245 URL: https://issues.apache.org/jira/browse/HIVE-8245 Project: Hive Issue Type: Improvement Components: CBO, Security Affects Versions: 0.13.0, 0.14.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8245.1.patch, HIVE-8245.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8245) Collect table read entities at same time as view read entities
[ https://issues.apache.org/jira/browse/HIVE-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8245: --- Fix Version/s: (was: 0.15.0) 0.14.0 Collect table read entities at same time as view read entities --- Key: HIVE-8245 URL: https://issues.apache.org/jira/browse/HIVE-8245 Project: Hive Issue Type: Improvement Components: CBO, Security Affects Versions: 0.13.0, 0.14.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8245.1.patch, HIVE-8245.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8191) Update and delete on tables with non Acid output formats gives runtime error
[ https://issues.apache.org/jira/browse/HIVE-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8191: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch 3 checked in. Thanks Eugene for the review. Update and delete on tables with non Acid output formats gives runtime error Key: HIVE-8191 URL: https://issues.apache.org/jira/browse/HIVE-8191 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8191.2.patch, HIVE-8191.3.patch, HIVE-8191.patch {code} create table not_an_acid_table(a int, b varchar(128)); insert into table not_an_acid_table select cint, cast(cstring1 as varchar(128)) from alltypesorc where cint is not null order by cint limit 10; delete from not_an_acid_table where b = '0ruyd6Y50JpdGRf6HqD'; {code} This generates a runtime error. It should get a compile error instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
Alan Gates created HIVE-8290: Summary: With DbTxnManager configured, all ORC tables forced to be transactional Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151896#comment-14151896 ] Alan Gates commented on HIVE-8290: -- [~vikram.dixit] I'd like to get this into 0.14 as I believe not having it is a big usability issue, and it will be a backwards incompatible change if we add it later. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151918#comment-14151918 ] Hive QA commented on HIVE-7627: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671815/HIVE-7627.5-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6509 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/173/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/173/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-173/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671815 FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side:
{noformat}
14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception
java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8114) Type resolution for udf arguments of Decimal Type results in error
[ https://issues.apache.org/jira/browse/HIVE-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8114: --- Fix Version/s: (was: 0.15.0) Type resolution for udf arguments of Decimal Type results in error -- Key: HIVE-8114 URL: https://issues.apache.org/jira/browse/HIVE-8114 Project: Hive Issue Type: Bug Components: Query Processor, Types Affects Versions: 0.13.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-8114.1.patch {code} select log (2, 10.5BD) from src; {code} results in exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8228) CBO: fix couple of issues with partition pruning
[ https://issues.apache.org/jira/browse/HIVE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8228: --- Fix Version/s: (was: 0.15.0) 0.14.0 CBO: fix couple of issues with partition pruning Key: HIVE-8228 URL: https://issues.apache.org/jira/browse/HIVE-8228 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8228.1.patch - Pruner doesn't handle non-deterministic UDFs correctly - The plan generated after CBO has a Project between TableScan and Filter, which prevents partition pruning from triggering in Hive post CBO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8290: - Status: Patch Available (was: Open) With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.patch Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8290: - Attachment: HIVE-8290.patch This patch changes the SemanticAnalyzer to look for a table property {{transactional}} before treating a table as requiring transactions. I also added a number of negative tests for things such as making sure the buckets aren't sorted, etc. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.patch Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
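The behavior change described in the patch can be modeled roughly as follows. This is an illustrative Python sketch with assumed names, not the actual SemanticAnalyzer logic: a table is treated as ACID only when it explicitly opts in through a transactional table property, instead of whenever DbTxnManager is configured and the table happens to use ORC.

```python
# Hypothetical model of the opt-in check: a table is treated as
# transactional only when it carries a 'transactional' table property,
# not merely because DbTxnManager is the configured transaction manager.

def is_acid_table(table_props, txn_manager):
    """Return True only for tables that explicitly opted into ACID."""
    if txn_manager != "DbTxnManager":
        return False
    return table_props.get("transactional", "false").lower() == "true"

# Plain ORC table: no longer forced to be transactional.
print(is_acid_table({}, "DbTxnManager"))
# Table that explicitly opted in via its properties.
print(is_acid_table({"transactional": "true"}, "DbTxnManager"))
```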
[jira] [Updated] (HIVE-8266) create function using resource statement compilation should include resource URI entity
[ https://issues.apache.org/jira/browse/HIVE-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8266: -- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks [~brocknoland] for the review! create function using resource statement compilation should include resource URI entity - Key: HIVE-8266 URL: https://issues.apache.org/jira/browse/HIVE-8266 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8266.2.patch, HIVE-8266.3.patch The compiler adds the function name and db name as write entities for the create function using resource statement. We should also include the resource URI path in the write entity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8223) CBO Trunk Merge: partition_wise_fileformat2 select result depends on ordering
[ https://issues.apache.org/jira/browse/HIVE-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8223: --- Fix Version/s: (was: 0.15.0) 0.14.0 CBO Trunk Merge: partition_wise_fileformat2 select result depends on ordering - Key: HIVE-8223 URL: https://issues.apache.org/jira/browse/HIVE-8223 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8223.01.patch, HIVE-8223.02.patch, HIVE-8223.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8199) CBO Trunk Merge: quote2 test fails due to incorrect literal translation
[ https://issues.apache.org/jira/browse/HIVE-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8199: --- Fix Version/s: (was: 0.15.0) 0.14.0 CBO Trunk Merge: quote2 test fails due to incorrect literal translation --- Key: HIVE-8199 URL: https://issues.apache.org/jira/browse/HIVE-8199 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8199.01.patch, HIVE-8199.02.patch, HIVE-8199.patch Quoting of quotes and slashes is lost in translation back from CBO to AST, it seems -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151944#comment-14151944 ] Hive QA commented on HIVE-8226: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671756/HIVE-8226.03.patch {color:green}SUCCESS:{color} +1 6363 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1038/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1038/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1038/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12671756 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8111) CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO
[ https://issues.apache.org/jira/browse/HIVE-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8111: --- Fix Version/s: (was: 0.15.0) 0.14.0 CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO Key: HIVE-8111 URL: https://issues.apache.org/jira/browse/HIVE-8111 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8111.01.patch, HIVE-8111.02.patch, HIVE-8111.03.patch, HIVE-8111.patch Original test failure: looks like column type changes to different decimals in most cases. In one case it causes the integer part to be too big to fit, so the result becomes null it seems. What happens is that CBO adds casts to arithmetic expressions to make them type compatible; these casts become part of new AST, and then Hive adds casts on top of these casts. This (the first part) also causes lots of out file changes. It's not clear how to best fix it so far, in addition to incorrect decimal width and sometimes nulls when width is larger than allowed in Hive. Option one - don't add those for numeric ops - cannot be done if numeric op is a part of compare, for which CBO needs correct types. Option two - unwrap casts when determining type in Hive - hard or impossible to tell apart CBO-added casts and user casts. Option three - don't change types in Hive if CBO has run - seems hacky and hard to ensure it's applied everywhere. Option four - map all expressions precisely between two trees and remove casts again after optimization, will be pretty difficult. Option five - somehow mark those casts. Not sure about how yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Assignee: Prasanth J (was: Alan Gates) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader Key: HIVE-8291 URL: https://issues.apache.org/jira/browse/HIVE-8291 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Prasanth J Fix For: 0.14.0 When loading into a partitioned bucketed sorted table the query fails with {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0] for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for client [172.21.128.111], because this file is already being created by [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on [172.21.128.122] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy15.create(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy15.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1966) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1983) at
[jira] [Created] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
Mostafa Mokhtar created HIVE-8291: - Summary: Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader Key: HIVE-8291 URL: https://issues.apache.org/jira/browse/HIVE-8291 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Alan Gates Fix For: 0.14.0 When loading into a partitioned bucketed sorted table the query fails with {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0] for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for client [172.21.128.111], because this file is already being created by [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on [172.21.128.122] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy15.create(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy15.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334) at 
org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1966) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1983) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2287) at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.close(OrcRecordUpdater.java:356) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriter {code} DDL {code} CREATE TABLE
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Description: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader() {code} String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":"); ValidTxnList validTxnList = new ValidTxnListImpl(txnString); {code} {code} Stack Trace Sample Count Percentage(%) hive.ql.exec.tez.MapRecordSource.pushRecord() 2,981 87.215 org.apache.tez.mapreduce.lib.MRReaderMapred.next() 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object) 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader() 1,984 58.046 hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,983 58.016 hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,891 55.325 hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options) 1,723 50.41 hive.common.ValidTxnListImpl.<init>(String) 934 27.326 conf.Configuration.get(String, String) 621 18.169 {code} was: When loading into a partitioned bucketed sorted table the query fails with {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0] for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for client [172.21.128.111], because this file is already being created by [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on [172.21.128.122] at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy15.create(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy15.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465) at
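The getReader hotspot above comes from re-reading and re-parsing the same ValidTxnList string from the Configuration once per reader, even though the string is fixed for the lifetime of a task. A minimal sketch of the memoization idea (not Hive's actual fix; the class name, and the "highWatermark:openTxn1,openTxn2,..." string shape, are assumptions for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch: memoize the parsed form of a transaction-list string
 * so that each distinct string is parsed only once per JVM, instead of once
 * per reader as in the profile above. All names here are hypothetical.
 */
public class ValidTxnListCache {

    // txnString -> parsed form; shared across readers in the same JVM
    private static final Map<String, long[]> PARSED = new ConcurrentHashMap<>();

    /**
     * Parses a string of the assumed shape "highWatermark:open1,open2,...".
     * computeIfAbsent guarantees each distinct string is parsed exactly once.
     */
    public static long[] get(String txnString) {
        return PARSED.computeIfAbsent(txnString, s -> {
            String[] head = s.split(":", 2);
            long highWatermark = Long.parseLong(head[0]);
            String[] open = (head.length > 1 && !head[1].isEmpty())
                    ? head[1].split(",") : new String[0];
            long[] result = new long[open.length + 1];
            result[0] = highWatermark;  // slot 0: high watermark
            for (int i = 0; i < open.length; i++) {
                result[i + 1] = Long.parseLong(open[i]);  // open transactions
            }
            return result;
        });
    }
}
```

The design point is that the expensive step (string parsing, and in Hive's case also the Configuration lookup) moves out of the per-reader path into a one-time population of the cache.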
[jira] [Commented] (HIVE-8270) JDBC uber jar is missing some classes required in secure setup.
[ https://issues.apache.org/jira/browse/HIVE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151962#comment-14151962 ] Ashutosh Chauhan commented on HIVE-8270: LGTM +1 As Thejas pointed out, we should clarify in doc that this is meant for remote HS2, not for embedded one. JDBC uber jar is missing some classes required in secure setup. --- Key: HIVE-8270 URL: https://issues.apache.org/jira/browse/HIVE-8270 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-8270.1.patch JDBC uber jar is missing some required classes for a secure setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25497: HIVE-7627, FSStatsPublisher does fit into Spark multi-thread task mode
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25497/#review54837 --- Ship it! Ship It! - Xuefu Zhang On Sept. 28, 2014, 9:50 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25497/ --- (Updated Sept. 28, 2014, 9:50 a.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7627 https://issues.apache.org/jira/browse/HIVE-7627 Repository: hive-git Description --- Hive table statistics failed in FSStatsPublisher mode because of the missing mapred.task.partition parameter. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 1674d4b ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 0b8b7c9 Diff: https://reviews.apache.org/r/25497/diff/ Testing --- Thanks, chengxiang li
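The root cause being reviewed is that Spark runs several tasks as threads inside one executor, so tasks that share one job configuration also share (or miss) mapred.task.partition, and their stats publishers collide on the same tmpstats file. A simplified sketch of the idea, assuming a plain map stands in for the real JobConf and the counter is a stand-in for the real per-task partition id (the actual patch touches HiveMapFunction/HiveReduceFunction):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative sketch, not the actual patch: give each in-process task its
 * own copy of the shared configuration, stamped with a distinct
 * "mapred.task.partition", so each FSStatsPublisher writes to a distinct
 * tmpstats-&lt;n&gt; file instead of colliding on one.
 */
public class PerTaskConf {

    private static final AtomicInteger NEXT_PARTITION = new AtomicInteger();

    /** Clones the shared conf and stamps a unique partition id into it. */
    public static Map<String, String> forNewTask(Map<String, String> shared) {
        Map<String, String> copy = new HashMap<>(shared);  // never mutate the shared conf
        copy.put("mapred.task.partition",
                 Integer.toString(NEXT_PARTITION.getAndIncrement()));
        return copy;
    }
}
```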
[jira] [Commented] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151963#comment-14151963 ] Xuefu Zhang commented on HIVE-7627: --- +1 FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Description: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader() {code} String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":"); ValidTxnList validTxnList = new ValidTxnListImpl(txnString); {code} {code} Stack Trace Sample Count Percentage(%) hive.ql.exec.tez.MapRecordSource.pushRecord() 2,981 87.215 org.apache.tez.mapreduce.lib.MRReaderMapred.next() 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object) 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader() 1,984 58.046 hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,983 58.016 hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,891 55.325 hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options) 1,723 50.41 hive.common.ValidTxnListImpl.<init>(String) 934 27.326 conf.Configuration.get(String, String) 621 18.169 {code} Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU in {code} Path onepath = normalizePath(onefile); {code} and 15% of the CPU in {code} onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri()); {code} From the profiler {code} Stack Trace Sample Count Percentage(%) org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object) 978 28.613 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable) 978 28.613 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged() 866 25.336 org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866 25.336 java.net.URI.relativize(URI) 655 19.163 java.net.URI.relativize(URI, URI) 655 19.163 
java.net.URI.normalize(String) 517 15.126 java.net.URI.needsNormalization(String) 372 10.884 java.lang.String.charAt(int) 235 6.875 java.net.URI.equal(String, String) 27 0.79 java.lang.StringBuilder.toString() 1 0.029 java.lang.StringBuilder.<init>() 1 0.029 java.lang.StringBuilder.append(String) 1 0.029 org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String) 167 4.886 org.apache.hadoop.fs.Path.<init>(String) 162 4.74 org.apache.hadoop.fs.Path.initialize(String, String, String, String) 162 4.74 org.apache.hadoop.fs.Path.normalizePath(String, String) 97 2.838 org.apache.commons.lang.StringUtils.replace(String, String, String) 97 2.838 org.apache.commons.lang.StringUtils.replace(String, String, String, int) 97 2.838 java.lang.String.indexOf(String, int) 97 2.838 java.net.URI.<init>(String, String, String, String, String) 65 1.902 {code} was: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader() {code} String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":"); ValidTxnList validTxnList = new ValidTxnListImpl(txnString); {code} {code} Stack Trace Sample Count Percentage(%) hive.ql.exec.tez.MapRecordSource.pushRecord() 2,981 87.215 org.apache.tez.mapreduce.lib.MRReaderMapred.next() 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object) 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()
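The second hotspot above is java.net.URI.normalize/relativize being re-run on the same path strings every time the input file changes. Since the set of partition paths is fixed for a task, the normalized URI can be computed once and cached. A minimal sketch of that idea (hypothetical class and method names, not Hive's actual fix):

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch: cache normalized URIs so that repeated
 * "is this file under that partition directory?" checks skip the
 * URI.normalize() cost that dominates the profile above.
 */
public class PathUriCache {

    private static final Map<String, URI> NORMALIZED = new ConcurrentHashMap<>();

    /** Normalizes a path string once; later lookups hit the cache. */
    public static URI normalized(String path) {
        return NORMALIZED.computeIfAbsent(path, p -> URI.create(p).normalize());
    }

    /** True when 'file' lives under directory 'dir'. */
    public static boolean contains(String dir, String file) {
        URI d = normalized(dir);
        URI f = normalized(file);
        // URI.relativize returns its argument unchanged when the
        // argument is not under the base URI
        return !d.relativize(f).equals(f);
    }
}
```

This keeps the relativize call itself (the containment test still needs it) but eliminates the repeated normalize and URI construction for paths already seen.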
[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7627: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Patch committed to Spark branch. Thanks to Chengxiang for the contribution. FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Fix For: spark-branch Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Assignee: Alan Gates (was: Prasanth J) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader Key: HIVE-8291 URL: https://issues.apache.org/jira/browse/HIVE-8291 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Alan Gates Fix For: 0.14.0
Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}
{code}
Stack Trace                                                                                    Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()                                                  2,981         87.215
 org.apache.tez.mapreduce.lib.MRReaderMapred.next()                                            2,002         58.572
  mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)   2,002         58.572
   mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader() 1,984        58.046
    hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)                  1,983         58.016
     hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)              1,891         55.325
      hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)             1,723         50.41
       hive.common.ValidTxnListImpl.init(String)                                               934           27.326
       conf.Configuration.get(String, String)                                                  621           18.169
{code}
Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU in
{code}
Path onepath = normalizePath(onefile);
{code}
and 15% of the CPU in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace                                                                     Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)           978           28.613
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)      978           28.613
  org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()             866           25.336
   org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()       866           25.336
    java.net.URI.relativize(URI)                                                655           19.163
     java.net.URI.relativize(URI, URI)                                          655           19.163
      java.net.URI.normalize(String)                                            517           15.126
       java.net.URI.needsNormalization(String)                                  372           10.884
        java.lang.String.charAt(int)                                            235           6.875
      java.net.URI.equal(String, String)                                        27            0.79
      java.lang.StringBuilder.toString()                                        1             0.029
      java.lang.StringBuilder.init()                                            1             0.029
      java.lang.StringBuilder.append(String)                                    1             0.029
    org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)            167           4.886
     org.apache.hadoop.fs.Path.init(String)                                     162           4.74
      org.apache.hadoop.fs.Path.initialize(String, String, String, String)      162           4.74
       org.apache.hadoop.fs.Path.normalizePath(String, String)                  97            2.838
        org.apache.commons.lang.StringUtils.replace(String, String, String)     97            2.838
         org.apache.commons.lang.StringUtils.replace(String, String, String, int) 97          2.838
          java.lang.String.indexOf(String, int)                                 97            2.838
      java.net.URI.init(String, String, String, String, String)                 65            1.902
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
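Per the profile above, roughly 27% of samples go to re-parsing the transaction string and 18% to `Configuration.get`, once per reader. A minimal sketch of memoizing the parsed list per distinct string so only the first reader pays the parse cost — the `TxnListCache` class and the `hwm:exception:exception` format are assumptions for illustration, not `ValidTxnListImpl`'s actual syntax:

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: memoize the parsed transaction list so getReader() does not
// re-parse the same conf string for every split in the process.
public class TxnListCache {
    static final class ParsedTxnList {
        final long highWatermark;
        final long[] exceptions;  // open/aborted txns, kept sorted for lookup

        ParsedTxnList(String s) {
            String[] parts = s.split(":");
            highWatermark = Long.parseLong(parts[0]);
            exceptions = new long[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                exceptions[i - 1] = Long.parseLong(parts[i]);
            }
            Arrays.sort(exceptions);
        }

        boolean isTxnValid(long txn) {
            return txn <= highWatermark
                && Arrays.binarySearch(exceptions, txn) < 0;
        }
    }

    // one parse per distinct string, shared by all readers in the process
    private static final ConcurrentHashMap<String, ParsedTxnList> CACHE =
        new ConcurrentHashMap<>();

    public static ParsedTxnList get(String txnString) {
        return CACHE.computeIfAbsent(txnString, ParsedTxnList::new);
    }
}
```

`computeIfAbsent` keeps the cache thread-safe without explicit locking, which matters since many readers are constructed concurrently under Tez.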
[jira] [Updated] (HIVE-8238) [CBO] Preserve subquery alias while generating ast
[ https://issues.apache.org/jira/browse/HIVE-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8238: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) Non-trivial to fix. HIVE-8245 solves immediate problem of view authorization. [CBO] Preserve subquery alias while generating ast -- Key: HIVE-8238 URL: https://issues.apache.org/jira/browse/HIVE-8238 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8238.cbo.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7776: -- Attachment: HIVE-7776.3-spark.patch Reattaching the same patch to trigger a test run. enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch, HIVE-7776.3-spark.patch sample10.q contains a dynamic partition operation; this qtest should be enabled after Hive on Spark supports dynamic partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8261) CBO : Predicate pushdown is removed by Optiq
[ https://issues.apache.org/jira/browse/HIVE-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-8261: Attachment: HIVE-8261.1.patch CBO : Predicate pushdown is removed by Optiq - Key: HIVE-8261 URL: https://issues.apache.org/jira/browse/HIVE-8261 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0, 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8261.1.patch The plan for TPC-DS Q64 wasn't optimal; upon looking at the logical plan I realized that predicate pushdown is not applied on date_dim d1. Interestingly, before Optiq we have the predicate pushed:
{code}
HiveFilterRel(condition=[=($5, $1)])
  HiveJoinRel(condition=[=($3, $6)], joinType=[inner])
    HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$1])
      HiveFilterRel(condition=[=($0, 2000)])
        HiveAggregateRel(group=[{0, 1}], agg#0=[count()], agg#1=[sum($2)])
          HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
            HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
              HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
                HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
                  HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
                  HiveProjectRel(d_date_sk=[$0], d_year=[$6])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
                HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))])
                  HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
              HiveProjectRel(_o__col0=[$0])
                HiveAggregateRel(group=[{0}])
                  HiveProjectRel($f0=[$0])
                    HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
                      HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17])
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
                      HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
    HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col3=[$1])
      HiveFilterRel(condition=[=($0, +(2000, 1))])
        HiveAggregateRel(group=[{0, 1}], agg#0=[count()])
          HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
            HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
              HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
                HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
                  HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
                  HiveProjectRel(d_date_sk=[$0], d_year=[$6])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
                HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))])
                  HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
              HiveProjectRel(_o__col0=[$0])
                HiveAggregateRel(group=[{0}])
                  HiveProjectRel($f0=[$0])
                    HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
                      HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17])
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
                      HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
{code}
While after Optiq the filter on date_dim gets pulled up the plan:
{code}
HiveFilterRel(condition=[=($5, $1)]): rowcount = 1.0, cumulative cost = {5.50188454E8 rows, 0.0 cpu, 0.0 io}, id = 6895
  HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col00=[$4], _o__col10=[$5], _o__col30=[$6]): rowcount = 1.0, cumulative cost = {5.50188454E8 rows, 0.0 cpu, 0.0
[jira] [Updated] (HIVE-8261) CBO : Predicate pushdown is removed by Optiq
[ https://issues.apache.org/jira/browse/HIVE-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-8261: Status: Patch Available (was: Open) CBO : Predicate pushdown is removed by Optiq - Key: HIVE-8261 URL: https://issues.apache.org/jira/browse/HIVE-8261 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1, 0.14.0 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8261.1.patch Plan for TPC-DS Q64 wasn't optimal upon looking at the logical plan I realized that predicate pushdown is not applied on date_dim d1. Interestingly before optiq we have the predicate pushed : {code} HiveFilterRel(condition=[=($5, $1)]) HiveJoinRel(condition=[=($3, $6)], joinType=[inner]) HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$1]) HiveFilterRel(condition=[=($0, 2000)]) HiveAggregateRel(group=[{0, 1}], agg#0=[count()], agg#1=[sum($2)]) HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2]) HiveJoinRel(condition=[=($1, $8)], joinType=[inner]) HiveJoinRel(condition=[=($1, $5)], joinType=[inner]) HiveJoinRel(condition=[=($0, $3)], joinType=[inner]) HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]]) HiveProjectRel(d_date_sk=[$0], d_year=[$6]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]]) HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))]) HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]]) HiveProjectRel(_o__col0=[$0]) HiveAggregateRel(group=[{0}]) HiveProjectRel($f0=[$0]) HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner]) HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17]) 
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]]) HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]]) HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col3=[$1]) HiveFilterRel(condition=[=($0, +(2000, 1))]) HiveAggregateRel(group=[{0, 1}], agg#0=[count()]) HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2]) HiveJoinRel(condition=[=($1, $8)], joinType=[inner]) HiveJoinRel(condition=[=($1, $5)], joinType=[inner]) HiveJoinRel(condition=[=($0, $3)], joinType=[inner]) HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]]) HiveProjectRel(d_date_sk=[$0], d_year=[$6]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]]) HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))]) HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]]) HiveProjectRel(_o__col0=[$0]) HiveAggregateRel(group=[{0}]) HiveProjectRel($f0=[$0]) HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner]) HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]]) HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]]) {code} While after Optiq the filter on date_dim gets pulled up the plan {code} HiveFilterRel(condition=[=($5, $1)]): rowcount = 1.0, cumulative cost = {5.50188454E8 rows, 0.0 cpu, 0.0 io}, id = 6895 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col00=[$4], _o__col10=[$5], _o__col30=[$6]): rowcount = 1.0, cumulative cost = {5.50188454E8 rows, 0.0 cpu,
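The regression described above is classic predicate pushdown: filtering date_dim on d_year below the join versus above it returns the same rows, but the pushed-down form joins a far smaller input. A toy illustration with plain nested-loop joins (the `PushdownDemo` class and its `{sk, value}` row encoding are hypothetical, not Hive's operators):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of why the pulled-up filter matters: same result rows,
// but the pushed-down variant joins against a pre-filtered dimension.
public class PushdownDemo {
    // sales rows are {soldDateSk, itemSk}; dateDim rows are {dateSk, year}

    static List<int[]> joinThenFilter(List<int[]> sales, List<int[]> dateDim, int year) {
        List<int[]> out = new ArrayList<>();
        for (int[] s : sales)
            for (int[] d : dateDim)
                if (s[0] == d[0] && d[1] == year)  // filter applied above the join
                    out.add(new int[]{s[0], s[1], d[1]});
        return out;
    }

    static List<int[]> filterThenJoin(List<int[]> sales, List<int[]> dateDim, int year) {
        List<int[]> dim = new ArrayList<>();
        for (int[] d : dateDim)
            if (d[1] == year) dim.add(d);          // pushdown: filter first
        List<int[]> out = new ArrayList<>();
        for (int[] s : sales)
            for (int[] d : dim)
                if (s[0] == d[0]) out.add(new int[]{s[0], s[1], d[1]});
        return out;
    }
}
```

Both methods are semantically equivalent, which is why the optimizer is free to pick either shape; the cost difference only shows up in the size of the join input, which is exactly what the Q64 plan regression is about.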
[jira] [Updated] (HIVE-7971) Support alter table change/replace/add columns for existing partitions
[ https://issues.apache.org/jira/browse/HIVE-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7971: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and 0.14 branch. Support alter table change/replace/add columns for existing partitions -- Key: HIVE-7971 URL: https://issues.apache.org/jira/browse/HIVE-7971 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-7971.1.patch, HIVE-7971.2.patch, HIVE-7971.3.patch ALTER TABLE CHANGE COLUMN is allowed for tables, but not for partitions. Same for add/replace columns. Allowing this for partitions can be useful in some cases. For example, one user has tables with Hive 0.12 Decimal columns, which do not specify precision/scale. To be able to properly read the decimal values from the existing partitions, the column types in the partitions need to be changed to decimal types with precision/scale. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152002#comment-14152002 ] Xuefu Zhang commented on HIVE-8180: --- Hi [~chinnalalam], the patch looks very good. I just had a very minor comment on RB. Thanks. Update SparkReduceRecordHandler for processing the vectors [spark branch] - Key: HIVE-8180 URL: https://issues.apache.org/jira/browse/HIVE-8180 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Labels: Spark-M1 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, HIVE-8180.2-spark.patch Update SparkReduceRecordHandler for processing the vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152008#comment-14152008 ] Vikram Dixit K commented on HIVE-8290: -- +1 for 0.14. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.patch Currently, once a user configures DbTxnManager to the be transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152007#comment-14152007 ] Alan Gates commented on HIVE-8231: -- I think I can reproduce the same bug with 2 command line sessions doing things in the following order:
# Start session 1
# In session 1, insert into table
# Start session 2
# In session 2, select *; see all rows
# In session 1, delete some rows
# In session 1, select *; see fewer rows
# In session 2, select *; see all rows
If I stop and restart session 2 after this, then it sees the appropriate number of rows. So either it isn't getting new transaction information for each query in the session, or the results are being cached somewhere on it. Does this match the behavior you're seeing? Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Fix For: 0.14.0 Steps to show the bug:
1. Create table
{code}
create table encaissement_1b_64m like encaissement_1b;
{code}
2. Check table
{code}
desc encaissement_1b_64m;
dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
{code}
Everything is OK:
{noformat}
0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m;
+------------+------------+----------+
| col_name   | data_type  | comment  |
+------------+------------+----------+
| id         | int        |          |
| idmagasin  | int        |          |
| zibzin     | string     |          |
| cheque     | int        |          |
| montant    | double     |          |
| date       | timestamp  |          |
| col_6      | string     |          |
| col_7      | string     |          |
| col_8      | string     |          |
+------------+------------+----------+
9 rows selected (0.158 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+-------------+
| DFS Output  |
+-------------+
+-------------+
No rows selected (0.01 seconds)
{noformat}
3. Insert values into the new table
{noformat}
insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','','');
{noformat}
4. Check
{noformat}
0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m;
+-----+
| id  |
+-----+
+-----+
No rows selected (0.091 seconds)
{noformat}
There is already a problem: I don't see the inserted row.
5. When I check the HDFS directory, I see a {{delta_421_421}} folder
{noformat}
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+----------------------------------------------------------------------------------------------------------------------------------+
| DFS Output                                                                                                                       |
+----------------------------------------------------------------------------------------------------------------------------------+
| Found 1 items                                                                                                                    |
| drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421  |
+----------------------------------------------------------------------------------------------------------------------------------+
2 rows selected (0.014 seconds)
{noformat}
6. Doing a major compaction solves the bug
{noformat}
0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major';
No rows affected (0.046 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+------------+
| DFS Output |
+------------+
| Found 1 items |
| drwxr-xr-x - hduser supergroup 0
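A plausible reading of both reports: whether the rows in the {{delta_421_421}} directory are visible depends on the reader's transaction snapshot at query time. A simplified sketch of that visibility rule — the `DeltaVisibility` class is hypothetical, and real ACID readers consult a full ValidTxnList with open/aborted exceptions, not just a high watermark:

```java
// Sketch of how an ACID reader decides whether a delta directory is
// readable: delta_<min>_<max> is visible only if every transaction in
// [min, max] is committed in the reader's snapshot. Here the snapshot is
// reduced to a single high watermark for illustration.
public class DeltaVisibility {
    public static boolean isVisible(String deltaDir, long highWatermark) {
        // expects directory names like "delta_421_421"
        String[] parts = deltaDir.split("_");
        long maxTxn = Long.parseLong(parts[2]);
        return maxTxn <= highWatermark;
    }
}
```

Under this model, a session whose snapshot was taken before transaction 421 committed (or that never refreshes its snapshot, as the comment above suspects) would keep skipping the delta and report zero rows until compaction rewrites the data into a base file.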
[jira] [Commented] (HIVE-7843) orc_analyze.q fails due to random mapred.task.id in FileSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152012#comment-14152012 ] Xuefu Zhang commented on HIVE-7843: --- Hi [~vkorukanti], would you like to reload the patch to trigger the test run? The build VMs were killed over the weekend. orc_analyze.q fails due to random mapred.task.id in FileSinkOperator [Spark Branch] --- Key: HIVE-7843 URL: https://issues.apache.org/jira/browse/HIVE-7843 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-7843.1-spark.patch
{code}
java.lang.AssertionError: data length is different from num of DP columns
	org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809)
	org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730)
	org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
	org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502)
	org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525)
	org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198)
	org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
	org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
	org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
	scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	scala.collection.Iterator$class.foreach(Iterator.scala:727)
	scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
	org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
	org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
	org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
	org.apache.spark.scheduler.Task.run(Task.scala:54)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	java.lang.Thread.run(Thread.java:744)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
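The AssertionError at the top of the trace fires when the reduce key carries a different number of values than there are dynamic-partition columns. A sketch of the directory construction that check guards (the `DynPartPath` class is hypothetical; FileSinkOperator's real logic also handles default-partition names and value escaping):

```java
// Sketch of dynamic-partition path construction: exactly one value per DP
// column, joined as col=value path segments. A mismatched value count is
// the condition behind "data length is different from num of DP columns".
public class DynPartPath {
    public static String getDynPartDirectory(String[] dpColumns, String[] values) {
        if (values.length != dpColumns.length) {
            throw new AssertionError("data length is different from num of DP columns");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < dpColumns.length; i++) {
            if (i > 0) sb.append('/');
            sb.append(dpColumns[i]).append('=').append(values[i]);
        }
        return sb.toString();
    }
}
```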
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152015#comment-14152015 ] Prasanth J commented on HIVE-8226: -- Committed patch to trunk. I will wait for [~vikram.dixit] to weigh this for branch-0.14 commit. Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152019#comment-14152019 ] Vikram Dixit K commented on HIVE-8226: -- +1 for 0.14 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Assignee: Owen O'Malley (was: Alan Gates) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader Key: HIVE-8291 URL: https://issues.apache.org/jira/browse/HIVE-8291 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Owen O'Malley Fix For: 0.14.0 Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormate.getReader() {code} String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + :); ValidTxnList validTxnList = new ValidTxnListImpl(txnString); {code} {code} Stack Trace Sample CountPercentage(%) hive.ql.exec.tez.MapRecordSource.pushRecord() 2,981 87.215 org.apache.tez.mapreduce.lib.MRReaderMapred.next() 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object) 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader() 1,984 58.046 hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,983 58.016 hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,891 55.325 hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)1,723 50.41 hive.common.ValidTxnListImpl.init(String) 934 27.326 conf.Configuration.get(String, String)621 18.169 {code} Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp 5% the CPU in {code} Path onepath = normalizePath(onefile); {code} And 15% the CPU in {code} onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri()); {code} From the profiler {code} Stack Trace Sample CountPercentage(%) 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object) 978 28.613 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable) 978 28.613 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged() 866 25.336 org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866 25.336 java.net.URI.relativize(URI) 655 19.163 java.net.URI.relativize(URI, URI) 655 19.163 java.net.URI.normalize(String) 517 15.126 java.net.URI.needsNormalization(String) 372 10.884 java.lang.String.charAt(int) 235 6.875 java.net.URI.equal(String, String)27 0.79 java.lang.StringBuilder.toString()1 0.029 java.lang.StringBuilder.init() 1 0.029 java.lang.StringBuilder.append(String)1 0.029 org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)167 4.886 org.apache.hadoop.fs.Path.init(String) 162 4.74 org.apache.hadoop.fs.Path.initialize(String, String, String, String) 162 4.74 org.apache.hadoop.fs.Path.normalizePath(String, String) 97 2.838 org.apache.commons.lang.StringUtils.replace(String, String, String) 97 2.838 org.apache.commons.lang.StringUtils.replace(String, String, String, int) 97 2.838 java.lang.String.indexOf(String, int) 97 2.838 java.net.URI.init(String, String, String, String, String) 65 1.902 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8226: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch-0.14 as well. Thanks [~mmccline] and [~vikram.dixit]! Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8196: - Attachment: HIVE-8196.5.patch Not sure why parallel.q is adding and removing POSTHOOK between test runs. Anyway, trying again to see if it passes this time. Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, HIVE-8196.4.patch, HIVE-8196.5.patch To make the best of dynamic partition pruning, joins should be on the partitioning columns, which dynamically prunes the partitions from the fact table based on the qualifying column keys from the dimension table. However, this type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns, and as a result NDV is set to the row count, which negatively affects the estimated join selectivity. A workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used.
StatsUtils.getColStatisticsFromExpression is where countDistincts gets set to the row count:
{code}
if (encd.getIsPartitionColOrVirtualCol()) {
  // virtual columns
  colType = encd.getTypeInfo().getTypeName();
  countDistincts = numRows;
  oi = encd.getWritableObjectInspector();
{code}
Query used to repro the issue:
{code}
set hive.stats.fetch.column.stats=true;
set hive.tez.dynamic.partition.pruning=true;
explain select d_date
from store_sales, date_dim
where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998;
{code}
Plan:
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
      DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: ss_sold_date_sk is not null (type: boolean)
                  Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {ss_sold_date_sk}
                      1 {d_date_sk} {d_date}
                    keys:
                      0 ss_sold_date_sk (type: int)
                      1 d_date_sk (type: int)
                    outputColumnNames: _col22, _col26, _col28
                    input vertices:
                      1 Map 2
                    Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE
                    Filter Operator
                      predicate: (_col22 = _col26) (type: boolean)
                      Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col28 (type: string)
                        outputColumnNames: _col0
                        Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: (d_date_sk is not null and (d_year = 1998)) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
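The effect of countDistincts = numRows can be seen with the textbook equi-join estimate |R| x |S| / max(ndv(R.key), ndv(S.key)). Plugging in the plan's numbers (the `JoinCE` helper is hypothetical, and 1823 is an assumed partition count for illustration, not a measured value):

```java
// Toy cardinality estimate for an equi-join. Setting a partition column's
// NDV to the full row count (as the StatsUtils snippet above does) makes
// max(ndv) equal the fact-table row count, collapsing the estimate to the
// dimension-side row count; using the partition count as NDV does not.
public class JoinCE {
    public static long estimate(long rowsLeft, long ndvLeft, long rowsRight, long ndvRight) {
        return (rowsLeft * rowsRight) / Math.max(ndvLeft, ndvRight);
    }
}
```

With 550,076,554 store_sales rows and 652 qualifying date_dim rows, NDV = row count yields an estimate of 652 rows (matching the plan's Map Join statistics), while NDV = partition count yields on the order of 2x10^8 — the difference that drives the bad downstream plan choices.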
[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Summary: ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader (was: Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader)

Key: HIVE-8291
URL: https://issues.apache.org/jira/browse/HIVE-8291
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.14.0
Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Owen O'Malley
Fix For: 0.14.0

Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}
{code}
Stack Trace  Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()  2,981  87.215
org.apache.tez.mapreduce.lib.MRReaderMapred.next()  2,002  58.572
mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
hive.common.ValidTxnListImpl.init(String)  934  27.326
conf.Configuration.get(String, String)  621  18.169
{code}
Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU is in
{code}
Path onepath = normalizePath(onefile);
{code}
and 15% of the CPU is in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace  Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)  978  28.613
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978  28.613
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()  866  25.336
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()  866  25.336
java.net.URI.relativize(URI)  655  19.163
java.net.URI.relativize(URI, URI)  655  19.163
java.net.URI.normalize(String)  517  15.126
java.net.URI.needsNormalization(String)  372  10.884
java.lang.String.charAt(int)  235  6.875
java.net.URI.equal(String, String)  27  0.79
java.lang.StringBuilder.toString()  1  0.029
java.lang.StringBuilder.init()  1  0.029
java.lang.StringBuilder.append(String)  1  0.029
org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)  167  4.886
org.apache.hadoop.fs.Path.init(String)  162  4.74
org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
org.apache.hadoop.fs.Path.normalizePath(String, String)  97  2.838
org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
java.lang.String.indexOf(String, int)  97  2.838
java.net.URI.init(String, String, String, String, String)  65  1.902
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
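The two hot lines above re-fetch and re-parse the same serialized transaction string for every reader. A minimal sketch of the obvious mitigation, caching the parsed list keyed by the raw string; the class and field names here are hypothetical illustrations, not Hive's actual patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the HIVE-8291 patch: cache the parsed transaction
// list so repeated getReader() calls with the same serialized string skip the
// split/parse work that dominates the profile.
public class ValidTxnListCache {

    // Minimal stand-in for Hive's ValidTxnListImpl; the serialized form is
    // "highWatermark:exceptedTxn1:exceptedTxn2:...".
    static final class ParsedTxnList {
        final long highWatermark;
        ParsedTxnList(String txnString) {
            this.highWatermark = Long.parseLong(txnString.split(":")[0]);
        }
    }

    private static final Map<String, ParsedTxnList> CACHE = new ConcurrentHashMap<>();

    // Parse once per distinct string; later lookups return the cached instance.
    static ParsedTxnList get(String txnString) {
        return CACHE.computeIfAbsent(txnString, ParsedTxnList::new);
    }

    public static void main(String[] args) {
        String txnString = Long.MAX_VALUE + ":"; // same default as the hot code path
        ParsedTxnList first = get(txnString);
        ParsedTxnList second = get(txnString);
        System.out.println(first == second);     // cached: same instance returned
        System.out.println(first.highWatermark == Long.MAX_VALUE);
    }
}
```

The cache trades a small amount of memory for skipping both the `Configuration.get` lookup and the string parsing on every split of a heavily partitioned, bucketed table.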
[jira] [Created] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
Mostafa Mokhtar created HIVE-8292: - Summary: Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

Key: HIVE-8292
URL: https://issues.apache.org/jira/browse/HIVE-8292
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.14.0
Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Owen O'Malley
Fix For: 0.14.0

Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files.
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8292: -- Assignee: Prasanth J (was: Owen O'Malley)
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8292: -- Description: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU is in
{code}
Path onepath = normalizePath(onefile);
{code}
and 15% of the CPU is in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
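The `relativize()` cost called out above is paid again for every input-file change. A sketch of one way to cheapen it: memoize the comparison result per file path. The class name `PathMatchCache` and its surrounding shape are hypothetical, assumed for illustration only:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual fix: memoize the expensive
// relativize()/equals() test per input file, so rows from an already-seen
// file cost a single map lookup instead of repeated URI normalization.
public class PathMatchCache {
    private final URI parent;
    private final Map<String, Boolean> seen = new HashMap<>();

    PathMatchCache(String parentPath) {
        this.parent = URI.create(parentPath);
    }

    // Mirrors onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri()):
    // relativize() returns its argument unchanged when fpath is not under
    // parent, so equality means "fpath lies outside this alias path".
    boolean isOutside(String fpath) {
        return seen.computeIfAbsent(fpath, p -> {
            URI u = URI.create(p);
            return parent.relativize(u).equals(u);
        });
    }

    public static void main(String[] args) {
        PathMatchCache c = new PathMatchCache("hdfs://nn/warehouse/t1/");
        System.out.println(c.isOutside("hdfs://nn/warehouse/t1/part-00000")); // false: under parent
        System.out.println(c.isOutside("hdfs://nn/other/part-00000"));        // true: outside parent
    }
}
```

A cheaper alternative in the same spirit is a normalized string-prefix comparison, which avoids `java.net.URI` entirely; either way the point is to stop re-normalizing identical paths row after row.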
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8292: -- Attachment: 2014_09_29_14_46_04.jfr (hot function profile)
[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Attachment: 2014_09_28_16_48_48.jfr (hot function profile; use Java Mission Control (jmc), which is part of Java 7, to open the file)
[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Description: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}
[jira] [Created] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number
Selina Zhang created HIVE-8293: -- Summary: Metastore direct SQL failed for Oracle because ORA-01722: invalid number

Key: HIVE-8293
URL: https://issues.apache.org/jira/browse/HIVE-8293
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Selina Zhang
Assignee: Selina Zhang

The direct SQL path for retrieving partition objects through filters fails on Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the PARTITION_KEY_VALS table to decimal before evaluating the condition. Here is the stack trace:
{quote}
2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ? inner join DBS on TBLS.DB_ID = DBS.DB_ID and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where (((case when TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL as decimal(21,0)) else null end) ?)).
  at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
  at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
  at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
  at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
  at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
  at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
  at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
  at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
  at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
  at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: ORA-01722: invalid number
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
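The failure mode described above is that the database may evaluate `cast(PART_KEY_VAL as decimal)` on rows the CASE guard was meant to exclude, and any non-numeric partition-key string then raises ORA-01722. A hypothetical Java illustration of the fix direction, checking the string before converting (the class and method names are invented for this sketch, not the metastore patch):

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not the HIVE-8293 patch: guard the numeric
// conversion so non-numeric partition-key strings are skipped rather than
// failing the whole query the way Oracle's eager cast does.
public class SafeNumericCompare {
    private static final Pattern NUMERIC = Pattern.compile("-?\\d+");

    // True only when partKeyVal is an integer string AND greater than bound;
    // non-numeric values return false instead of raising an error.
    static boolean greaterThan(String partKeyVal, long bound) {
        return NUMERIC.matcher(partKeyVal).matches()
                && Long.parseLong(partKeyVal) > bound;
    }

    public static void main(String[] args) {
        System.out.println(greaterThan("50", 10));          // true: numeric, above bound
        System.out.println(greaterThan("20140808", 10));    // true: numeric string compares fine
        System.out.println(greaterThan("region=west", 10)); // false: skipped, no error
    }
}
```

The SQL-level analogue is to make the guard itself filter out values that cannot be cast, rather than relying on the database to short-circuit the CASE before evaluating the cast, which Oracle evidently does not guarantee here.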
[jira] [Commented] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number
[ https://issues.apache.org/jira/browse/HIVE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152087#comment-14152087 ] Selina Zhang commented on HIVE-8293: It is easy to reproduce:
{code}
hive> create table a (col string) partitioned by (dt string);
hive> create table b (col string) partitioned by (idx int);
hive> alter table a add partition(dt='20140808');
hive> alter table b add partition(idx=50);
hive> select * from b where idx > 10;
{code}