[jira] [Updated] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8186: Attachment: HIVE-8186.2.patch.txt Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8186: Attachment: (was: HIVE-8186.2.patch.txt) Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151389#comment-14151389 ] Navis commented on HIVE-8186: - [~sershe] If you are busy, can I take this? Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8283) Missing break in FilterSelectivityEstimator#visitCall()
[ https://issues.apache.org/jira/browse/HIVE-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8283: Attachment: HIVE-8283.1.patch.txt Missing break in FilterSelectivityEstimator#visitCall() --- Key: HIVE-8283 URL: https://issues.apache.org/jira/browse/HIVE-8283 Project: Hive Issue Type: Bug Reporter: Ted Yu Attachments: HIVE-8283.1.patch.txt
{code}
case NOT_EQUALS: {
  selectivity = computeNotEqualitySelectivity(call);
}
case LESS_THAN_OR_EQUAL:
case GREATER_THAN_OR_EQUAL:
case LESS_THAN:
case GREATER_THAN: {
  selectivity = ((double) 1 / (double) 3);
  break;
}
{code}
A break is missing for the NOT_EQUALS case, so control falls through and selectivity is overwritten with 1/3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8283) Missing break in FilterSelectivityEstimator#visitCall()
[ https://issues.apache.org/jira/browse/HIVE-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8283: Status: Patch Available (was: Open) Seems to need updates to some query results. Missing break in FilterSelectivityEstimator#visitCall() --- Key: HIVE-8283 URL: https://issues.apache.org/jira/browse/HIVE-8283 Project: Hive Issue Type: Bug Reporter: Ted Yu Attachments: HIVE-8283.1.patch.txt
{code}
case NOT_EQUALS: {
  selectivity = computeNotEqualitySelectivity(call);
}
case LESS_THAN_OR_EQUAL:
case GREATER_THAN_OR_EQUAL:
case LESS_THAN:
case GREATER_THAN: {
  selectivity = ((double) 1 / (double) 3);
  break;
}
{code}
A break is missing for the NOT_EQUALS case, so control falls through and selectivity is overwritten with 1/3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
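A corrected version of the switch quoted above can be sketched as follows. The enum and the computeNotEqualitySelectivity stub are simplified stand-ins for illustration, not Hive's actual classes:

```java
// Sketch of the corrected switch from FilterSelectivityEstimator#visitCall().
// Kind and computeNotEqualitySelectivity() are simplified stand-ins.
public class SelectivityFix {
    enum Kind { NOT_EQUALS, LESS_THAN_OR_EQUAL, GREATER_THAN_OR_EQUAL, LESS_THAN, GREATER_THAN }

    // Stand-in for computeNotEqualitySelectivity(call); the real method uses column NDVs.
    static double computeNotEqualitySelectivity() {
        return 0.9;
    }

    static double selectivity(Kind kind) {
        double selectivity = 1.0;
        switch (kind) {
            case NOT_EQUALS: {
                selectivity = computeNotEqualitySelectivity();
                break; // without this break, control falls through and 1/3 overwrites the value
            }
            case LESS_THAN_OR_EQUAL:
            case GREATER_THAN_OR_EQUAL:
            case LESS_THAN:
            case GREATER_THAN: {
                selectivity = ((double) 1 / (double) 3);
                break;
            }
        }
        return selectivity;
    }
}
```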
[jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151395#comment-14151395 ] Hive QA commented on HIVE-8196: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671731/HIVE-8196.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6362 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1031/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1031/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1031/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12671731 Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch To make the best of dynamic partition pruning, joins should be on the partitioning columns, which dynamically prunes partitions from the fact table based on the qualifying column keys from the dimension table; however, this type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns, and as a result NDV is set to the row count, which skews the estimated join selectivity. A workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used. 
In StatsUtils.getColStatisticsFromExpression is where count distincts gets set to row count {code} if (encd.getIsPartitionColOrVirtualCol()) { // vitual columns colType = encd.getTypeInfo().getTypeName(); countDistincts = numRows; oi = encd.getWritableObjectInspector(); {code} Query used to repro the issue : {code} set hive.stats.fetch.column.stats=true; set hive.tez.dynamic.partition.pruning=true; explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998; {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 - Map 2 (BROADCAST_EDGE) DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_sold_date_sk is not null (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} 1 {d_date_sk} {d_date} keys: 0 ss_sold_date_sk (type: int) 1 d_date_sk (type: int) outputColumnNames: _col22, _col26, _col28 input vertices: 1 Map 2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col22 = _col26) (type: boolean) Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col28 (type: string) outputColumnNames: _col0 Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator
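To see why NDV = row count hurts here, note that a common inner-join cardinality estimate divides the cross-product by the larger NDV of the join keys. The sketch below uses that textbook formula with made-up table sizes; it illustrates the effect and is not Hive's exact estimator:

```java
// Illustrates how inflating a join key's NDV to the row count collapses the
// estimated join cardinality. Formula: |R join S| ~= |R| * |S| / max(ndvR, ndvS).
public class JoinCardinality {
    static long estimate(long rowsR, long rowsS, long ndvR, long ndvS) {
        return (rowsR * rowsS) / Math.max(ndvR, ndvS);
    }
}
```

With a hypothetical 1M-row fact table joined to a 365-row dimension, setting the fact key's NDV to the row count collapses the estimate to roughly the dimension size, while a realistic NDV (say, the number of distinct partition keys) yields a far larger estimate.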
[jira] [Commented] (HIVE-8267) Exposing hbase cell latest timestamp through hbase columns mappings to hive columns.
[ https://issues.apache.org/jira/browse/HIVE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151403#comment-14151403 ] Navis commented on HIVE-8267: - [~ehsan] HIVE-2781 was applied long ago and HIVE-2828 is not broken (it can have a restricted feature). Could I ask why you stated it like that? Exposing hbase cell latest timestamp through hbase columns mappings to hive columns. Key: HIVE-8267 URL: https://issues.apache.org/jira/browse/HIVE-8267 Project: Hive Issue Type: New Feature Components: HBase Handler Affects Versions: 0.14.0 Reporter: Muhammad Ehsan ul Haque Priority: Minor Fix For: 0.14.0 Attachments: HIVE-8267.0.patch Previous attempts: HIVE-2781 (not accepted), HIVE-2828 (broken and proposed with a restricted feature). The feature is to have the hbase cell latest timestamp accessible in a hive query, by mapping the cell timestamp to a hive column, using a mapping format like {code}:timestamp:cf:[optional qualifier or qualifier prefix]{code} The hive create table statement would be like:
h4. For mapping a cell latest timestamp.
{code}
CREATE TABLE hive_hbase_table (key STRING, col1 STRING, col1_ts BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:qualifier,:timestamp:cf:qualifier")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
{code}
h4. For mapping a column family latest timestamp.
{code}
CREATE TABLE hive_hbase_table (key STRING, valuemap MAP<STRING, STRING>, timestampmap MAP<STRING, BIGINT>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:,:timestamp:cf:")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
{code}
h4. Providing default cell value
{code}
CREATE TABLE hive_hbase_table (key int, value string, value_timestamp bigint)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:qualifier,:timestamp:cf:qualifier", "hbase.put.default.cell.value" = "default value")
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8288) HiveServer2 dynamic discovery should create znodes organized by version number
Vaibhav Gumashta created HIVE-8288: -- Summary: HiveServer2 dynamic discovery should create znodes organized by version number Key: HIVE-8288 URL: https://issues.apache.org/jira/browse/HIVE-8288 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Something like: /hiveserver2/version_no/znode_name would be better to support admin actions like removing all znodes for a particular version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
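A minimal sketch of the proposed version-scoped layout (names are illustrative; the final path scheme may differ):

```java
// Sketch of version-scoped znode paths for HiveServer2 dynamic discovery.
// Deleting a version's subtree removes every server instance registered
// under that version, enabling the admin action described above.
public class ZnodePaths {
    static String versionRoot(String version) {
        return "/hiveserver2/" + version;
    }

    static String instancePath(String version, String instanceId) {
        return versionRoot(version) + "/" + instanceId;
    }
}
```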
[jira] [Commented] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151412#comment-14151412 ] Chinna Rao Lalam commented on HIVE-8180: RB link : https://reviews.apache.org/r/26130/ Update SparkReduceRecordHandler for processing the vectors [spark branch] - Key: HIVE-8180 URL: https://issues.apache.org/jira/browse/HIVE-8180 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Labels: Spark-M1 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, HIVE-8180.2-spark.patch Update SparkReduceRecordHandler for processing the vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8265) Build failure on hadoop-1
[ https://issues.apache.org/jira/browse/HIVE-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151413#comment-14151413 ] Navis commented on HIVE-8265: - The test failure seems unrelated to this. Build failure on hadoop-1 -- Key: HIVE-8265 URL: https://issues.apache.org/jira/browse/HIVE-8265 Project: Hive Issue Type: Task Components: Tests Reporter: Navis Assignee: Navis Priority: Blocker Attachments: HIVE-8265.1.patch.txt no pre-commit-tests Fails from CustomPartitionVertex and TestHive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8196: - Attachment: HIVE-8196.4.patch Fixes the parallel.q test. Rebased the patch to the latest trunk. Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, HIVE-8196.4.patch To make the best of dynamic partition pruning, joins should be on the partitioning columns, which dynamically prunes partitions from the fact table based on the qualifying column keys from the dimension table; however, this type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns, and as a result NDV is set to the row count, which skews the estimated join selectivity. A workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used. 
In StatsUtils.getColStatisticsFromExpression is where count distincts gets set to row count {code} if (encd.getIsPartitionColOrVirtualCol()) { // vitual columns colType = encd.getTypeInfo().getTypeName(); countDistincts = numRows; oi = encd.getWritableObjectInspector(); {code} Query used to repro the issue : {code} set hive.stats.fetch.column.stats=true; set hive.tez.dynamic.partition.pruning=true; explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998; {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 - Map 2 (BROADCAST_EDGE) DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_sold_date_sk is not null (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} 1 {d_date_sk} {d_date} keys: 0 ss_sold_date_sk (type: int) 1 d_date_sk (type: int) outputColumnNames: _col22, _col26, _col28 input vertices: 1 Map 2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col22 = _col26) (type: boolean) Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col28 (type: string) outputColumnNames: _col0 Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Execution mode: vectorized Map 2 Map 
Operator Tree: TableScan alias: date_dim filterExpr: (d_date_sk is not null and (d_year = 1998)) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (d_date_sk is not null and (d_year = 1998)) (type: boolean)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151450#comment-14151450 ] Prasanth J commented on HIVE-8226: -- [~mmccline] Can you rebase the patch against the current trunk? I saw a failure when I tried to commit this patch: there is a diff in the golden file when I ran the dynpart_sort_opt_vectorization.q test, and the patch did not apply cleanly on trunk. Also, is this going into branch-0.14 as well? If so, please check with [~vikram.dixit] and update the Affects and Fix versions accordingly. Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8287: Status: Patch Available (was: Open) StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch Example of an error message that doesn't give enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8287: Attachment: HIVE-8287.1.patch StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch Example of an error message that doesn't give enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Status: In Progress (was: Patch Available) Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Attachment: HIVE-8226.03.patch Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Status: Patch Available (was: In Progress) Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151468#comment-14151468 ] Matt McCline commented on HIVE-8226: Yes, I rebased and re-ran the dynpart_sort_opt_vectorization.q and found a few stages now vectorize... Perhaps I didn't create patch #2 correctly. Anyway, submitted patch #3. Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151469#comment-14151469 ] Hive QA commented on HIVE-7723: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671737/HIVE-7723.8.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6364 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_escape1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_escape2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1032/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1032/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1032/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671737 Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7723.1.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch Explain on TPC-DS query 64 took 11 seconds, when the CLI was profiled it showed that ReadEntity.equals is taking ~40% of the CPU. 
ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match; a HashMap is a better option for this case, as Set doesn't have a get method. Also, for ReadEntity, equals is case-insensitive while hash is not, which is an undesired behavior.
{code}
public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) {
  // If the input is already present, make sure the new parent is added to the input.
  if (inputs.contains(newInput)) {
    for (ReadEntity input : inputs) {
      if (input.equals(newInput)) {
        if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
          input.getParents().addAll(newInput.getParents());
          input.setDirect(input.isDirect() || newInput.isDirect());
        }
        return input;
      }
    }
    assert false;
  } else {
    inputs.add(newInput);
    return newInput;
  }
  // make compile happy
  return null;
}
{code}
This is the query used : {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON 
customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN
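A HashMap-keyed variant of addInput along the lines the report suggests might look like the sketch below. Entity is a minimal stand-in for ReadEntity (with equals and hashCode made consistently case-insensitive), not Hive's actual class:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the HashMap-based lookup the issue proposes. Entity is a minimal
// stand-in for Hive's ReadEntity: equals and hashCode must agree (both are
// case-insensitive here, avoiding the equals/hashCode mismatch noted above).
public class InputRegistry {
    public static final class Entity {
        final String name;
        final Set<String> parents = new HashSet<>();

        public Entity(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof Entity && ((Entity) o).name.equalsIgnoreCase(name);
        }

        @Override public int hashCode() { return name.toLowerCase().hashCode(); }
    }

    private final Map<Entity, Entity> inputs = new HashMap<>();

    // O(1) expected time: look up the existing entity directly
    // instead of iterating the whole Set on every call.
    public Entity addInput(Entity newInput) {
        Entity existing = inputs.get(newInput);
        if (existing != null) {
            existing.parents.addAll(newInput.parents);
            return existing;
        }
        inputs.put(newInput, newInput);
        return newInput;
    }
}
```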
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Affects Version/s: 0.14.0 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-8226: --- Fix Version/s: 0.14.0 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151472#comment-14151472 ] Matt McCline commented on HIVE-8226: [~pjayachandran] I added you to the e-mail I sent to Gunther about branch-0.14 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8287: Attachment: HIVE-8287.2.patch StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch Example of an error message that doesn't give enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151488#comment-14151488 ] Thejas M Nair commented on HIVE-8287: - HIVE-8287.2.patch - also includes changes to the webhcat e2e tests for the new error messages, and for changes in HIVE-8221. StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch Example of an error message that doesn't give enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7685) Parquet memory manager
[ https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151496#comment-14151496 ] Dong Chen commented on HIVE-7685: - Hi Brock, I think a brief design for this memory manager is: Every new writer registers itself to the manager. The manager has an overall view of all the writers. When a condition is up (such as every 1000 rows), it will notify the writers to check memory usage and flush if necessary. However, a problem for Parquet specifically is: Hive only has a wrapper for the ParquetRecordWriter, and even ParquetRecordWriter also wrap the real writer (InternalParquetRecordWriter) in Parquet project. Since the behaviors of measuring dynamic buffer size and flushing are private in the real writer, I think we also have to add code in InternalParquetRecordWriter to implement the memory manager functionality. It seems only changing Hive code cannot fix this Jira. Not sure whether we should put this problem in Parquet project and fix it there, if it is generic enough and not Hive specific? Any other ideas? Best Regards, Dong Parquet memory manager -- Key: HIVE-7685 URL: https://issues.apache.org/jira/browse/HIVE-7685 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Brock Noland Similar to HIVE-4248, Parquet tries to write large very large row groups. This causes Hive to run out of memory during dynamic partitions when a reducer may have many Parquet files open at a given time. As such, we should implement a memory manager which ensures that we don't run out of memory due to writing too many row groups within a single JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
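The registration-and-callback design Dong describes above could be sketched as follows. MemoryCheckable and the flush-everyone policy are hypothetical illustrations, not Parquet's or Hive's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed memory manager: writers register themselves, and the
// manager periodically asks every writer to report buffered bytes and flush
// when the total exceeds a budget. MemoryCheckable is a hypothetical interface.
public class ParquetMemoryManager {
    public interface MemoryCheckable {
        long bufferedBytes();
        void flush();
    }

    private final List<MemoryCheckable> writers = new ArrayList<>();
    private final long totalBudgetBytes;
    private long rowsSinceCheck = 0;
    private static final long CHECK_EVERY_ROWS = 1000; // "such as every 1000 rows"

    public ParquetMemoryManager(long totalBudgetBytes) {
        this.totalBudgetBytes = totalBudgetBytes;
    }

    public synchronized void register(MemoryCheckable w) { writers.add(w); }

    // Called by writers on each row; triggers a check every CHECK_EVERY_ROWS rows.
    public synchronized void rowWritten() {
        if (++rowsSinceCheck < CHECK_EVERY_ROWS) return;
        rowsSinceCheck = 0;
        long total = 0;
        for (MemoryCheckable w : writers) total += w.bufferedBytes();
        if (total > totalBudgetBytes) {
            for (MemoryCheckable w : writers) w.flush(); // naive policy: flush everyone
        }
    }
}
```

As the comment notes, the real flush hook lives inside Parquet's InternalParquetRecordWriter, so a production version would need changes in the Parquet project rather than only in Hive.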
[jira] [Commented] (HIVE-8222) CBO Trunk Merge: Fix Check Style issues
[ https://issues.apache.org/jira/browse/HIVE-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151498#comment-14151498 ] Lars Francke commented on HIVE-8222: Would anyone mind taking a look? Shall I open a review? This one will probably go stale very fast so I'd appreciate a quick turnaround to avoid a lot of extra work. CBO Trunk Merge: Fix Check Style issues --- Key: HIVE-8222 URL: https://issues.apache.org/jira/browse/HIVE-8222 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-8222.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7776: Attachment: HIVE-7776.2-spark.patch Add several legacy MR configurations to enable Hive features that are based on mapred.task.id/mapreduce.task.attempt.id/mapred.task.partition. enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch sample10.q contains a dynamic partition operation; this qtest should be enabled after Hive on Spark supports dynamic partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25495: HIVE-7776, enable sample10.q
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25495/ --- (Updated September 29, 2014, 9:10 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7776 https://issues.apache.org/jira/browse/HIVE-7776 Repository: hive-git Description (updated) --- Hive gets the task id in two ways in Utilities::getTaskId: 1. get the value of the mapred.task.id parameter from the configuration; 2. generate a random value when #1 returns null. This patch sets mapred.task.id on the executor side, as we can build it through TaskContext now. Diffs - itests/src/test/resources/testconfiguration.properties 155abad ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 3ff0782 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 02f9d99 ql/src/test/results/clientpositive/spark/sample10.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25495/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 25495: HIVE-7776, enable sample10.q
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25495/ --- (Updated September 29, 2014, 9:11 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7776 https://issues.apache.org/jira/browse/HIVE-7776 Repository: hive-git Description --- Hive gets the task id in two ways in Utilities::getTaskId: 1. get the value of the mapred.task.id parameter from the configuration; 2. generate a random value when #1 returns null. This patch sets mapred.task.id on the executor side, as we can build it through TaskContext now. Diffs (updated) - itests/src/test/resources/testconfiguration.properties 89243fc ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 1674d4b ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 0b8b7c9 ql/src/test/results/clientpositive/spark/sample10.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25495/diff/ Testing --- Thanks, chengxiang li
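[Editor's sketch] The two-step lookup the review description refers to can be illustrated with a minimal standalone stand-in (TaskIdSketch and its method are invented names for this sketch, not the actual Utilities code):

```java
import java.util.Map;
import java.util.Random;

class TaskIdSketch {
    // 1. prefer the configured mapred.task.id;
    // 2. fall back to a random value when it is unset (e.g. on Spark
    //    executors before HIVE-7776 sets it from TaskContext).
    static String getTaskId(Map<String, String> conf) {
        String taskId = conf.get("mapred.task.id");
        if (taskId == null || taskId.isEmpty()) {
            return "task_" + Math.abs(new Random().nextInt());
        }
        return taskId;
    }
}
```

The random fallback is why features keyed on the task id (such as sample10.q's dynamic partitions) misbehave when the configuration value is missing; setting mapred.task.id on the executor side makes branch 1 win.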
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7776: Attachment: HIVE-7776.3-spark.patch enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch sample10.q contain dynamic partition operation, should enable this qtest after hive on spark support dynamic partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7776: Status: Patch Available (was: Open) enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch sample10.q contain dynamic partition operation, should enable this qtest after hive on spark support dynamic partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151533#comment-14151533 ] Hive QA commented on HIVE-7776: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671768/HIVE-7776.3-spark.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/171/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/171/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-171/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd 
/data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-171/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-spark-source ]] + [[ ! -d apache-svn-spark-source/.svn ]] + [[ ! -d apache-svn-spark-source ]] + cd apache-svn-spark-source + svn revert -R . Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java' ++ svn status --no-ignore ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' + rm -rf target datanucleus.log ant/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target metastore/target common/target common/src/gen serde/target ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1628143. At revision 1628143. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12671768 enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch sample10.q contain dynamic partition operation, should enable this qtest after hive on spark support dynamic partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151539#comment-14151539 ] Chengxiang Li commented on HIVE-7776: - This patch depends on HIVE-7627; I should re-upload it after HIVE-7627 has been committed. enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch sample10.q contains a dynamic partition operation; this qtest should be enabled after Hive on Spark supports dynamic partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2573) Create per-session function registry
[ https://issues.apache.org/jira/browse/HIVE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151555#comment-14151555 ] Hive QA commented on HIVE-2573: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671743/HIVE-2573.4.patch.txt {color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 6365 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_func1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_collect_set org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_corr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_pop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_samp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_avg org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_count org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_min org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_percentile org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_std org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_stddev org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_stddev_samp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_sum org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_var_pop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_var_samp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_variance org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_max org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udaf_example_min org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.testNegativeCliDriver_invalid_row_sequence 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.service.TestHiveServerSessions.testSessionFuncs org.apache.hive.jdbc.TestJdbcDriver2.testGetQueryLog {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1033/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1033/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1033/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 24 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671743 Create per-session function registry - Key: HIVE-2573 URL: https://issues.apache.org/jira/browse/HIVE-2573 Project: Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Navis Assignee: Navis Priority: Minor Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2573.D3231.1.patch, HIVE-2573.1.patch.txt, HIVE-2573.2.patch.txt, HIVE-2573.3.patch.txt, HIVE-2573.4.patch.txt Currently the function registry is a shared resource and could be overridden by other users when using HiveServer. Providing a per-session function registry would prevent this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8186) Self join may fail if one side has VCs and other doesn't
[ https://issues.apache.org/jira/browse/HIVE-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151597#comment-14151597 ] Hive QA commented on HIVE-8186: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671748/HIVE-8186.2.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6362 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1034/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1034/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1034/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671748 Self join may fail if one side has VCs and other doesn't Key: HIVE-8186 URL: https://issues.apache.org/jira/browse/HIVE-8186 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8186.1.patch.txt, HIVE-8186.2.patch.txt See comments. This also fails on trunk, although not on original join_vc query -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8267) Exposing hbase cell latest timestamp through hbase columns mappings to hive columns.
[ https://issues.apache.org/jira/browse/HIVE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151623#comment-14151623 ] Muhammad Ehsan ul Haque commented on HIVE-8267: --- My bad, I just copy-pasted from the HIVE-2828 description (originated from HIVE-2781 and not accepted, but I think this could be helpful to someone). However, there is one more, HIVE-2306, still open with no patch available. HIVE-2828 has failing tests after a 2.5-year rebase. It also exposes the timestamp by picking the timestamp of the first cell only: {code} long timestamp = result.rawCells()[0].getTimestamp(); {code} It does not allow exposing the timestamp of all or particular cells in some column families. Exposing hbase cell latest timestamp through hbase columns mappings to hive columns. Key: HIVE-8267 URL: https://issues.apache.org/jira/browse/HIVE-8267 Project: Hive Issue Type: New Feature Components: HBase Handler Affects Versions: 0.14.0 Reporter: Muhammad Ehsan ul Haque Priority: Minor Fix For: 0.14.0 Attachments: HIVE-8267.0.patch Previous attempts: HIVE-2781 (not accepted), HIVE-2828 (broken and proposed with a restricted feature). The feature is to make the latest hbase cell timestamp accessible in a hive query, by mapping the cell timestamp to a hive column, using a mapping format like {code}:timestamp:cf:[optional qualifier or qualifier prefix]{code} The hive create table statement would look like: h4. For mapping a cell latest timestamp. {code} CREATE TABLE hive_hbase_table (key STRING, col1 STRING, col1_ts BIGINT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:qualifier,:timestamp:cf:qualifier") TBLPROPERTIES ("hbase.table.name" = "hbase_table"); {code} h4. For mapping a column family latest timestamp.
{code} CREATE TABLE hive_hbase_table (key STRING, valuemap MAP<STRING, STRING>, timestampmap MAP<STRING, BIGINT>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:,:timestamp:cf:") TBLPROPERTIES ("hbase.table.name" = "hbase_table"); {code} h4. Providing default cell value {code} CREATE TABLE hive_hbase_table (key int, value string, value_timestamp bigint) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf:qualifier,:timestamp:cf:qualifier", "hbase.put.default.cell.value" = "default value") TBLPROPERTIES ("hbase.table.name" = "hbase_table"); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8283) Missing break in FilterSelectivityEstimator#visitCall()
[ https://issues.apache.org/jira/browse/HIVE-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151657#comment-14151657 ] Hive QA commented on HIVE-8283: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671749/HIVE-8283.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6364 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1035/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1035/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1035/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671749 Missing break in FilterSelectivityEstimator#visitCall() --- Key: HIVE-8283 URL: https://issues.apache.org/jira/browse/HIVE-8283 Project: Hive Issue Type: Bug Reporter: Ted Yu Attachments: HIVE-8283.1.patch.txt {code} case NOT_EQUALS: { selectivity = computeNotEqualitySelectivity(call); } case LESS_THAN_OR_EQUAL: case GREATER_THAN_OR_EQUAL: case LESS_THAN: case GREATER_THAN: { selectivity = ((double) 1 / (double) 3); break; } {code} break is missing for NOT_EQUALS case. selectivity would be overwritten with 1/3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
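[Editor's sketch] The effect of the missing break, and of the fix, can be shown with a standalone stand-in for the switch (the enum, class, and notEqualsSelectivity parameter are invented for this sketch; the real code lives in FilterSelectivityEstimator#visitCall):

```java
enum Kind { NOT_EQUALS, LESS_THAN_OR_EQUAL, GREATER_THAN_OR_EQUAL, LESS_THAN, GREATER_THAN }

class SelectivitySketch {
    static double selectivity(Kind kind, double notEqualsSelectivity) {
        double selectivity = 1.0;
        switch (kind) {
            case NOT_EQUALS:
                // stands in for computeNotEqualitySelectivity(call)
                selectivity = notEqualsSelectivity;
                break; // the break missing in HIVE-8283; without it,
                       // control falls through and 1/3 overwrites the value
            case LESS_THAN_OR_EQUAL:
            case GREATER_THAN_OR_EQUAL:
            case LESS_THAN:
            case GREATER_THAN:
                selectivity = 1.0 / 3.0;
                break;
        }
        return selectivity;
    }
}
```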
[jira] [Commented] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151701#comment-14151701 ] Yongzhi Chen commented on HIVE-8182: Is trimming every line a good idea? I think that to be consistent with the single-line case, trimming only the last line may be the better choice. My suggestion is to add {code}line = line.trim();{code} before {code}if (line.endsWith(";")) { line = line.substring(0, line.length() - 1); }{code} beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch As the title indicates, when executing a multi-line query with trailing spaces, beeline reports a syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If the query is put on a single line, beeline executes it successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
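[Editor's sketch] Yongzhi's suggestion amounts to something like the following standalone helper (BeelineTrimSketch is a made-up name; the real logic lives in beeline's command handling):

```java
class BeelineTrimSketch {
    // Trim the accumulated command before testing for the trailing
    // semicolon, so a multi-line query with trailing spaces is treated
    // the same as a single-line one.
    static String stripTerminator(String line) {
        line = line.trim(); // the proposed addition
        if (line.endsWith(";")) {
            line = line.substring(0, line.length() - 1);
        }
        return line;
    }
}
```

Without the trim, "select 1;   " keeps its trailing spaces, endsWith(";") is false, and the stray semicolon reaches the parser, producing the ParseException quoted in the report.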
[jira] [Commented] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151736#comment-14151736 ] Hive QA commented on HIVE-8196: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671753/HIVE-8196.4.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6364 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1036/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1036/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1036/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12671753 Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, HIVE-8196.4.patch To make the best out of dynamic partition pruning, joins should be on the partitioning columns, which results in dynamically pruning the partitions from the fact table based on the qualifying column keys from the dimension table; this type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns, and as a result NDV is set to the row count; doing that negatively affects the estimated join selectivity. The workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used.
In StatsUtils.getColStatisticsFromExpression is where count distincts gets set to row count {code} if (encd.getIsPartitionColOrVirtualCol()) { // vitual columns colType = encd.getTypeInfo().getTypeName(); countDistincts = numRows; oi = encd.getWritableObjectInspector(); {code} Query used to repro the issue : {code} set hive.stats.fetch.column.stats=true; set hive.tez.dynamic.partition.pruning=true; explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998; {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 - Map 2 (BROADCAST_EDGE) DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8 Vertices: Map 1 Map Operator Tree: TableScan alias: store_sales filterExpr: ss_sold_date_sk is not null (type: boolean) Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {ss_sold_date_sk} 1 {d_date_sk} {d_date} keys: 0 ss_sold_date_sk (type: int) 1 d_date_sk (type: int) outputColumnNames: _col22, _col26, _col28 input vertices: 1 Map 2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (_col22 = _col26) (type: boolean) Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col28 (type: string) outputColumnNames: _col0 Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE
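[Editor's sketch] The workaround the description suggests — bounding the NDV of a partition column by the partition count instead of defaulting it to the row count — could look roughly like this (NdvSketch and its parameters are invented for illustration, not the actual StatsUtils signature):

```java
class NdvSketch {
    // Estimate the distinct-value count used for join selectivity.
    static long countDistincts(boolean isPartitionCol, long numRows, long numPartitions) {
        if (isPartitionCol) {
            // Assumption: a partition column cannot have more distinct
            // values than there are partitions (or rows).
            return Math.min(numPartitions, numRows);
        }
        // Current fallback quoted above: NDV defaults to the row count.
        return numRows;
    }
}
```

With ss_sold_date_sk as the join key, using the partition count (e.g. ~1.8K date partitions) instead of ~550M rows keeps the join selectivity from collapsing to the tiny 652-row estimate shown in the plan above.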
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151757#comment-14151757 ] Damien Carol commented on HIVE-8231: [~alangates] I ran a lot of tests over the weekend. It seems that INSERT/DELETE/UPDATE doesn't work at all with concurrency enabled. If I deactivate ACID with: {noformat} <!-- concurrency --> <property> <name>hive.support.concurrency</name> <value>false</value> </property> <!-- compactor org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager / org.apache.hadoop.hive.ql.lockmgr.DbTxnManager --> <property> <name>hive.txn.manager</name> <value>org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager</value> </property> {noformat} then everything is ok. Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Fix For: 0.14.0 Steps to show the bug : 1. create table {code} create table encaissement_1b_64m like encaissement_1b; {code} 2. check table {code} desc encaissement_1b_64m; dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m; {code} everything is ok: {noformat} 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m; +++--+--+ | col_name | data_type | comment | +++--+--+ | id | int| | | idmagasin | int| | | zibzin | string | | | cheque | int| | | montant| double | | | date | timestamp | | | col_6 | string | | | col_7 | string | | | col_8 | string | | +++--+--+ 9 rows selected (0.158 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ +-+--+ No rows selected (0.01 seconds) {noformat} 3. Insert values into the new table {noformat} insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','',''); {noformat} 4.
Check {noformat} 0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m; +-+--+ | id | +-+--+ +-+--+ No rows selected (0.091 seconds) {noformat} There is already a problem: I don't see the inserted row. 5. When I check the HDFS directory, I see the {{delta_421_421}} folder {noformat} 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 | +-+--+ 2 rows selected (0.014 seconds) {noformat} 6. Doing a major compaction solves the bug {noformat} 0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major'; No rows affected (0.046 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; ++--+ | DFS Output | ++--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151758#comment-14151758 ] Damien Carol commented on HIVE-8231: To be more precise, these commands work: {code} drop table if exists foo6; create table foo6 (id int) clustered by (id) into 1 buckets; insert into table foo6 VALUES(1); select * from foo6; drop table if exists foo7; create table foo7 (id int) STORED AS ORC; insert into table foo7 VALUES(1); select * from foo7; {code} Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Fix For: 0.14.0 Steps to show the bug : 1. create table {code} create table encaissement_1b_64m like encaissement_1b; {code} 2. check table {code} desc encaissement_1b_64m; dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m; {code} everything is ok: {noformat} 0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m; +++--+--+ | col_name | data_type | comment | +++--+--+ | id | int| | | idmagasin | int| | | zibzin | string | | | cheque | int| | | montant| double | | | date | timestamp | | | col_6 | string | | | col_7 | string | | | col_8 | string | | +++--+--+ 9 rows selected (0.158 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ +-+--+ No rows selected (0.01 seconds) {noformat} 3. Insert values into the new table {noformat} insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','',''); {noformat} 4. Check {noformat} 0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m; +-+--+ | id | +-+--+ +-+--+ No rows selected (0.091 seconds) {noformat} There is already a problem: I don't see the inserted row. 5.
When I'm checking HDFS directory, I see {{delta_421_421}} folder {noformat} 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; +-+--+ | DFS Output | +-+--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421 | +-+--+ 2 rows selected (0.014 seconds) {noformat} 6. Doing a major compaction solves the bug {noformat} 0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major'; No rows affected (0.046 seconds) 0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/; ++--+ | DFS Output | ++--+ | Found 1 items | | drwxr-xr-x - hduser supergroup 0 2014-09-23 12:21 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/base_421 | ++--+ 2 rows selected (0.02
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151759#comment-14151759 ] Damien Carol commented on HIVE-8231: This bug is still here even with HIVE-8203 committed. Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8289) Exclude temp tables in compactor threads
Damien Carol created HIVE-8289: -- Summary: Exclude temp tables in compactor threads Key: HIVE-8289 URL: https://issues.apache.org/jira/browse/HIVE-8289 Project: Hive Issue Type: Improvement Reporter: Damien Carol Priority: Minor Currently, the compactor thread tries to compact temp tables. This throws errors like this one:
{noformat}
2014-09-26 15:32:18,483 ERROR [Thread-8]: compactor.Initiator (Initiator.java:run(111)) - Caught exception while trying to determine if we should compact testsimon.values__tmp__table__11. Marking clean to avoid repeated failures, java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.txn.compactor.Initiator.run(Initiator.java:88)
2014-09-26 15:32:18,484 ERROR [Thread-8]: txn.CompactionTxnHandler (CompactionTxnHandler.java:markCleaned(355)) - Unable to delete compaction record
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
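The improvement in the summary could look roughly like the following. This is a hypothetical sketch, not the real Initiator code: the temp tables generated for VALUES clauses carry a recognizable name prefix (the log above shows {{values__tmp__table__11}}), so the compactor loop can skip them before attempting compaction. An actual fix would more likely consult metastore metadata about the table type.

```python
# Hypothetical sketch of skipping temp tables in a compactor loop.
# The "values__tmp__table__" prefix matches the table name seen in the
# error log; the real check would likely use metastore metadata instead.

TMP_TABLE_PREFIX = "values__tmp__table__"

def tables_to_compact(candidates):
    """Filter out temporary tables before deciding on compaction."""
    return [t for t in candidates
            if not t.split(".")[-1].startswith(TMP_TABLE_PREFIX)]

print(tables_to_compact(["testsimon.values__tmp__table__11",
                         "casino.encaissement_1b"]))
```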
[jira] [Commented] (HIVE-7685) Parquet memory manager
[ https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151792#comment-14151792 ] Brock Noland commented on HIVE-7685: Hi Dong, Ok, thank you for the investigation. I think we can either put the parquet memory manager in Parquet or add APIs to expose the information required to implement the memory manager in Hive. Either approach is fine by me, we can take this work up in PARQUET-108. Brock Parquet memory manager -- Key: HIVE-7685 URL: https://issues.apache.org/jira/browse/HIVE-7685 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Brock Noland Similar to HIVE-4248, Parquet tries to write very large row groups. This causes Hive to run out of memory during dynamic partitions when a reducer may have many Parquet files open at a given time. As such, we should implement a memory manager which ensures that we don't run out of memory due to writing too many row groups within a single JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
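The kind of memory manager being discussed (similar to the ORC one from HIVE-4248) can be sketched abstractly. This is an illustrative Python model with made-up names, not the PARQUET-108 design: when the combined row-group buffer demand of all open writers exceeds a shared pool, every writer is scaled down proportionally so a reducer with many open Parquet files does not exhaust the JVM heap.

```python
# Illustrative model of a writer memory manager (made-up names; the real
# design was taken up in PARQUET-108). Each writer requests a row-group
# buffer; once total demand exceeds the pool, all writers are scaled
# down proportionally so the JVM does not run out of memory.

def allocate(pool_bytes, requested):
    """Return per-writer buffer allocations given a shared memory pool."""
    total = sum(requested)
    if total <= pool_bytes:
        return list(requested)
    scale = pool_bytes / total
    return [int(r * scale) for r in requested]

# Two writers fit in the pool; ten writers get scaled down to share it.
print(allocate(1024, [256, 256]))
print(allocate(1024, [256] * 10))
```

The proportional-scaling policy here is one plausible choice; the point is only that the total stays bounded no matter how many partitions a reducer has open.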
[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8182: -- Status: Patch Available (was: Open) Trim the line once instead of doing it on every line. beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.12.0 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As title indicates, when executing a multi-line query with trailing spaces, beeline reports syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If put this query in one single line, beeline succeeds to execute it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7685) Parquet memory manager
[ https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151792#comment-14151792 ] Brock Noland edited comment on HIVE-7685 at 9/29/14 3:31 PM: - Hi Dong, Ok, thank you for the investigation. I think we can either put the parquet memory manager in Parquet or add API's to expose the information required to implement the memory manager in Hive. Either approach is fine by me, we can take this work up in PARQUET-108. Brock was (Author: brocknoland): Hi Dong, Ok, thank you for the investigation. I think we can either put the parquet memory manager in Parquet or add API's to expose the information required to implement the memory manager in HIve. Either approach is fine by me, we can take this work up in PARQUET-108. Brock Parquet memory manager -- Key: HIVE-7685 URL: https://issues.apache.org/jira/browse/HIVE-7685 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Brock Noland Similar to HIVE-4248, Parquet tries to write large very large row groups. This causes Hive to run out of memory during dynamic partitions when a reducer may have many Parquet files open at a given time. As such, we should implement a memory manager which ensures that we don't run out of memory due to writing too many row groups within a single JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8182: -- Attachment: HIVE-8182.2.patch beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As title indicates, when executing a multi-line query with trailing spaces, beeline reports syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If put this query in one single line, beeline succeeds to execute it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8182: -- Status: Open (was: Patch Available) beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.12.0 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As title indicates, when executing a multi-line query with trailing spaces, beeline reports syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If put this query in one single line, beeline succeeds to execute it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8182) beeline fails when executing multiple-line queries with trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151789#comment-14151789 ] Sergio Peña commented on HIVE-8182: --- Thanks [~ychena] I agree with having only one trimming instead of doing it every line. We can reduce extra work on Hive by using your suggestion. I did the test and it worked. I'll upload another patch. beeline fails when executing multiple-line queries with trailing spaces --- Key: HIVE-8182 URL: https://issues.apache.org/jira/browse/HIVE-8182 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.1 Reporter: Yongzhi Chen Assignee: Sergio Peña Fix For: 0.14.0 Attachments: HIVE-8181.1.patch, HIVE-8182.1.patch, HIVE-8182.2.patch As title indicates, when executing a multi-line query with trailing spaces, beeline reports syntax error: Error: Error while compiling statement: FAILED: ParseException line 1:76 extraneous input ';' expecting EOF near 'EOF' (state=42000,code=4) If put this query in one single line, beeline succeeds to execute it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
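The one-trim-per-query idea agreed on in this thread can be illustrated with a small sketch. This is hypothetical Python logic, not Beeline's actual source: a trailing space after the final ';' defeats a naive endswith check on the raw buffer, while trimming the accumulated buffer once before the check handles multi-line queries with trailing spaces.

```python
# Sketch of why trailing spaces break multi-line statement detection,
# and the fix discussed above: trim the accumulated buffer once rather
# than checking (or trimming) every raw line. Hypothetical logic only.

def is_complete_naive(lines):
    # Buggy: a trailing space after ';' makes endswith(";") fail.
    return "\n".join(lines).endswith(";")

def is_complete_fixed(lines):
    # Trim the whole buffer once before testing for the terminator.
    return "\n".join(lines).strip().endswith(";")

query = ["select id ", "from foo6;  "]
print(is_complete_naive(query), is_complete_fixed(query))
```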
[jira] [Commented] (HIVE-8287) StorageBasedAuth in metastore does not produce useful error message
[ https://issues.apache.org/jira/browse/HIVE-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151844#comment-14151844 ] Hive QA commented on HIVE-8287: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671758/HIVE-8287.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6364 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_partition_with_whitelist org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_rename_partition_failure2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_disallow_incompatible_type_change_on1 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_disallow_incompatible_type_change_on2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_temp_table_rename {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1037/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1037/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1037/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12671758 StorageBasedAuth in metastore does not produce useful error message --- Key: HIVE-8287 URL: https://issues.apache.org/jira/browse/HIVE-8287 Project: Hive Issue Type: Bug Components: Authorization, Logging Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-8287.1.patch, HIVE-8287.2.patch Example of error message that doesn't given enough useful information - {noformat} 0: jdbc:hive2://localhost:1 alter table parttab1 drop partition (p1='def'); Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unknown error. Please check logs. (state=08S01,code=1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6148) Support arbitrary structs stored in HBase
[ https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151846#comment-14151846 ] Swarnim Kulkarni commented on HIVE-6148: The failed test seems flaky and unrelated to my changes here. Support arbitrary structs stored in HBase - Key: HIVE-6148 URL: https://issues.apache.org/jira/browse/HIVE-6148 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-6148.1.patch.txt, HIVE-6148.2.patch.txt, HIVE-6148.3.patch.txt We should add support to be able to query arbitrary structs stored in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7627: --- Attachment: HIVE-7627.5-spark.patch Re-uploading the same patch to test the precommit infra. FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side:
{noformat}
14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception
java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
	at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
	at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
[jira] [Commented] (HIVE-8245) Collect table read entities at same time as view read entities
[ https://issues.apache.org/jira/browse/HIVE-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151883#comment-14151883 ] Ashutosh Chauhan commented on HIVE-8245: Committed to 0.14 branch. Collect table read entities at same time as view read entities --- Key: HIVE-8245 URL: https://issues.apache.org/jira/browse/HIVE-8245 Project: Hive Issue Type: Improvement Components: CBO, Security Affects Versions: 0.13.0, 0.14.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8245.1.patch, HIVE-8245.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8245) Collect table read entities at same time as view read entities
[ https://issues.apache.org/jira/browse/HIVE-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8245: --- Fix Version/s: (was: 0.15.0) 0.14.0 Collect table read entities at same time as view read entities --- Key: HIVE-8245 URL: https://issues.apache.org/jira/browse/HIVE-8245 Project: Hive Issue Type: Improvement Components: CBO, Security Affects Versions: 0.13.0, 0.14.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8245.1.patch, HIVE-8245.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8191) Update and delete on tables with non Acid output formats gives runtime error
[ https://issues.apache.org/jira/browse/HIVE-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8191: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch 3 checked in. Thanks Eugene for the review. Update and delete on tables with non Acid output formats gives runtime error Key: HIVE-8191 URL: https://issues.apache.org/jira/browse/HIVE-8191 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8191.2.patch, HIVE-8191.3.patch, HIVE-8191.patch {code} create table not_an_acid_table(a int, b varchar(128)); insert into table not_an_acid_table select cint, cast(cstring1 as varchar(128)) from alltypesorc where cint is not null order by cint limit 10; delete from not_an_acid_table where b = '0ruyd6Y50JpdGRf6HqD'; {code} This generates a runtime error. It should get a compile error instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
Alan Gates created HIVE-8290: Summary: With DbTxnManager configured, all ORC tables forced to be transactional Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151896#comment-14151896 ] Alan Gates commented on HIVE-8290: -- [~vikram.dixit] I'd like to get this into 0.14 as I believe not having it is a big usability issue, and it will be a backwards incompatible change if we add it later. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151918#comment-14151918 ] Hive QA commented on HIVE-7627: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671815/HIVE-7627.5-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6509 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/173/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/173/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-173/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12671815 FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side:
{noformat}
14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception
java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8114) Type resolution for udf arguments of Decimal Type results in error
[ https://issues.apache.org/jira/browse/HIVE-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8114: --- Fix Version/s: (was: 0.15.0) Type resolution for udf arguments of Decimal Type results in error -- Key: HIVE-8114 URL: https://issues.apache.org/jira/browse/HIVE-8114 Project: Hive Issue Type: Bug Components: Query Processor, Types Affects Versions: 0.13.0, 0.13.1 Reporter: Ashutosh Chauhan Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-8114.1.patch {code} select log (2, 10.5BD) from src; {code} results in exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8228) CBO: fix couple of issues with partition pruning
[ https://issues.apache.org/jira/browse/HIVE-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8228: --- Fix Version/s: (was: 0.15.0) 0.14.0 CBO: fix couple of issues with partition pruning Key: HIVE-8228 URL: https://issues.apache.org/jira/browse/HIVE-8228 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8228.1.patch - Pruner doesn't handle non-deterministic UDFs correctly - The plan generated after CBO has a Project between TableScan and Filter, which prevents partition pruning from triggering in Hive post CBO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8290: - Status: Patch Available (was: Open) With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.patch Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8290: - Attachment: HIVE-8290.patch This patch changes the SemanticAnalyzer to look for a table property {{transactional}} before treating a table as requiring transactions. I also added a number of negative tests for things such as making sure the buckets aren't sorted, etc. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.patch Currently, once a user configures DbTxnManager to be the transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
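The behavior change described in the patch can be modeled roughly as follows. This is an illustrative Python sketch with assumed names, not the actual SemanticAnalyzer logic: a table is treated as ACID only when it explicitly opts in through a transactional table property, instead of whenever DbTxnManager is configured and the table happens to use ORC.

```python
# Hypothetical model of the opt-in check: a table is treated as
# transactional only when it carries a 'transactional' table property,
# not merely because DbTxnManager is the configured transaction manager.

def is_acid_table(table_props, txn_manager):
    """Return True only for tables that explicitly opted into ACID."""
    if txn_manager != "DbTxnManager":
        return False
    return table_props.get("transactional", "false").lower() == "true"

# Plain ORC table: no longer forced to be transactional.
print(is_acid_table({}, "DbTxnManager"))
# Table that explicitly opted in via its properties.
print(is_acid_table({"transactional": "true"}, "DbTxnManager"))
```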
[jira] [Updated] (HIVE-8266) create function using resource statement compilation should include resource URI entity
[ https://issues.apache.org/jira/browse/HIVE-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8266: -- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks [~brocknoland] for the review! create function using resource statement compilation should include resource URI entity - Key: HIVE-8266 URL: https://issues.apache.org/jira/browse/HIVE-8266 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.13.1 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.15.0 Attachments: HIVE-8266.2.patch, HIVE-8266.3.patch The compiler adds the function name and db name as write entities for the create function using resource statement. We should also include the resource URI path in the write entity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8223) CBO Trunk Merge: partition_wise_fileformat2 select result depends on ordering
[ https://issues.apache.org/jira/browse/HIVE-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8223: --- Fix Version/s: (was: 0.15.0) 0.14.0 CBO Trunk Merge: partition_wise_fileformat2 select result depends on ordering - Key: HIVE-8223 URL: https://issues.apache.org/jira/browse/HIVE-8223 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8223.01.patch, HIVE-8223.02.patch, HIVE-8223.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8199) CBO Trunk Merge: quote2 test fails due to incorrect literal translation
[ https://issues.apache.org/jira/browse/HIVE-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8199: --- Fix Version/s: (was: 0.15.0) 0.14.0 CBO Trunk Merge: quote2 test fails due to incorrect literal translation --- Key: HIVE-8199 URL: https://issues.apache.org/jira/browse/HIVE-8199 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8199.01.patch, HIVE-8199.02.patch, HIVE-8199.patch Quoting of quotes and slashes is lost in translation back from CBO to AST, it seems -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151944#comment-14151944 ] Hive QA commented on HIVE-8226: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12671756/HIVE-8226.03.patch {color:green}SUCCESS:{color} +1 6363 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1038/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1038/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1038/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12671756 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8111) CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO
[ https://issues.apache.org/jira/browse/HIVE-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8111: --- Fix Version/s: (was: 0.15.0) 0.14.0 CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO Key: HIVE-8111 URL: https://issues.apache.org/jira/browse/HIVE-8111 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-8111.01.patch, HIVE-8111.02.patch, HIVE-8111.03.patch, HIVE-8111.patch Original test failure: looks like column type changes to different decimals in most cases. In one case it causes the integer part to be too big to fit, so the result becomes null it seems. What happens is that CBO adds casts to arithmetic expressions to make them type compatible; these casts become part of new AST, and then Hive adds casts on top of these casts. This (the first part) also causes lots of out file changes. It's not clear how to best fix it so far, in addition to incorrect decimal width and sometimes nulls when width is larger than allowed in Hive. Option one - don't add those for numeric ops - cannot be done if numeric op is a part of compare, for which CBO needs correct types. Option two - unwrap casts when determining type in Hive - hard or impossible to tell apart CBO-added casts and user casts. Option three - don't change types in Hive if CBO has run - seems hacky and hard to ensure it's applied everywhere. Option four - map all expressions precisely between two trees and remove casts again after optimization, will be pretty difficult. Option five - somehow mark those casts. Not sure about how yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Assignee: Prasanth J (was: Alan Gates) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader Key: HIVE-8291 URL: https://issues.apache.org/jira/browse/HIVE-8291 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Prasanth J Fix For: 0.14.0 When loading into a partitioned bucketed sorted table the query fails with {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0] for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for client [172.21.128.111], because this file is already being created by [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on [172.21.128.122] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy15.create(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy15.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1966) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1983) at
[jira] [Created] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
Mostafa Mokhtar created HIVE-8291: - Summary: Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader Key: HIVE-8291 URL: https://issues.apache.org/jira/browse/HIVE-8291 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Alan Gates Fix For: 0.14.0 When loading into a partitioned bucketed sorted table the query fails with {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0] for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for client [172.21.128.111], because this file is already being created by [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on [172.21.128.122] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy15.create(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy15.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394) at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334) at 
org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1966) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1983) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2287) at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.close(OrcRecordUpdater.java:356) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriter {code} DDL {code} CREATE TABLE
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Description: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader() {code} String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":"); ValidTxnList validTxnList = new ValidTxnListImpl(txnString); {code} {code} Stack Trace Sample Count Percentage(%) hive.ql.exec.tez.MapRecordSource.pushRecord() 2,981 87.215 org.apache.tez.mapreduce.lib.MRReaderMapred.next() 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object) 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader() 1,984 58.046 hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,983 58.016 hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,891 55.325 hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options) 1,723 50.41 hive.common.ValidTxnListImpl.<init>(String) 934 27.326 conf.Configuration.get(String, String) 621 18.169 {code} was: When loading into a partitioned bucketed sorted table the query fails with {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/tmp/hive/mmokhtar/621d7923-90d1-4d9d-a4c6-b3bb075c7a8c/hive_2014-09-22_23-25-11_678_1598300430132235708-1/_task_tmp.-ext-1/ss_sold_date=1998-01-02/_tmp.00_3/delta_0123305_0123305/bucket_0] for [DFSClient_attempt_1406566393272_6085_r_000144_3_-1677753045_12] for client [172.21.128.111], because this file is already being created by [DFSClient_attempt_1406566393272_6085_r_31_3_-1506661042_12] on [172.21.128.122] at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2308) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2237) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2190) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy15.create(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190) at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) at com.sun.proxy.$Proxy15.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:258) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1600) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465) at
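The getReader hotspot above comes from re-reading and re-parsing the same ValidTxnList string from the Configuration once per reader, even though the string is fixed for the lifetime of a task. A minimal sketch of the memoization idea (not Hive's actual fix; the class name, and the "highWatermark:openTxn1,openTxn2,..." string shape, are assumptions for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch: memoize the parsed form of a transaction-list string
 * so that each distinct string is parsed only once per JVM, instead of once
 * per reader as in the profile above. All names here are hypothetical.
 */
public class ValidTxnListCache {

    // txnString -> parsed form; shared across readers in the same JVM
    private static final Map<String, long[]> PARSED = new ConcurrentHashMap<>();

    /**
     * Parses a string of the assumed shape "highWatermark:open1,open2,...".
     * computeIfAbsent guarantees each distinct string is parsed exactly once.
     */
    public static long[] get(String txnString) {
        return PARSED.computeIfAbsent(txnString, s -> {
            String[] head = s.split(":", 2);
            long highWatermark = Long.parseLong(head[0]);
            String[] open = (head.length > 1 && !head[1].isEmpty())
                    ? head[1].split(",") : new String[0];
            long[] result = new long[open.length + 1];
            result[0] = highWatermark;  // slot 0: high watermark
            for (int i = 0; i < open.length; i++) {
                result[i + 1] = Long.parseLong(open[i]);  // open transactions
            }
            return result;
        });
    }
}
```

The design point is that the expensive step (string parsing, and in Hive's case also the Configuration lookup) moves out of the per-reader path into a one-time population of the cache.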
[jira] [Commented] (HIVE-8270) JDBC uber jar is missing some classes required in secure setup.
[ https://issues.apache.org/jira/browse/HIVE-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151962#comment-14151962 ] Ashutosh Chauhan commented on HIVE-8270: LGTM +1 As Thejas pointed out, we should clarify in doc that this is meant for remote HS2, not for embedded one. JDBC uber jar is missing some classes required in secure setup. --- Key: HIVE-8270 URL: https://issues.apache.org/jira/browse/HIVE-8270 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-8270.1.patch JDBC uber jar is missing some required classes for a secure setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25497: HIVE-7627, FSStatsPublisher does fit into Spark multi-thread task mode
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25497/#review54837 --- Ship it! Ship It! - Xuefu Zhang On Sept. 28, 2014, 9:50 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25497/ --- (Updated Sept. 28, 2014, 9:50 a.m.) Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7627 https://issues.apache.org/jira/browse/HIVE-7627 Repository: hive-git Description --- Hive table statistics failed in FSStatsPublisher mode because of the missing mapred.task.partition parameter. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 1674d4b ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 0b8b7c9 Diff: https://reviews.apache.org/r/25497/diff/ Testing --- Thanks, chengxiang li
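The root cause being reviewed is that Spark runs several tasks as threads inside one executor, so tasks that share one job configuration also share (or miss) mapred.task.partition, and their stats publishers collide on the same tmpstats file. A simplified sketch of the idea, assuming a plain map stands in for the real JobConf and the counter is a stand-in for the real per-task partition id (the actual patch touches HiveMapFunction/HiveReduceFunction):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative sketch, not the actual patch: give each in-process task its
 * own copy of the shared configuration, stamped with a distinct
 * "mapred.task.partition", so each FSStatsPublisher writes to a distinct
 * tmpstats-&lt;n&gt; file instead of colliding on one.
 */
public class PerTaskConf {

    private static final AtomicInteger NEXT_PARTITION = new AtomicInteger();

    /** Clones the shared conf and stamps a unique partition id into it. */
    public static Map<String, String> forNewTask(Map<String, String> shared) {
        Map<String, String> copy = new HashMap<>(shared);  // never mutate the shared conf
        copy.put("mapred.task.partition",
                 Integer.toString(NEXT_PARTITION.getAndIncrement()));
        return copy;
    }
}
```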
[jira] [Commented] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14151963#comment-14151963 ] Xuefu Zhang commented on HIVE-7627: --- +1 FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Description: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader() {code} String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":"); ValidTxnList validTxnList = new ValidTxnListImpl(txnString); {code} {code} Stack Trace Sample Count Percentage(%) hive.ql.exec.tez.MapRecordSource.pushRecord() 2,981 87.215 org.apache.tez.mapreduce.lib.MRReaderMapred.next() 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object) 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader() 1,984 58.046 hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,983 58.016 hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,891 55.325 hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options) 1,723 50.41 hive.common.ValidTxnListImpl.<init>(String) 934 27.326 conf.Configuration.get(String, String) 621 18.169 {code} Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU in {code} Path onepath = normalizePath(onefile); {code} and 15% of the CPU in {code} onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri()); {code} From the profiler {code} Stack Trace Sample Count Percentage(%) org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object) 978 28.613 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable) 978 28.613 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged() 866 25.336 org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866 25.336 java.net.URI.relativize(URI) 655 19.163 java.net.URI.relativize(URI, URI) 655 19.163 
java.net.URI.normalize(String) 517 15.126 java.net.URI.needsNormalization(String) 372 10.884 java.lang.String.charAt(int) 235 6.875 java.net.URI.equal(String, String) 27 0.79 java.lang.StringBuilder.toString() 1 0.029 java.lang.StringBuilder.<init>() 1 0.029 java.lang.StringBuilder.append(String) 1 0.029 org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String) 167 4.886 org.apache.hadoop.fs.Path.<init>(String) 162 4.74 org.apache.hadoop.fs.Path.initialize(String, String, String, String) 162 4.74 org.apache.hadoop.fs.Path.normalizePath(String, String) 97 2.838 org.apache.commons.lang.StringUtils.replace(String, String, String) 97 2.838 org.apache.commons.lang.StringUtils.replace(String, String, String, int) 97 2.838 java.lang.String.indexOf(String, int) 97 2.838 java.net.URI.<init>(String, String, String, String, String) 65 1.902 {code} was: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader() {code} String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":"); ValidTxnList validTxnList = new ValidTxnListImpl(txnString); {code} {code} Stack Trace Sample Count Percentage(%) hive.ql.exec.tez.MapRecordSource.pushRecord() 2,981 87.215 org.apache.tez.mapreduce.lib.MRReaderMapred.next() 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object) 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()
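The second hotspot above is java.net.URI.normalize/relativize being re-run on the same path strings every time the input file changes. Since the set of partition paths is fixed for a task, the normalized URI can be computed once and cached. A minimal sketch of that idea (hypothetical class and method names, not Hive's actual fix):

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch: cache normalized URIs so that repeated
 * "is this file under that partition directory?" checks skip the
 * URI.normalize() cost that dominates the profile above.
 */
public class PathUriCache {

    private static final Map<String, URI> NORMALIZED = new ConcurrentHashMap<>();

    /** Normalizes a path string once; later lookups hit the cache. */
    public static URI normalized(String path) {
        return NORMALIZED.computeIfAbsent(path, p -> URI.create(p).normalize());
    }

    /** True when 'file' lives under directory 'dir'. */
    public static boolean contains(String dir, String file) {
        URI d = normalized(dir);
        URI f = normalized(file);
        // URI.relativize returns its argument unchanged when the
        // argument is not under the base URI
        return !d.relativize(f).equals(f);
    }
}
```

This keeps the relativize call itself (the containment test still needs it) but eliminates the repeated normalize and URI construction for paths already seen.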
[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7627: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Patch committed to Spark branch. Thanks to Chengxiang for the contribution. FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Fix For: spark-branch Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch, HIVE-7627.3-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.4-spark.patch, HIVE-7627.5-spark.patch, HIVE-7627.5-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Assignee: Alan Gates (was: Prasanth J) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader Key: HIVE-8291 URL: https://issues.apache.org/jira/browse/HIVE-8291 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Alan Gates Fix For: 0.14.0
Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}
{code}
Stack Trace                                                                                    Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()                                                  2,981         87.215
 org.apache.tez.mapreduce.lib.MRReaderMapred.next()                                            2,002         58.572
  mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)   2,002         58.572
   mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader() 1,984        58.046
    hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)                  1,983         58.016
     hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)              1,891         55.325
      hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)             1,723         50.41
       hive.common.ValidTxnListImpl.init(String)                                               934           27.326
       conf.Configuration.get(String, String)                                                  621           18.169
{code}
Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU in
{code}
Path onepath = normalizePath(onefile);
{code}
and 15% of the CPU in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace                                                                     Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)           978           28.613
 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)      978           28.613
  org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()             866           25.336
   org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()       866           25.336
    java.net.URI.relativize(URI)                                                655           19.163
     java.net.URI.relativize(URI, URI)                                          655           19.163
      java.net.URI.normalize(String)                                            517           15.126
       java.net.URI.needsNormalization(String)                                  372           10.884
        java.lang.String.charAt(int)                                            235           6.875
      java.net.URI.equal(String, String)                                        27            0.79
      java.lang.StringBuilder.toString()                                        1             0.029
      java.lang.StringBuilder.init()                                            1             0.029
      java.lang.StringBuilder.append(String)                                    1             0.029
    org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)            167           4.886
     org.apache.hadoop.fs.Path.init(String)                                     162           4.74
      org.apache.hadoop.fs.Path.initialize(String, String, String, String)      162           4.74
       org.apache.hadoop.fs.Path.normalizePath(String, String)                  97            2.838
        org.apache.commons.lang.StringUtils.replace(String, String, String)     97            2.838
         org.apache.commons.lang.StringUtils.replace(String, String, String, int) 97          2.838
          java.lang.String.indexOf(String, int)                                 97            2.838
      java.net.URI.init(String, String, String, String, String)                 65            1.902
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
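Per the profile above, roughly 27% of samples go to re-parsing the transaction string and 18% to `Configuration.get`, once per reader. A minimal sketch of memoizing the parsed list per distinct string so only the first reader pays the parse cost — the `TxnListCache` class and the `hwm:exception:exception` format are assumptions for illustration, not `ValidTxnListImpl`'s actual syntax:

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: memoize the parsed transaction list so getReader() does not
// re-parse the same conf string for every split in the process.
public class TxnListCache {
    static final class ParsedTxnList {
        final long highWatermark;
        final long[] exceptions;  // open/aborted txns, kept sorted for lookup

        ParsedTxnList(String s) {
            String[] parts = s.split(":");
            highWatermark = Long.parseLong(parts[0]);
            exceptions = new long[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                exceptions[i - 1] = Long.parseLong(parts[i]);
            }
            Arrays.sort(exceptions);
        }

        boolean isTxnValid(long txn) {
            return txn <= highWatermark
                && Arrays.binarySearch(exceptions, txn) < 0;
        }
    }

    // one parse per distinct string, shared by all readers in the process
    private static final ConcurrentHashMap<String, ParsedTxnList> CACHE =
        new ConcurrentHashMap<>();

    public static ParsedTxnList get(String txnString) {
        return CACHE.computeIfAbsent(txnString, ParsedTxnList::new);
    }
}
```

`computeIfAbsent` keeps the cache thread-safe without explicit locking, which matters since many readers are constructed concurrently under Tez.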
[jira] [Updated] (HIVE-8238) [CBO] Preserve subquery alias while generating ast
[ https://issues.apache.org/jira/browse/HIVE-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8238: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) Non-trivial to fix. HIVE-8245 solves immediate problem of view authorization. [CBO] Preserve subquery alias while generating ast -- Key: HIVE-8238 URL: https://issues.apache.org/jira/browse/HIVE-8238 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8238.cbo.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7776: -- Attachment: HIVE-7776.3-spark.patch Reattaching the same patch to trigger a test run. enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch, HIVE-7776.2-spark.patch, HIVE-7776.3-spark.patch, HIVE-7776.3-spark.patch sample10.q contains a dynamic partition operation; this qtest should be enabled after Hive on Spark supports dynamic partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8261) CBO : Predicate pushdown is removed by Optiq
[ https://issues.apache.org/jira/browse/HIVE-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-8261: Attachment: HIVE-8261.1.patch CBO : Predicate pushdown is removed by Optiq - Key: HIVE-8261 URL: https://issues.apache.org/jira/browse/HIVE-8261 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0, 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8261.1.patch The plan for TPC-DS Q64 wasn't optimal; upon looking at the logical plan I realized that predicate pushdown is not applied on date_dim d1. Interestingly, before Optiq we have the predicate pushed:
{code}
HiveFilterRel(condition=[=($5, $1)])
  HiveJoinRel(condition=[=($3, $6)], joinType=[inner])
    HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$1])
      HiveFilterRel(condition=[=($0, 2000)])
        HiveAggregateRel(group=[{0, 1}], agg#0=[count()], agg#1=[sum($2)])
          HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
            HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
              HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
                HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
                  HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
                  HiveProjectRel(d_date_sk=[$0], d_year=[$6])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
                HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))])
                  HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
              HiveProjectRel(_o__col0=[$0])
                HiveAggregateRel(group=[{0}])
                  HiveProjectRel($f0=[$0])
                    HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
                      HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17])
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
                      HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
    HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col3=[$1])
      HiveFilterRel(condition=[=($0, +(2000, 1))])
        HiveAggregateRel(group=[{0, 1}], agg#0=[count()])
          HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2])
            HiveJoinRel(condition=[=($1, $8)], joinType=[inner])
              HiveJoinRel(condition=[=($1, $5)], joinType=[inner])
                HiveJoinRel(condition=[=($0, $3)], joinType=[inner])
                  HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]])
                  HiveProjectRel(d_date_sk=[$0], d_year=[$6])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]])
                HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))])
                  HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17])
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]])
              HiveProjectRel(_o__col0=[$0])
                HiveAggregateRel(group=[{0}])
                  HiveProjectRel($f0=[$0])
                    HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
                      HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17])
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]])
                      HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16])
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]])
{code}
While after Optiq the filter on date_dim gets pulled up the plan:
{code}
HiveFilterRel(condition=[=($5, $1)]): rowcount = 1.0, cumulative cost = {5.50188454E8 rows, 0.0 cpu, 0.0 io}, id = 6895
  HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col00=[$4], _o__col10=[$5], _o__col30=[$6]): rowcount = 1.0, cumulative cost = {5.50188454E8 rows, 0.0 cpu, 0.0
[jira] [Updated] (HIVE-8261) CBO : Predicate pushdown is removed by Optiq
[ https://issues.apache.org/jira/browse/HIVE-8261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-8261: Status: Patch Available (was: Open) CBO : Predicate pushdown is removed by Optiq - Key: HIVE-8261 URL: https://issues.apache.org/jira/browse/HIVE-8261 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1, 0.14.0 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-8261.1.patch Plan for TPC-DS Q64 wasn't optimal upon looking at the logical plan I realized that predicate pushdown is not applied on date_dim d1. Interestingly before optiq we have the predicate pushed : {code} HiveFilterRel(condition=[=($5, $1)]) HiveJoinRel(condition=[=($3, $6)], joinType=[inner]) HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col2=[$3], _o__col3=[$1]) HiveFilterRel(condition=[=($0, 2000)]) HiveAggregateRel(group=[{0, 1}], agg#0=[count()], agg#1=[sum($2)]) HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2]) HiveJoinRel(condition=[=($1, $8)], joinType=[inner]) HiveJoinRel(condition=[=($1, $5)], joinType=[inner]) HiveJoinRel(condition=[=($0, $3)], joinType=[inner]) HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]]) HiveProjectRel(d_date_sk=[$0], d_year=[$6]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]]) HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))]) HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]]) HiveProjectRel(_o__col0=[$0]) HiveAggregateRel(group=[{0}]) HiveProjectRel($f0=[$0]) HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner]) HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17]) 
HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]]) HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]]) HiveProjectRel(_o__col0=[$0], _o__col1=[$2], _o__col3=[$1]) HiveFilterRel(condition=[=($0, +(2000, 1))]) HiveAggregateRel(group=[{0, 1}], agg#0=[count()]) HiveProjectRel($f0=[$4], $f1=[$5], $f2=[$2]) HiveJoinRel(condition=[=($1, $8)], joinType=[inner]) HiveJoinRel(condition=[=($1, $5)], joinType=[inner]) HiveJoinRel(condition=[=($0, $3)], joinType=[inner]) HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_wholesale_cost=[$11]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.store_sales]]) HiveProjectRel(d_date_sk=[$0], d_year=[$6]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.date_dim]]) HiveFilterRel(condition=[AND(in($2, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $1, 35, +(35, 10)), between(false, $1, +(35, 1), +(35, 15)))]) HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.item]]) HiveProjectRel(_o__col0=[$0]) HiveAggregateRel(group=[{0}]) HiveProjectRel($f0=[$0]) HiveJoinRel(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner]) HiveProjectRel(cs_item_sk=[$15], cs_order_number=[$17]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_sales]]) HiveProjectRel(cr_item_sk=[$2], cr_order_number=[$16]) HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200.catalog_returns]]) {code} While after Optiq the filter on date_dim gets pulled up the plan {code} HiveFilterRel(condition=[=($5, $1)]): rowcount = 1.0, cumulative cost = {5.50188454E8 rows, 0.0 cpu, 0.0 io}, id = 6895 HiveProjectRel(_o__col0=[$0], _o__col1=[$1], _o__col2=[$2], _o__col3=[$3], _o__col00=[$4], _o__col10=[$5], _o__col30=[$6]): rowcount = 1.0, cumulative cost = {5.50188454E8 rows, 0.0 cpu,
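The regression described above is classic predicate pushdown: filtering date_dim on d_year below the join versus above it returns the same rows, but the pushed-down form joins a far smaller input. A toy illustration with plain nested-loop joins (the `PushdownDemo` class and its `{sk, value}` row encoding are hypothetical, not Hive's operators):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of why the pulled-up filter matters: same result rows,
// but the pushed-down variant joins against a pre-filtered dimension.
public class PushdownDemo {
    // sales rows are {soldDateSk, itemSk}; dateDim rows are {dateSk, year}

    static List<int[]> joinThenFilter(List<int[]> sales, List<int[]> dateDim, int year) {
        List<int[]> out = new ArrayList<>();
        for (int[] s : sales)
            for (int[] d : dateDim)
                if (s[0] == d[0] && d[1] == year)  // filter applied above the join
                    out.add(new int[]{s[0], s[1], d[1]});
        return out;
    }

    static List<int[]> filterThenJoin(List<int[]> sales, List<int[]> dateDim, int year) {
        List<int[]> dim = new ArrayList<>();
        for (int[] d : dateDim)
            if (d[1] == year) dim.add(d);          // pushdown: filter first
        List<int[]> out = new ArrayList<>();
        for (int[] s : sales)
            for (int[] d : dim)
                if (s[0] == d[0]) out.add(new int[]{s[0], s[1], d[1]});
        return out;
    }
}
```

Both methods are semantically equivalent, which is why the optimizer is free to pick either shape; the cost difference only shows up in the size of the join input, which is exactly what the Q64 plan regression is about.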
[jira] [Updated] (HIVE-7971) Support alter table change/replace/add columns for existing partitions
[ https://issues.apache.org/jira/browse/HIVE-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7971: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and 0.14 branch. Support alter table change/replace/add columns for existing partitions -- Key: HIVE-7971 URL: https://issues.apache.org/jira/browse/HIVE-7971 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-7971.1.patch, HIVE-7971.2.patch, HIVE-7971.3.patch ALTER TABLE CHANGE COLUMN is allowed for tables, but not for partitions. Same for add/replace columns. Allowing this for partitions can be useful in some cases. For example, one user has tables with Hive 0.12 Decimal columns, which do not specify precision/scale. To be able to properly read the decimal values from the existing partitions, the column types in the partitions need to be changed to decimal types with precision/scale. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8180) Update SparkReduceRecordHandler for processing the vectors [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152002#comment-14152002 ] Xuefu Zhang commented on HIVE-8180: --- Hi [~chinnalalam], the patch looks very good. I just had a very minor comment on RB. Thanks. Update SparkReduceRecordHandler for processing the vectors [spark branch] - Key: HIVE-8180 URL: https://issues.apache.org/jira/browse/HIVE-8180 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Labels: Spark-M1 Attachments: HIVE-8180-spark.patch, HIVE-8180.1-spark.patch, HIVE-8180.2-spark.patch Update SparkReduceRecordHandler for processing the vectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8290) With DbTxnManager configured, all ORC tables forced to be transactional
[ https://issues.apache.org/jira/browse/HIVE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152008#comment-14152008 ] Vikram Dixit K commented on HIVE-8290: -- +1 for 0.14. With DbTxnManager configured, all ORC tables forced to be transactional --- Key: HIVE-8290 URL: https://issues.apache.org/jira/browse/HIVE-8290 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Blocker Fix For: 0.14.0 Attachments: HIVE-8290.patch Currently, once a user configures DbTxnManager to the be transaction manager, all tables that use ORC are expected to be transactional. This means they all have to have buckets. This most likely won't be what users want. We need to add a specific mark to a table so that users can indicate it should be treated in a transactional way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8231) Error when insert into empty table with ACID
[ https://issues.apache.org/jira/browse/HIVE-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152007#comment-14152007 ] Alan Gates commented on HIVE-8231: -- I think I can reproduce the same bug with 2 command line sessions doing things in the following order:
# Start session 1
# In session 1, insert into table
# Start session 2
# In session 2, select *; see all rows
# In session 1, delete some rows
# In session 1, select *; see fewer rows
# In session 2, select *; see all rows
If I stop and restart session 2 after this, then it sees the appropriate number of rows. So either it isn't getting new transaction information for each query in the session, or the results are being cached somewhere on it. Does this match the behavior you're seeing? Error when insert into empty table with ACID Key: HIVE-8231 URL: https://issues.apache.org/jira/browse/HIVE-8231 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Fix For: 0.14.0 Steps to show the bug:
1. Create table
{code}
create table encaissement_1b_64m like encaissement_1b;
{code}
2. Check table
{code}
desc encaissement_1b_64m;
dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m;
{code}
Everything is OK:
{noformat}
0: jdbc:hive2://nc-h04:1/casino desc encaissement_1b_64m;
+------------+------------+----------+
| col_name   | data_type  | comment  |
+------------+------------+----------+
| id         | int        |          |
| idmagasin  | int        |          |
| zibzin     | string     |          |
| cheque     | int        |          |
| montant    | double     |          |
| date       | timestamp  |          |
| col_6      | string     |          |
| col_7      | string     |          |
| col_8      | string     |          |
+------------+------------+----------+
9 rows selected (0.158 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+-------------+
| DFS Output  |
+-------------+
+-------------+
No rows selected (0.01 seconds)
{noformat}
3. Insert values into the new table
{noformat}
insert into table encaissement_1b_64m VALUES (1, 1, '8909', 1, 12.5, '12/05/2014', '','','');
{noformat}
4. Check
{noformat}
0: jdbc:hive2://nc-h04:1/casino select id from encaissement_1b_64m;
+-----+
| id  |
+-----+
+-----+
No rows selected (0.091 seconds)
{noformat}
There is already a problem: I don't see the inserted row.
5. When I check the HDFS directory, I see a {{delta_421_421}} folder
{noformat}
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+----------------------------------------------------------------------------------------------------------------------------------+
| DFS Output                                                                                                                       |
+----------------------------------------------------------------------------------------------------------------------------------+
| Found 1 items                                                                                                                    |
| drwxr-xr-x - hduser supergroup 0 2014-09-23 12:17 hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/delta_421_421  |
+----------------------------------------------------------------------------------------------------------------------------------+
2 rows selected (0.014 seconds)
{noformat}
6. Doing a major compaction solves the bug
{noformat}
0: jdbc:hive2://nc-h04:1/casino alter table encaissement_1b_64m compact 'major';
No rows affected (0.046 seconds)
0: jdbc:hive2://nc-h04:1/casino dfs -ls hdfs://nc-h04/user/hive/warehouse/casino.db/encaissement_1b_64m/;
+------------+
| DFS Output |
+------------+
| Found 1 items |
| drwxr-xr-x - hduser supergroup 0
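A plausible reading of both reports: whether the rows in the {{delta_421_421}} directory are visible depends on the reader's transaction snapshot at query time. A simplified sketch of that visibility rule — the `DeltaVisibility` class is hypothetical, and real ACID readers consult a full ValidTxnList with open/aborted exceptions, not just a high watermark:

```java
// Sketch of how an ACID reader decides whether a delta directory is
// readable: delta_<min>_<max> is visible only if every transaction in
// [min, max] is committed in the reader's snapshot. Here the snapshot is
// reduced to a single high watermark for illustration.
public class DeltaVisibility {
    public static boolean isVisible(String deltaDir, long highWatermark) {
        // expects directory names like "delta_421_421"
        String[] parts = deltaDir.split("_");
        long maxTxn = Long.parseLong(parts[2]);
        return maxTxn <= highWatermark;
    }
}
```

Under this model, a session whose snapshot was taken before transaction 421 committed (or that never refreshes its snapshot, as the comment above suspects) would keep skipping the delta and report zero rows until compaction rewrites the data into a base file.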
[jira] [Commented] (HIVE-7843) orc_analyze.q fails due to random mapred.task.id in FileSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152012#comment-14152012 ] Xuefu Zhang commented on HIVE-7843: --- Hi [~vkorukanti], would you like to reload the patch to trigger the test run? The build VMs were killed over the weekend. orc_analyze.q fails due to random mapred.task.id in FileSinkOperator [Spark Branch] --- Key: HIVE-7843 URL: https://issues.apache.org/jira/browse/HIVE-7843 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-7843.1-spark.patch
{code}
java.lang.AssertionError: data length is different from num of DP columns
	org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809)
	org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730)
	org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
	org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502)
	org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525)
	org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198)
	org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
	org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
	org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
	scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	scala.collection.Iterator$class.foreach(Iterator.scala:727)
	scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
	org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
	org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
	org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
	org.apache.spark.scheduler.Task.run(Task.scala:54)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	java.lang.Thread.run(Thread.java:744)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
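The AssertionError at the top of the trace fires when the reduce key carries a different number of values than there are dynamic-partition columns. A sketch of the directory construction that check guards (the `DynPartPath` class is hypothetical; FileSinkOperator's real logic also handles default-partition names and value escaping):

```java
// Sketch of dynamic-partition path construction: exactly one value per DP
// column, joined as col=value path segments. A mismatched value count is
// the condition behind "data length is different from num of DP columns".
public class DynPartPath {
    public static String getDynPartDirectory(String[] dpColumns, String[] values) {
        if (values.length != dpColumns.length) {
            throw new AssertionError("data length is different from num of DP columns");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < dpColumns.length; i++) {
            if (i > 0) sb.append('/');
            sb.append(dpColumns[i]).append('=').append(values[i]);
        }
        return sb.toString();
    }
}
```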
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152015#comment-14152015 ] Prasanth J commented on HIVE-8226: -- Committed patch to trunk. I will wait for [~vikram.dixit] to weigh this for branch-0.14 commit. Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152019#comment-14152019 ] Vikram Dixit K commented on HIVE-8226: -- +1 for 0.14 Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8291) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Assignee: Owen O'Malley (was: Alan Gates) Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader Key: HIVE-8291 URL: https://issues.apache.org/jira/browse/HIVE-8291 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Environment: cn105 Reporter: Mostafa Mokhtar Assignee: Owen O'Malley Fix For: 0.14.0 Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormate.getReader() {code} String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + :); ValidTxnList validTxnList = new ValidTxnListImpl(txnString); {code} {code} Stack Trace Sample CountPercentage(%) hive.ql.exec.tez.MapRecordSource.pushRecord() 2,981 87.215 org.apache.tez.mapreduce.lib.MRReaderMapred.next() 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object) 2,002 58.572 mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader() 1,984 58.046 hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,983 58.016 hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) 1,891 55.325 hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)1,723 50.41 hive.common.ValidTxnListImpl.init(String) 934 27.326 conf.Configuration.get(String, String)621 18.169 {code} Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp 5% the CPU in {code} Path onepath = normalizePath(onefile); {code} And 15% the CPU in {code} onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri()); {code} From the profiler {code} Stack Trace Sample CountPercentage(%) 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object) 978 28.613 org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable) 978 28.613 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged() 866 25.336 org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 866 25.336 java.net.URI.relativize(URI) 655 19.163 java.net.URI.relativize(URI, URI) 655 19.163 java.net.URI.normalize(String) 517 15.126 java.net.URI.needsNormalization(String) 372 10.884 java.lang.String.charAt(int) 235 6.875 java.net.URI.equal(String, String)27 0.79 java.lang.StringBuilder.toString()1 0.029 java.lang.StringBuilder.init() 1 0.029 java.lang.StringBuilder.append(String)1 0.029 org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)167 4.886 org.apache.hadoop.fs.Path.init(String) 162 4.74 org.apache.hadoop.fs.Path.initialize(String, String, String, String) 162 4.74 org.apache.hadoop.fs.Path.normalizePath(String, String) 97 2.838 org.apache.commons.lang.StringUtils.replace(String, String, String) 97 2.838 org.apache.commons.lang.StringUtils.replace(String, String, String, int) 97 2.838 java.lang.String.indexOf(String, int) 97 2.838 java.net.URI.init(String, String, String, String, String) 65 1.902 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8226) Vectorize dynamic partitioning in VectorFileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8226: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch-0.14 as well. Thanks [~mmccline] and [~vikram.dixit]! Vectorize dynamic partitioning in VectorFileSinkOperator Key: HIVE-8226 URL: https://issues.apache.org/jira/browse/HIVE-8226 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8226.01.patch, HIVE-8226.02.patch, HIVE-8226.03.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8196) Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance
[ https://issues.apache.org/jira/browse/HIVE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-8196: - Attachment: HIVE-8196.5.patch Not sure why parallel.q is adding and removing POSTHOOK between test runs. Anyway, trying again to see if it passes this time. Joining on partition columns with fetch column stats enabled results in very small CE which negatively affects query performance - Key: HIVE-8196 URL: https://issues.apache.org/jira/browse/HIVE-8196 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Blocker Labels: performance Fix For: 0.14.0 Attachments: HIVE-8196.1.patch, HIVE-8196.2.patch, HIVE-8196.3.patch, HIVE-8196.4.patch, HIVE-8196.5.patch To make the best of dynamic partition pruning, joins should be on the partitioning columns, which dynamically prunes the partitions from the fact table based on the qualifying column keys from the dimension table. However, this type of join negatively affects cardinality estimates with fetch column stats enabled. Currently we don't have statistics for partition columns, and as a result NDV is set to the row count, which negatively affects the estimated join selectivity. A workaround is to capture statistics for partition columns, or to use the number of partitions in case dynamic partitioning is used.
StatsUtils.getColStatisticsFromExpression is where countDistincts gets set to the row count:
{code}
if (encd.getIsPartitionColOrVirtualCol()) {
  // virtual columns
  colType = encd.getTypeInfo().getTypeName();
  countDistincts = numRows;
  oi = encd.getWritableObjectInspector();
{code}
Query used to repro the issue:
{code}
set hive.stats.fetch.column.stats=true;
set hive.tez.dynamic.partition.pruning=true;
explain select d_date
from store_sales, date_dim
where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998;
{code}
Plan:
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 2 (BROADCAST_EDGE)
      DagName: mmokhtar_20140919180404_945d29f5-d041-4420-9666-1c5d64fa6540:8
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  filterExpr: ss_sold_date_sk is not null (type: boolean)
                  Statistics: Num rows: 550076554 Data size: 47370018816 Basic stats: COMPLETE Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {ss_sold_date_sk}
                      1 {d_date_sk} {d_date}
                    keys:
                      0 ss_sold_date_sk (type: int)
                      1 d_date_sk (type: int)
                    outputColumnNames: _col22, _col26, _col28
                    input vertices:
                      1 Map 2
                    Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE
                    Filter Operator
                      predicate: (_col22 = _col26) (type: boolean)
                      Statistics: Num rows: 326 Data size: 33252 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col28 (type: string)
                        outputColumnNames: _col0
                        Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 326 Data size: 30644 Basic stats: COMPLETE Column stats: COMPLETE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Execution mode: vectorized
        Map 2
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: (d_date_sk is not null and (d_year = 1998)) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
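The effect of countDistincts = numRows can be seen with the textbook equi-join estimate |R| x |S| / max(ndv(R.key), ndv(S.key)). Plugging in the plan's numbers (the `JoinCE` helper is hypothetical, and 1823 is an assumed partition count for illustration, not a measured value):

```java
// Toy cardinality estimate for an equi-join. Setting a partition column's
// NDV to the full row count (as the StatsUtils snippet above does) makes
// max(ndv) equal the fact-table row count, collapsing the estimate to the
// dimension-side row count; using the partition count as NDV does not.
public class JoinCE {
    public static long estimate(long rowsLeft, long ndvLeft, long rowsRight, long ndvRight) {
        return (rowsLeft * rowsRight) / Math.max(ndvLeft, ndvRight);
    }
}
```

With 550,076,554 store_sales rows and 652 qualifying date_dim rows, NDV = row count yields an estimate of 652 rows (matching the plan's Map Join statistics), while NDV = partition count yields on the order of 2x10^8 — the difference that drives the bad downstream plan choices.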
[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Summary: ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader (was: Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader)

Key: HIVE-8291
URL: https://issues.apache.org/jira/browse/HIVE-8291
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.14.0
Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Owen O'Malley
Fix For: 0.14.0

Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}
{code}
Stack Trace  Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()  2,981  87.215
org.apache.tez.mapreduce.lib.MRReaderMapred.next()  2,002  58.572
mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)  2,002  58.572
mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984  58.046
hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,983  58.016
hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)  1,891  55.325
hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)  1,723  50.41
hive.common.ValidTxnListImpl.init(String)  934  27.326
conf.Configuration.get(String, String)  621  18.169
{code}
Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU is in
{code}
Path onepath = normalizePath(onefile);
{code}
and 15% of the CPU is in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace  Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)  978  28.613
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)  978  28.613
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()  866  25.336
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()  866  25.336
java.net.URI.relativize(URI)  655  19.163
java.net.URI.relativize(URI, URI)  655  19.163
java.net.URI.normalize(String)  517  15.126
java.net.URI.needsNormalization(String)  372  10.884
java.lang.String.charAt(int)  235  6.875
java.net.URI.equal(String, String)  27  0.79
java.lang.StringBuilder.toString()  1  0.029
java.lang.StringBuilder.init()  1  0.029
java.lang.StringBuilder.append(String)  1  0.029
org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)  167  4.886
org.apache.hadoop.fs.Path.init(String)  162  4.74
org.apache.hadoop.fs.Path.initialize(String, String, String, String)  162  4.74
org.apache.hadoop.fs.Path.normalizePath(String, String)  97  2.838
org.apache.commons.lang.StringUtils.replace(String, String, String)  97  2.838
org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97  2.838
java.lang.String.indexOf(String, int)  97  2.838
java.net.URI.init(String, String, String, String, String)  65  1.902
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
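The two hot lines above re-fetch and re-parse the same serialized transaction string for every reader. A minimal sketch of the obvious mitigation, caching the parsed list keyed by the raw string; the class and field names here are hypothetical illustrations, not Hive's actual patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the HIVE-8291 patch: cache the parsed transaction
// list so repeated getReader() calls with the same serialized string skip the
// split/parse work that dominates the profile.
public class ValidTxnListCache {

    // Minimal stand-in for Hive's ValidTxnListImpl; the serialized form is
    // "highWatermark:exceptedTxn1:exceptedTxn2:...".
    static final class ParsedTxnList {
        final long highWatermark;
        ParsedTxnList(String txnString) {
            this.highWatermark = Long.parseLong(txnString.split(":")[0]);
        }
    }

    private static final Map<String, ParsedTxnList> CACHE = new ConcurrentHashMap<>();

    // Parse once per distinct string; later lookups return the cached instance.
    static ParsedTxnList get(String txnString) {
        return CACHE.computeIfAbsent(txnString, ParsedTxnList::new);
    }

    public static void main(String[] args) {
        String txnString = Long.MAX_VALUE + ":"; // same default as the hot code path
        ParsedTxnList first = get(txnString);
        ParsedTxnList second = get(txnString);
        System.out.println(first == second);     // cached: same instance returned
        System.out.println(first.highWatermark == Long.MAX_VALUE);
    }
}
```

The cache trades a small amount of memory for skipping both the `Configuration.get` lookup and the string parsing on every split of a heavily partitioned, bucketed table.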
[jira] [Created] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
Mostafa Mokhtar created HIVE-8292: - Summary: Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

Key: HIVE-8292
URL: https://issues.apache.org/jira/browse/HIVE-8292
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.14.0
Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Owen O'Malley
Fix For: 0.14.0

Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files.
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8292: -- Assignee: Prasanth J (was: Owen O'Malley)
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8292: -- Description: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU is in
{code}
Path onepath = normalizePath(onefile);
{code}
and 15% of the CPU is in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
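The `relativize()` cost called out above is paid again for every input-file change. A sketch of one way to cheapen it: memoize the comparison result per file path. The class name `PathMatchCache` and its surrounding shape are hypothetical, assumed for illustration only:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual fix: memoize the expensive
// relativize()/equals() test per input file, so rows from an already-seen
// file cost a single map lookup instead of repeated URI normalization.
public class PathMatchCache {
    private final URI parent;
    private final Map<String, Boolean> seen = new HashMap<>();

    PathMatchCache(String parentPath) {
        this.parent = URI.create(parentPath);
    }

    // Mirrors onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri()):
    // relativize() returns its argument unchanged when fpath is not under
    // parent, so equality means "fpath lies outside this alias path".
    boolean isOutside(String fpath) {
        return seen.computeIfAbsent(fpath, p -> {
            URI u = URI.create(p);
            return parent.relativize(u).equals(u);
        });
    }

    public static void main(String[] args) {
        PathMatchCache c = new PathMatchCache("hdfs://nn/warehouse/t1/");
        System.out.println(c.isOutside("hdfs://nn/warehouse/t1/part-00000")); // false: under parent
        System.out.println(c.isOutside("hdfs://nn/other/part-00000"));        // true: outside parent
    }
}
```

A cheaper alternative in the same spirit is a normalized string-prefix comparison, which avoids `java.net.URI` entirely; either way the point is to stop re-normalizing identical paths row after row.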
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8292: -- Attachment: 2014_09_29_14_46_04.jfr (hot function profile)
[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Attachment: 2014_09_28_16_48_48.jfr (hot function profile; use Java Mission Control (jmc), which is part of Java 7, to open the file)
[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8291: -- Description: Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}
[jira] [Created] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number
Selina Zhang created HIVE-8293: -- Summary: Metastore direct SQL failed for Oracle because ORA-01722: invalid number

Key: HIVE-8293
URL: https://issues.apache.org/jira/browse/HIVE-8293
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Selina Zhang
Assignee: Selina Zhang

The direct SQL path for retrieving partition objects through filters fails on Oracle. Similar to DERBY-6358, Oracle tries to cast PART_KEY_VAL in the PARTITION_KEY_VALS table to decimal before evaluating the condition. Here is the stack trace:
{quote}
2014-09-29 18:53:53,490 ERROR [pool-1-thread-1] metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back to ORM
javax.jdo.JDODataStoreException: Error executing SQL query select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME = ? inner join DBS on TBLS.DB_ID = DBS.DB_ID and DBS.NAME = ? inner join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 where (((case when TBLS.TBL_NAME = ? and DBS.NAME = ? then cast(FILTER0.PART_KEY_VAL as decimal(21,0)) else null end) ?)).
  at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:422)
  at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:300)
  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:211)
  at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1920)
  at org.apache.hadoop.hive.metastore.ObjectStore$3.getSqlResult(ObjectStore.java:1914)
  at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
  at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:1914)
  at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:1887)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
  at com.sun.proxy.$Proxy8.getPartitionsByExpr(Unknown Source)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_expr(HiveMetaStore.java:3800)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9366)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_expr.getResult(ThriftHiveMetastore.java:9350)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:617)
  at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:613)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
  at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:613)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody0(TThreadPoolServer.java:206)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run_aroundBody1$advice(TThreadPoolServer.java:101)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:1)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
java.sql.SQLSyntaxErrorException: ORA-01722: invalid number
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
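The failure mode described above is that the database may evaluate `cast(PART_KEY_VAL as decimal)` on rows the CASE guard was meant to exclude, and any non-numeric partition-key string then raises ORA-01722. A hypothetical Java illustration of the fix direction, checking the string before converting (the class and method names are invented for this sketch, not the metastore patch):

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not the HIVE-8293 patch: guard the numeric
// conversion so non-numeric partition-key strings are skipped rather than
// failing the whole query the way Oracle's eager cast does.
public class SafeNumericCompare {
    private static final Pattern NUMERIC = Pattern.compile("-?\\d+");

    // True only when partKeyVal is an integer string AND greater than bound;
    // non-numeric values return false instead of raising an error.
    static boolean greaterThan(String partKeyVal, long bound) {
        return NUMERIC.matcher(partKeyVal).matches()
                && Long.parseLong(partKeyVal) > bound;
    }

    public static void main(String[] args) {
        System.out.println(greaterThan("50", 10));          // true: numeric, above bound
        System.out.println(greaterThan("20140808", 10));    // true: numeric string compares fine
        System.out.println(greaterThan("region=west", 10)); // false: skipped, no error
    }
}
```

The SQL-level analogue is to make the guard itself filter out values that cannot be cast, rather than relying on the database to short-circuit the CASE before evaluating the cast, which Oracle evidently does not guarantee here.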
[jira] [Commented] (HIVE-8293) Metastore direct SQL failed for Oracle because ORA-01722: invalid number
[ https://issues.apache.org/jira/browse/HIVE-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152087#comment-14152087 ] Selina Zhang commented on HIVE-8293: It is easy to reproduce:
{code}
hive> create table a (col string) partitioned by (dt string);
hive> create table b (col string) partitioned by (idx int);
hive> alter table a add partition(dt='20140808');
hive> alter table b add partition(idx=50);
hive> select * from b where idx > 10;
{code}