[jira] [Updated] (HIVE-7185) KeyWrapperFactory#TextKeyWrapper#equals() extracts Text incorrectly when isCopy is false

2014-06-19 Thread SUYEON LEE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SUYEON LEE updated HIVE-7185:
-

Status: Open  (was: Patch Available)

 KeyWrapperFactory#TextKeyWrapper#equals() extracts Text incorrectly when 
 isCopy is false
 

 Key: HIVE-7185
 URL: https://issues.apache.org/jira/browse/HIVE-7185
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: SUYEON LEE
Priority: Minor
 Attachments: HIVE-7185.patch


 {code}
 } else {
   t1 = soi_new.getPrimitiveWritableObject(key);
   t2 = soi_copy.getPrimitiveWritableObject(obj);
 {code}
 t2 should be assigned soi_new.getPrimitiveWritableObject(obj)
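 A minimal sketch of the corrected branch (variable names are taken from the snippet above; the surrounding method body is assumed):
 {code}
 } else {
   // Neither value was copied, so both were produced under the "new" object
   // inspector and both must be extracted with soi_new.
   t1 = soi_new.getPrimitiveWritableObject(key);
   t2 = soi_new.getPrimitiveWritableObject(obj); // was: soi_copy.getPrimitiveWritableObject(obj)
 }
 {code}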



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7236) Tez progress monitor should indicate running/failed tasks

2014-06-19 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037071#comment-14037071
 ] 

Lefty Leverenz commented on HIVE-7236:
--

Where can this be documented?

 Tez progress monitor should indicate running/failed tasks
 -

 Key: HIVE-7236
 URL: https://issues.apache.org/jira/browse/HIVE-7236
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7236.1.patch


 Currently, the only logging in TezJobMonitor is for completed tasks. 
 This makes it hard to locate task stalls and task failures. Failure scenarios 
 are harder to debug, in particular when analyzing query runs on a cluster 
 with bad nodes.
 Change the job monitor to log running & failed tasks as follows.
 {code}
 Map 1: 0(+157,-1)/1755 Reducer 2: 0/1  
 Map 1: 0(+168,-1)/1755 Reducer 2: 0/1  
 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1  
 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1 
 {code}
 That is, 189 tasks running, 1 failed and 0 complete.
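 A minimal sketch (a hypothetical helper, not the actual TezJobMonitor code) of how one vertex entry in such a line can be formatted:
 {code}
 // Formats one vertex entry as completed(+running,-failed)/total, omitting the
 // parenthesized part when nothing is in flight or failed (e.g. "Reducer 2: 0/1").
 static String vertexProgress(String name, int completed, int running, int failed, int total) {
   StringBuilder sb = new StringBuilder(name).append(": ").append(completed);
   if (running > 0 || failed > 0) {
     sb.append("(+").append(running).append(",-").append(failed).append(")");
   }
   return sb.append("/").append(total).toString();
 }
 // vertexProgress("Map 1", 0, 189, 1, 1755) returns "Map 1: 0(+189,-1)/1755"
 {code}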



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6694) Beeline should provide a way to execute shell command as Hive CLI does

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037074#comment-14037074
 ] 

Hive QA commented on HIVE-6694:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650965/HIVE-6694.4.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5655 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/511/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/511/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-511/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650965

 Beeline should provide a way to execute shell command as Hive CLI does
 --

 Key: HIVE-6694
 URL: https://issues.apache.org/jira/browse/HIVE-6694
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients
Affects Versions: 0.11.0, 0.12.0, 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.14.0

 Attachments: HIVE-6694.1.patch, HIVE-6694.1.patch, HIVE-6694.2.patch, 
 HIVE-6694.3.patch, HIVE-6694.4.patch, HIVE-6694.patch


 Hive CLI allows a user to execute a shell command using the ! notation, for 
 instance !cat myfile.txt. Being able to execute shell commands may be 
 important for some users. Beeline, as a replacement, provides no such 
 capability, possibly because the ! notation is reserved for SQLLine commands. 
 It's possible to provide this using a slight syntactic variation such as 
 !sh cat myfile.txt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization

2014-06-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7205:


Attachment: HIVE-7205.1.patch.txt

 Wrong results when union all of grouping followed by group by with 
 correlation optimization
 ---

 Key: HIVE-7205
 URL: https://issues.apache.org/jira/browse/HIVE-7205
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: dima machlin
Priority: Critical
 Attachments: HIVE-7205.1.patch.txt


 Use case:
 table TBL (a string, b string) contains a single row: 'a','a'.
 The following query:
 {code:sql}
 select b, sum(cc) from (
 select b,count(1) as cc from TBL group by b
 union all
 select a as b,count(1) as cc from TBL group by a
 ) z
 group by b
 {code}
 returns
 a 1
 a 1
 when hive.optimize.correlation=true;
 if we instead set hive.optimize.correlation=false,
 it returns the correct result: a 2.
 The plan with correlation optimization :
 {code:sql}
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM 
 (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
 TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR 
 (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY 
 (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION 
 (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) 
 (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL 
 a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
 (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum 
 (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias -> Map Operator Tree:
 null-subquery1:z-subquery1:TBL 
   TableScan
 alias: TBL
 Select Operator
   expressions:
 expr: b
 type: string
   outputColumnNames: b
   Group By Operator
 aggregations:
   expr: count(1)
 bucketGroup: false
 keys:
   expr: b
   type: string
 mode: hash
 outputColumnNames: _col0, _col1
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: string
   sort order: +
   Map-reduce partition columns:
 expr: _col0
 type: string
   tag: 0
   value expressions:
 expr: _col1
 type: bigint
 null-subquery2:z-subquery2:TBL 
   TableScan
 alias: TBL
 Select Operator
   expressions:
 expr: a
 type: string
   outputColumnNames: a
   Group By Operator
 aggregations:
   expr: count(1)
 bucketGroup: false
 keys:
   expr: a
   type: string
 mode: hash
 outputColumnNames: _col0, _col1
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: string
   sort order: +
   Map-reduce partition columns:
 expr: _col0
 type: string
   tag: 1
   value expressions:
 expr: _col1
 type: bigint
   Reduce Operator Tree:
 Demux Operator
   Group By Operator
 aggregations:
   expr: count(VALUE._col0)
 bucketGroup: false
 keys:
   expr: KEY._col0
   type: string
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Select Operator
   expressions:
 expr: _col0
 type: string
 expr: _col1
 type: bigint
   outputColumnNames: _col0, _col1
   Union
 Select Operator
   expressions:
 expr: _col0
 type: string
 expr: _col1
 type: bigint
   outputColumnNames: _col0, _col1
   Mux 

[jira] [Commented] (HIVE-7232) VectorReduceSink is emitting incorrect JOIN keys

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037176#comment-14037176
 ] 

Hive QA commented on HIVE-7232:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650983/HIVE-7232.1.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5654 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/513/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/513/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-513/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650983

 VectorReduceSink is emitting incorrect JOIN keys
 

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-7232-extra-logging.patch, HIVE-7232.1.patch.txt, 
 q5.explain.txt, q5.sql


 After HIVE-7121, tpc-h query5 has resulted in incorrect results.
 Thanks to [~navis], it has been tracked down to the auto-parallel settings 
 which were initialized for ReduceSinkOperator, but not for 
 VectorReduceSinkOperator. The vector version inherits, but doesn't call 
 super.initializeOp() or set up the variable correctly from ReduceSinkDesc.
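 A minimal sketch of the fix shape, assuming the vector operator just needs to delegate to its parent's initialization (illustrative, not the actual patch):
 {code}
 // In VectorReduceSinkOperator (illustrative): delegate to ReduceSinkOperator so
 // the auto-parallel state derived from ReduceSinkDesc is set up before any rows
 // are emitted.
 @Override
 protected void initializeOp(Configuration hconf) throws HiveException {
   super.initializeOp(hconf); // was missing: parent sets up partitioning state
   // ... vector-specific initialization ...
 }
 {code}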
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
 Reduce Operator Tree:
   Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {KEY.reducesinkkey0} {VALUE._col2}
   1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
 outputColumnNames: _col0, _col3, _col10, _col11, _col14
 Statistics: Num rows: 18344 Data size: 95229140992 Basic 
 stats: COMPLETE Column stats: NONE
 Reduce Output Operator
   key expressions: _col10 (type: int)
   sort order: +
   Map-reduce partition columns: _col10 (type: int)
   Statistics: Num rows: 18344 Data size: 95229140992 
 Basic stats: COMPLETE Column stats: NONE
   value expressions: _col0 (type: int), _col3 (type: int), 
 _col11 (type: int), _col14 (type: string)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization

2014-06-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7205:


 Assignee: Navis
Affects Version/s: 0.13.0
   0.13.1
   Status: Patch Available  (was: Open)

Running preliminary test

 Wrong results when union all of grouping followed by group by with 
 correlation optimization
 ---

 Key: HIVE-7205
 URL: https://issues.apache.org/jira/browse/HIVE-7205
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1, 0.13.0, 0.12.0
Reporter: dima machlin
Assignee: Navis
Priority: Critical
 Attachments: HIVE-7205.1.patch.txt


 Use case:
 table TBL (a string, b string) contains a single row: 'a','a'.
 The following query:
 {code:sql}
 select b, sum(cc) from (
 select b,count(1) as cc from TBL group by b
 union all
 select a as b,count(1) as cc from TBL group by a
 ) z
 group by b
 {code}
 returns
 a 1
 a 1
 when hive.optimize.correlation=true;
 if we instead set hive.optimize.correlation=false,
 it returns the correct result: a 2.
 The plan with correlation optimization :
 {code:sql}
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM 
 (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
 TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR 
 (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY 
 (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION 
 (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) 
 (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL 
 a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT 
 (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum 
 (TOK_TABLE_OR_COL cc (TOK_GROUPBY (TOK_TABLE_OR_COL b
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias -> Map Operator Tree:
 null-subquery1:z-subquery1:TBL 
   TableScan
 alias: TBL
 Select Operator
   expressions:
 expr: b
 type: string
   outputColumnNames: b
   Group By Operator
 aggregations:
   expr: count(1)
 bucketGroup: false
 keys:
   expr: b
   type: string
 mode: hash
 outputColumnNames: _col0, _col1
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: string
   sort order: +
   Map-reduce partition columns:
 expr: _col0
 type: string
   tag: 0
   value expressions:
 expr: _col1
 type: bigint
 null-subquery2:z-subquery2:TBL 
   TableScan
 alias: TBL
 Select Operator
   expressions:
 expr: a
 type: string
   outputColumnNames: a
   Group By Operator
 aggregations:
   expr: count(1)
 bucketGroup: false
 keys:
   expr: a
   type: string
 mode: hash
 outputColumnNames: _col0, _col1
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: string
   sort order: +
   Map-reduce partition columns:
 expr: _col0
 type: string
   tag: 1
   value expressions:
 expr: _col1
 type: bigint
   Reduce Operator Tree:
 Demux Operator
   Group By Operator
 aggregations:
   expr: count(VALUE._col0)
 bucketGroup: false
 keys:
   expr: KEY._col0
   type: string
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Select Operator
   expressions:
 expr: _col0
 type: string
 expr: _col1
 type: bigint
   outputColumnNames: _col0, _col1
   Union
 Select Operator
   expressions:
 expr: _col0
 

[jira] [Commented] (HIVE-4605) Hive job fails while closing reducer output - Unable to rename

2014-06-19 Thread George Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037200#comment-14037200
 ] 

George Wong commented on HIVE-4605:
---

We ran into this issue in our cluster.
The error message is like this:
{code}
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: 
hdfs://***:8020/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_task_tmp.-ext-10002/_tmp.00_3
 to: 
hdfs://***:8020/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_tmp.-ext-10002/00_3
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:197)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:108)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:867)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
at 
org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:470)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:407)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
{code}

The NameNode log shows:
{code}
2014-06-16 20:43:38,582 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
FSDirectory.unprotectedRenameTo: failed to rename 
/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_task_tmp.-ext-10002/_tmp.00_3
 to 
/tmp/hive-svcckppi/hive_2014-06-16_20-24-09_584_6615934756634587679/_tmp.-ext-10002/00_3
 because destination's parent does not exist
{code}

 Hive job fails while closing reducer output - Unable to rename
 --

 Key: HIVE-4605
 URL: https://issues.apache.org/jira/browse/HIVE-4605
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
 Environment: OS: 2.6.18-194.el5xen #1 SMP Fri Apr 2 15:34:40 EDT 2010 
 x86_64 x86_64 x86_64 GNU/Linux
 Hadoop 1.1.2
Reporter: Link Qian
Assignee: Brock Noland

 1, create a table with ORC storage model
 create table iparea_analysis_orc (network int, ip string,   )
 stored as ORC;
 2, insert table iparea_analysis_orc select network, ip, ..., the script 
 succeeds, but fails after adding the *OVERWRITE* keyword. The main error log is 
 listed here.
 java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable 
 to rename output from: 
 hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0
  to: 
 hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0
   at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:317)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
 output from: 
 hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0
  to: 
 hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:197)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:108)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:867)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
   at 
 

[jira] [Commented] (HIVE-4605) Hive job fails while closing reducer output - Unable to rename

2014-06-19 Thread George Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037203#comment-14037203
 ] 

George Wong commented on HIVE-4605:
---

I went through the code of the FileSink operator.
The code is like this:
{code}
 if ((bDynParts || isSkewedStoredAsSubDirectories)
     && !fs.exists(finalPaths[idx].getParent())) {
   fs.mkdirs(finalPaths[idx].getParent());
 }
{code}

I am wondering why we should check bDynParts and 
isSkewedStoredAsSubDirectories. In the code, the output is moved to finalPath no 
matter what the values of bDynParts and isSkewedStoredAsSubDirectories are. 
Since the data move is unavoidable, why not change the code to the following 
to make sure the path exists before moving the file:
{code}
 if (!fs.exists(finalPaths[idx].getParent())) {
   fs.mkdirs(finalPaths[idx].getParent());
 }
{code}

 Hive job fails while closing reducer output - Unable to rename
 --

 Key: HIVE-4605
 URL: https://issues.apache.org/jira/browse/HIVE-4605
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
 Environment: OS: 2.6.18-194.el5xen #1 SMP Fri Apr 2 15:34:40 EDT 2010 
 x86_64 x86_64 x86_64 GNU/Linux
 Hadoop 1.1.2
Reporter: Link Qian
Assignee: Brock Noland

 1, create a table with ORC storage model
 create table iparea_analysis_orc (network int, ip string,   )
 stored as ORC;
 2, insert table iparea_analysis_orc select network, ip, ..., the script 
 succeeds, but fails after adding the *OVERWRITE* keyword. The main error log is 
 listed here.
 java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable 
 to rename output from: 
 hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0
  to: 
 hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0
   at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:317)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:530)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
 output from: 
 hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_task_tmp.-ext-1/_tmp.00_0
  to: 
 hdfs://qa3hop001.uucun.com:9000/tmp/hive-hadoop/hive_2013-05-24_15-11-06_511_7746839019590922068/_tmp.-ext-1/00_0
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:197)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:108)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:867)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
   at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309)
   ... 7 more



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6694) Beeline should provide a way to execute shell command as Hive CLI does

2014-06-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037328#comment-14037328
 ] 

Xuefu Zhang commented on HIVE-6694:
---

[~brocknoland] Would you mind taking another look at the patch? Thanks.

 Beeline should provide a way to execute shell command as Hive CLI does
 --

 Key: HIVE-6694
 URL: https://issues.apache.org/jira/browse/HIVE-6694
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients
Affects Versions: 0.11.0, 0.12.0, 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.14.0

 Attachments: HIVE-6694.1.patch, HIVE-6694.1.patch, HIVE-6694.2.patch, 
 HIVE-6694.3.patch, HIVE-6694.4.patch, HIVE-6694.patch


 Hive CLI allows a user to execute a shell command using the ! notation, for 
 instance !cat myfile.txt. Being able to execute shell commands may be 
 important for some users. Beeline, as a replacement, provides no such 
 capability, possibly because the ! notation is reserved for SQLLine commands. 
 It's possible to provide this using a slight syntactic variation such as 
 !sh cat myfile.txt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-19 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037339#comment-14037339
 ] 

Swarnim Kulkarni commented on HIVE-7230:


I guess redundancy would have been a better word here. :) My only concern was 
that pulling the formatter settings from the remote guide and also checking in 
the formatter file seems a bit redundant to me.

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7235) TABLESAMPLE on join table is regarded as alias

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037348#comment-14037348
 ] 

Hive QA commented on HIVE-7235:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651013/HIVE-7235.1.patch.txt

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5654 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/515/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/515/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-515/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651013

 TABLESAMPLE on join table is regarded as alias
 --

 Key: HIVE-7235
 URL: https://issues.apache.org/jira/browse/HIVE-7235
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7235.1.patch.txt


 {noformat}
 SELECT c_custkey, o_custkey
 FROM customer tablesample (1000 ROWS) join orders tablesample (1000 ROWS) on 
 c_custkey = o_custkey;
 {noformat}
 Fails with an NPE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


MiniTezCliDriver pre-commit tests are running

2014-06-19 Thread Szehon Ho
(changing subject)

The MiniTezCliDriver tests have timed out lately in the pre-commit tests,
reducing test coverage, as Ashutosh reported.  I have now configured the
parallel-test framework to run MiniTezCliDriver in batches of 15 qtests,
like the others.  Now the timeout issue is fixed, and test reports are
showing up for those.

A nice side effect is that it speeds up the pre-commit tests by a lot, as they
were bottlenecked on running all 79 MiniTezCliDriver tests on one node.

The only impact is that if you are adding new MiniTezCliDriver tests, they
now need to be manually added to the PTest config on the build machine, as
explained in
https://cwiki.apache.org/confluence/display/Hive/MiniMR+and+PTest2.  I've
added all 79 current tests manually.  The impact may be bigger for this
driver than for others, as Hive-Tez is under heavy development.  I filed
HIVE-7254 (https://issues.apache.org/jira/browse/HIVE-7254) to explore
improving this, but for now please follow that process or notify me, so new
tests get added to the pre-commit test coverage.

Thanks
Szehon



On Fri, Jun 13, 2014 at 3:16 PM, Brock Noland br...@cloudera.com wrote:

 + dev

 Good call, yep that will need to be configured.

 Brock


 On Fri, Jun 13, 2014 at 10:29 AM, Szehon Ho sze...@cloudera.com wrote:

 I was studying this a bit more, and I believe the MiniTezCliDriver tests are
 hitting the timeout after 2 hours, as the error code is 124.  The framework is
 running all of them in one call; I'll try to chunk the tests into batches
 like the other q-tests.

 I'll try to take a look next week at this.

 Thanks
 Szehon


 On Mon, Jun 9, 2014 at 1:13 PM, Szehon Ho sze...@cloudera.com wrote:

 It looks like a JVM OOM crash during the MiniTezCliDriver tests, or it's
 otherwise crashing.  The 407 log has failures, but the 408 log is cut off.


 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-407/failed/TestMiniTezCliDriver/maven-test.txt

 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/maven-test.txt

 The MAVEN_OPTS is already set to -Xmx2g -XX:MaxPermSize=256M.  Do you
 guys know of any such issues?

 Thanks,
 Szehon



 On Sun, Jun 8, 2014 at 12:05 PM, Brock Noland br...@cloudera.com
 wrote:

 Looks like it's failing to generate test output:


 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/


 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-408/failed/TestMiniTezCliDriver/TestMiniTezCliDriver.txt

 exiting with 124 here:

 + wait 21961
 + timeout 2h mvn -B -o test 
 -Dmaven.repo.local=/home/hiveptest//ip-10-31-188-232-hiveptest-2/maven 
 -Phadoop-2 -Phadoop-2 -Dtest=TestMiniTezCliDriver
 + ret=124





 On Sun, Jun 8, 2014 at 11:25 AM, Ashutosh Chauhan hashut...@apache.org
  wrote:

 Build #407 ran MiniTezCliDriver
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/407/testReport/org.apache.hadoop.hive.cli/

 but Build #408 didn't
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/408/testReport/org.apache.hadoop.hive.cli/


 On Sat, Jun 7, 2014 at 12:25 PM, Szehon Ho sze...@cloudera.com
 wrote:

 Sounds like there's randomness, either in the PTest test-parser or in the
 maven test itself.  In the history now, it's running between 5633-5707,
 which is similar to your range.


 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/testReport/history/

 I didn't see any in the history without MiniTezCLIDriver; can you point me
 to a build no. if you see one?  If nobody else knows immediately, I can dig
 deeper into it next week to try to find out.


 On Sat, Jun 7, 2014 at 9:00 AM, Ashutosh Chauhan 
 hashut...@apache.org wrote:

 I noticed that the PTest2 framework runs a different number of tests on
 various runs, e.g. on yesterday's runs I saw it ran 5585 & 5510 tests on
 subsequent runs. In particular, it seems it's running the MiniTezCliDriver
 tests in only half the runs. Has anyone observed this?




[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-19 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7159:


Status: Patch Available  (was: Open)

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, 
 HIVE-7159.8.patch, HIVE-7159.9.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-19 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7159:


Attachment: HIVE-7159.9.patch

rebase

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, 
 HIVE-7159.8.patch, HIVE-7159.9.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-19 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7159:


Status: Open  (was: Patch Available)

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, 
 HIVE-7159.8.patch, HIVE-7159.9.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-19 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037453#comment-14037453
 ] 

Harish Butani commented on HIVE-7159:
-

The prunedCols contain columns from the inputRR. The parent pruner will have 
set this up.

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, 
 HIVE-7159.8.patch, HIVE-7159.9.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7237) hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037466#comment-14037466
 ] 

Hive QA commented on HIVE-7237:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651056/HIVE-7237.1.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5654 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/516/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/516/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-516/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651056

 hive.exec.parallel=true w/ Hive 0.13/Tez causes application to linger forever
 -

 Key: HIVE-7237
 URL: https://issues.apache.org/jira/browse/HIVE-7237
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.13.0
 Environment: HDP 2.1, Hive 0.13, SLES 11, 128GB data nodes, ORC SNAPPY
Reporter: Douglas Moore
Assignee: Navis
 Attachments: HIVE-7237.1.patch.txt


 set hive.exec.parallel=true; causes the YARN application instance to linger 
 forever. With set hive.exec.parallel=false; the application goes away as soon 
 as the Hive query is complete. The underlying table is an ORC store_sales 
 table compressed with SNAPPY.
 {code}
 hive.exec.parallel=true;
 select * from store_sales where ss_ticket_number=5741230 and ss_item_sk=4825
 {code}
 The query will run under Tez and finish in < 30 seconds.
 After 30-40 of these jobs the cluster gets to a point where no jobs will 
 finish.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input

2014-06-19 Thread Wilbur Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037551#comment-14037551
 ] 

Wilbur Yang commented on HIVE-6637:
---

Regarding the currently submitted patch: I agree with the type checking for 
both arguments, and I think that this code will work in terms of functionality. 
However, I believe that there's a slight problem -- if a user runs a query 
with, say, an INT as the second argument, then the error message will be "The 
2nd argument of function IN_FILE must be a string, char or varchar..." when it 
really must be a constant string.
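
A minimal sketch of an argument check that keeps the two cases apart (names follow GenericUDFInFile's style but are assumptions, not the actual patch):
{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.serde2.objectinspector.ConstantObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;

// Report a type error when the argument is not string-like, and a separate
// error when the type is fine but the value is not a constant, so each message
// states what is actually required.
static void checkSecondArgument(ObjectInspector oi) throws UDFArgumentTypeException {
  PrimitiveCategory cat = (oi instanceof PrimitiveObjectInspector)
      ? ((PrimitiveObjectInspector) oi).getPrimitiveCategory() : null;
  boolean stringLike = cat == PrimitiveCategory.STRING
      || cat == PrimitiveCategory.CHAR || cat == PrimitiveCategory.VARCHAR;
  if (!stringLike) {
    throw new UDFArgumentTypeException(1,
        "The 2nd argument of function IN_FILE must be a constant string, char or varchar");
  }
  if (!(oi instanceof ConstantObjectInspector)) {
    throw new UDFArgumentTypeException(1,
        "The 2nd argument of function IN_FILE must be a constant");
  }
}
{code}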

 UDF in_file() doesn't take CHAR or VARCHAR as input
 ---

 Key: HIVE-6637
 URL: https://issues.apache.org/jira/browse/HIVE-6637
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
Assignee: Ashish Kumar Singh
 Attachments: HIVE-6637.patch


 {code}
 hive> desc alter_varchar_1;
 key                  string               None
 value                varchar(3)           None
 key2                 int                  None
 value2               varchar(10)          None
 hive> select in_file(value, value2) from alter_varchar_1;
 FAILED: SemanticException [Error 10016]: Line 1:15 Argument type mismatch 
 'value': The 1st argument of function IN_FILE must be a string but 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveVarcharObjectInspector@10f1f34a
  was given.
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom

2014-06-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7206:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 Duplicate declaration of build-helper-maven-plugin in root pom
 --

 Key: HIVE-7206
 URL: https://issues.apache.org/jira/browse/HIVE-7206
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.14.0

 Attachments: HIVE-7206.1.patch, HIVE-7206.patch


 Results in the following warnings while building:
 [WARNING] Some problems were encountered while building the effective model 
 for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT
 [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must 
 be unique but found duplicate declaration of plugin 
 org.codehaus.mojo:build-helper-maven-plugin @ 
 org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17
 [WARNING] 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7257) UDF format_number() does not work on FLOAT types

2014-06-19 Thread Wilbur Yang (JIRA)
Wilbur Yang created HIVE-7257:
-

 Summary: UDF format_number() does not work on FLOAT types
 Key: HIVE-7257
 URL: https://issues.apache.org/jira/browse/HIVE-7257
 Project: Hive
  Issue Type: Bug
Reporter: Wilbur Yang
Assignee: Wilbur Yang


#1 Show the table:
hive> describe ssga3;
OK
source    string
test      float
dt        timestamp
Time taken: 0.243 seconds

#2 Run format_number on double and it works:
hive> select format_number(cast(test as double),2) from ssga3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201403131616_0009, Tracking URL = 
http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009
Kill Command = 
/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
-kill job_201403131616_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0%
2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec
2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec
2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec
2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.47 
sec
MapReduce Total cumulative CPU time: 1 seconds 470 msec
Ended Job = job_201403131616_0009
MapReduce Jobs Launched: 
Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 470 msec
OK
1.00
2.00
Time taken: 16.563 seconds

#3 Run format_number on float and it does not work:
hive> select format_number(test,2) from ssga3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201403131616_0010, Tracking URL = 
http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
Kill Command = 
/opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
-kill job_201403131616_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0%
2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201403131616_0010 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
Examining task ID: task_201403131616_0010_m_02 (and more) from job 
job_201403131616_0010
Unable to retrieve URL for Hadoop Task logs. Does not contain a valid host:port 
authority: logicaljt
Task with the most failures(4):
Task ID:
task_201403131616_0010_m_00
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row {source:null,test:1.0,dt:null}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {source:null,test:1.0,dt:null}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
..
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037589#comment-14037589
 ] 

Ashutosh Chauhan commented on HIVE-7159:


Correct, prunedCols can only be a subset of RowSchema, so if the sizes match 
there is no need for a Select Op.
+1

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch, HIVE-7159.6.patch, HIVE-7159.7.patch, 
 HIVE-7159.8.patch, HIVE-7159.9.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null-keyed rows, it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-19 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037631#comment-14037631
 ] 

David Chen commented on HIVE-7230:
--

[~swarnim], in my patch, I am pointing the Maven Eclipse plugin to the 
formatter file that sits in the root of the source tree and not the remote 
guide.

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-19 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7250:
-

Attachment: HIVE-7250.5.patch

Added missing Apache license header in the newly added unit tests. Also made 
the unit tests not dependent on -Xmx.

 Adaptive compression buffer size for wide tables in ORC
 ---

 Key: HIVE-7250
 URL: https://issues.apache.org/jira/browse/HIVE-7250
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, 
 HIVE-7250.4.patch, HIVE-7250.5.patch


 If the input table is wide (in the order of 1000s), ORC compression buffer 
 size overhead becomes significant causing OOM issues. To overcome this issue, 
 buffer size should be adaptively chosen based on the available memory and the 
 number of columns.
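 A minimal sketch of one possible heuristic (the constants and the memory estimate are assumptions, not the patch itself):
 {code}
 // Shrink the per-stream compression buffer as the column count grows, so the
 // total buffer memory stays within a fraction of the available heap.
 static int adaptiveBufferSize(int defaultSize, int numColumns, long availableMemory) {
   final int MIN_SIZE = 4 * 1024;          // floor of 4 KB; assumption
   final int STREAMS_PER_COLUMN = 4;       // rough streams-per-column estimate; assumption
   long budget = availableMemory / 2;      // spend at most half the heap; assumption
   long perStream = budget / Math.max(1L, (long) numColumns * STREAMS_PER_COLUMN);
   long size = Math.min(defaultSize, Math.max(MIN_SIZE, perStream));
   return Integer.highestOneBit((int) size); // round down to a power of two
 }
 {code}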



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-19 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7250:
-

Attachment: HIVE-7250.5.patch

 Adaptive compression buffer size for wide tables in ORC
 ---

 Key: HIVE-7250
 URL: https://issues.apache.org/jira/browse/HIVE-7250
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, 
 HIVE-7250.4.patch, HIVE-7250.5.patch


 If the input table is wide (in the order of 1000s), ORC compression buffer 
 size overhead becomes significant causing OOM issues. To overcome this issue, 
 buffer size should be adaptively chosen based on the available memory and the 
 number of columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-19 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7250:
-

Attachment: (was: HIVE-7250.5.patch)

 Adaptive compression buffer size for wide tables in ORC
 ---

 Key: HIVE-7250
 URL: https://issues.apache.org/jira/browse/HIVE-7250
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, 
 HIVE-7250.4.patch, HIVE-7250.5.patch


 If the input table is wide (in the order of 1000s), ORC compression buffer 
 size overhead becomes significant causing OOM issues. To overcome this issue, 
 buffer size should be adaptively chosen based on the available memory and the 
 number of columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22772: HIVE-6637: UDF in_file() doesn't take CHAR or VARCHAR as input

2014-06-19 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22772/
---

(Updated June 19, 2014, 6:55 p.m.)


Review request for hive.


Changes
---

Made changes based on a comment on the JIRA.


Bugs: HIVE-6637
https://issues.apache.org/jira/browse/HIVE-6637


Repository: hive-git


Description (updated)
---

HIVE-6637: UDF in_file() doesn't take CHAR or VARCHAR as input


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInFile.java 
ea52537d0b85191f0b633a29aa3f7ddb556c288d 
  ql/src/test/queries/clientpositive/udf_in_file.q 
9d9efe8e23d6e73429ee5cd2c8470359ba2b3498 
  ql/src/test/results/clientpositive/udf_in_file.q.out 
b63143760d80f3f6a8ba0a23c0d87e8bb86fce66 

Diff: https://reviews.apache.org/r/22772/diff/


Testing
---

Tested with qtest.


Thanks,

Ashish Singh



[jira] [Updated] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input

2014-06-19 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-6637:
-

Attachment: (was: HIVE-6637.patch)

 UDF in_file() doesn't take CHAR or VARCHAR as input
 ---

 Key: HIVE-6637
 URL: https://issues.apache.org/jira/browse/HIVE-6637
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
Assignee: Ashish Kumar Singh
 Attachments: HIVE-6637.1.patch


 {code}
 hive> desc alter_varchar_1;
 key                  string               None
 value                varchar(3)           None
 key2                 int                  None
 value2               varchar(10)          None
 hive> select in_file(value, value2) from alter_varchar_1;
 FAILED: SemanticException [Error 10016]: Line 1:15 Argument type mismatch 
 'value': The 1st argument of function IN_FILE must be a string but 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveVarcharObjectInspector@10f1f34a
  was given.
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input

2014-06-19 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-6637:
-

Attachment: HIVE-6637.1.patch

 UDF in_file() doesn't take CHAR or VARCHAR as input
 ---

 Key: HIVE-6637
 URL: https://issues.apache.org/jira/browse/HIVE-6637
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
Assignee: Ashish Kumar Singh
 Attachments: HIVE-6637.1.patch


 {code}
 hive> desc alter_varchar_1;
 key                  string               None
 value                varchar(3)           None
 key2                 int                  None
 value2               varchar(10)          None
 hive> select in_file(value, value2) from alter_varchar_1;
 FAILED: SemanticException [Error 10016]: Line 1:15 Argument type mismatch 
 'value': The 1st argument of function IN_FILE must be a string but 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveVarcharObjectInspector@10f1f34a
  was given.
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6637) UDF in_file() doesn't take CHAR or VARCHAR as input

2014-06-19 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037681#comment-14037681
 ] 

Ashish Kumar Singh commented on HIVE-6637:
--

[~wilbur.yang] good point. Updated the patch and RB.

 UDF in_file() doesn't take CHAR or VARCHAR as input
 ---

 Key: HIVE-6637
 URL: https://issues.apache.org/jira/browse/HIVE-6637
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.14.0
Reporter: Xuefu Zhang
Assignee: Ashish Kumar Singh
 Attachments: HIVE-6637.1.patch


 {code}
 hive> desc alter_varchar_1;
 key                  string               None
 value                varchar(3)           None
 key2                 int                  None
 value2               varchar(10)          None
 hive> select in_file(value, value2) from alter_varchar_1;
 FAILED: SemanticException [Error 10016]: Line 1:15 Argument type mismatch 
 'value': The 1st argument of function IN_FILE must be a string but 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveVarcharObjectInspector@10f1f34a
  was given.
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7251) Fix StorageDescriptor usage in unit tests

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037754#comment-14037754
 ] 

Hive QA commented on HIVE-7251:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651195/HIVE-7251.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5639 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/517/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/517/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-517/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651195

 Fix StorageDescriptor usage in unit tests 
 --

 Key: HIVE-7251
 URL: https://issues.apache.org/jira/browse/HIVE-7251
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.1
Reporter: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7251.patch


 Current Approach : 
 The StorageDescriptor class is used to describe parameters like InputFormat, 
 OutputFormat, SerDeInfo, etc. for a Hive table.
 Some of the class variables, like InputFormat, OutputFormat, and 
 SerDeInfo.serializationLib, are required fields when creating a 
 StorageDescriptor object.
 For example, the createTable command in the metaStoreClient creates the table 
 with the default values of such variables defined in HiveConf or 
 hive-default.xml.
 But in unit tests, the table is created in a slightly different way, so these 
 values need to be set explicitly.
 Thus, when creating tables in tests, the required fields of the 
 StorageDescriptor object need to be set.
 Issue with current approach :
 From some of the current usages of this class in unit tests, I noticed that 
 when any one of the test cases tried to clean up the database and found a 
 table created by a previously executed test case, the clean-up process tries 
 to fetch the Table object and performs sanity checks, which include checking 
 for required fields like InputFormat, OutputFormat, and 
 SerDeInfo.serializationLib of the table. The sanity checks fail, which 
 results in failure of the test case.
 Fix :
 In unit tests, the StorageDescriptor object should be created with the fields 
 that are sanity-checked when fetching the table.
 NOTE : This patch fixes 6 test cases in itests/hive-unit/
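
To make that concrete, here is a minimal sketch (illustrative, not lifted from HIVE-7251.patch) of building a StorageDescriptor for a test table with every sanity-checked field set explicitly; the class-name strings are the stock text format and serde:

{code}
// Sketch for unit tests: set InputFormat, OutputFormat, and
// SerDeInfo.serializationLib up front so later sanity checks pass.
import java.util.Arrays;
import java.util.HashMap;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.SerDeInfo;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;

public final class TestStorageDescriptors {
  private TestStorageDescriptors() {}

  public static StorageDescriptor newTextSd(String location) {
    StorageDescriptor sd = new StorageDescriptor();
    sd.setCols(Arrays.asList(new FieldSchema("key", "string", "")));
    sd.setLocation(location);
    sd.setInputFormat("org.apache.hadoop.mapred.TextInputFormat");
    sd.setOutputFormat(
        "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat");
    SerDeInfo serde = new SerDeInfo();
    serde.setName("test-serde");
    serde.setSerializationLib(
        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
    serde.setParameters(new HashMap<String, String>());
    sd.setSerdeInfo(serde);
    return sd;
  }
}
{code}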



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7251) Fix StorageDescriptor usage in unit tests

2014-06-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7251:
---

Assignee: Pankit Thapar

 Fix StorageDescriptor usage in unit tests 
 --

 Key: HIVE-7251
 URL: https://issues.apache.org/jira/browse/HIVE-7251
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.1
Reporter: Pankit Thapar
Assignee: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7251.patch


 Current Approach : 
 The StorageDescriptor class is used to describe parameters like InputFormat, 
 OutputFormat, SerDeInfo, etc. for a Hive table.
 Some of the class variables, like InputFormat, OutputFormat, and 
 SerDeInfo.serializationLib, are required fields when creating a 
 StorageDescriptor object.
 For example, the createTable command in the metaStoreClient creates the table 
 with the default values of such variables defined in HiveConf or 
 hive-default.xml.
 But in unit tests, the table is created in a slightly different way, so these 
 values need to be set explicitly.
 Thus, when creating tables in tests, the required fields of the 
 StorageDescriptor object need to be set.
 Issue with current approach :
 From some of the current usages of this class in unit tests, I noticed that 
 when any one of the test cases tried to clean up the database and found a 
 table created by a previously executed test case, the clean-up process tries 
 to fetch the Table object and performs sanity checks, which include checking 
 for required fields like InputFormat, OutputFormat, and 
 SerDeInfo.serializationLib of the table. The sanity checks fail, which 
 results in failure of the test case.
 Fix :
 In unit tests, the StorageDescriptor object should be created with the fields 
 that are sanity-checked when fetching the table.
 NOTE : This patch fixes 6 test cases in itests/hive-unit/



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7251) Fix StorageDescriptor usage in unit tests

2014-06-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037768#comment-14037768
 ] 

Ashutosh Chauhan commented on HIVE-7251:


+1

 Fix StorageDescriptor usage in unit tests 
 --

 Key: HIVE-7251
 URL: https://issues.apache.org/jira/browse/HIVE-7251
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.1
Reporter: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7251.patch


 Current Approach : 
 The StorageDescriptor class is used to describe parameters like InputFormat, 
 OutputFormat, SerDeInfo, etc. for a Hive table.
 Some of the class variables, like InputFormat, OutputFormat, and 
 SerDeInfo.serializationLib, are required fields when creating a 
 StorageDescriptor object.
 For example, the createTable command in the metaStoreClient creates the table 
 with the default values of such variables defined in HiveConf or 
 hive-default.xml.
 But in unit tests, the table is created in a slightly different way, so these 
 values need to be set explicitly.
 Thus, when creating tables in tests, the required fields of the 
 StorageDescriptor object need to be set.
 Issue with current approach :
 From some of the current usages of this class in unit tests, I noticed that 
 when any one of the test cases tried to clean up the database and found a 
 table created by a previously executed test case, the clean-up process tries 
 to fetch the Table object and performs sanity checks, which include checking 
 for required fields like InputFormat, OutputFormat, and 
 SerDeInfo.serializationLib of the table. The sanity checks fail, which 
 results in failure of the test case.
 Fix :
 In unit tests, the StorageDescriptor object should be created with the fields 
 that are sanity-checked when fetching the table.
 NOTE : This patch fixes 6 test cases in itests/hive-unit/



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7219) Improve performance of serialization utils in ORC

2014-06-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037771#comment-14037771
 ] 

Prasanth J commented on HIVE-7219:
--

bq. Question: Should the following information from Prasanth J also be 
documented, and if so does it belong in the ORC wikidoc or with the parameter 
description in Configuration Properties?
bq. For integers, this patch will improve only very specific cases. If the 
encoding uses SHORT_REPEAT, DELTA (esp. fixed delta), or PATCHED_BLOB, then this 
patch will NOT have any effect, as these encodings do not use bit packing. 
The bit-packed encodings like DIRECT and DELTA (variable delta) will see 
improvements.

I think these are too specific to be put into user documentation.

 Improve performance of serialization utils in ORC
 -

 Key: HIVE-7219
 URL: https://issues.apache.org/jira/browse/HIVE-7219
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7219.1.patch, HIVE-7219.2.patch, HIVE-7219.3.patch, 
 HIVE-7219.4.patch, orc-read-perf-jmh-benchmark.png


 ORC uses serialization utils heavily for reading and writing data. The 
 bitpacking and unpacking code in writeInts() and readInts() can be unrolled 
 for better performance. Also double reader/writer performance can be improved 
 by bulk reading/writing from/to byte array.
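
As a toy illustration of what "unrolled" means here (this is deliberately not the ORC code; the real patch targets writeInts()/readInts() and specializes per bit width), compare a per-value shift/mask loop with a straight-line routine for one fixed width:

{code}
// Toy example: unpack 4-bit values, two per byte, high nibble first.
// Assumes out.length == 2 * in.length.
public final class UnpackDemo {
  // Generic path: index math and a nibble-position branch per value.
  public static void unpack4Generic(byte[] in, long[] out) {
    for (int i = 0; i < out.length; i++) {
      int b = in[i >> 1] & 0xff;
      out[i] = ((i & 1) == 0) ? (b >>> 4) : (b & 0xf);
    }
  }

  // Unrolled path: both values emitted per iteration, no branch.
  public static void unpack4Unrolled(byte[] in, long[] out) {
    int o = 0;
    for (byte raw : in) {
      int b = raw & 0xff;
      out[o++] = b >>> 4;
      out[o++] = b & 0xf;
    }
  }
}
{code}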



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION

2014-06-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7051:
---

Component/s: Statistics

 Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
 -

 Key: HIVE-7051
 URL: https://issues.apache.org/jira/browse/HIVE-7051
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7051.1.patch


 Same as HIVE-7050 but for partitions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION

2014-06-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7051:
---

Attachment: HIVE-7051.1.patch

Patch to implement the same.

 Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
 -

 Key: HIVE-7051
 URL: https://issues.apache.org/jira/browse/HIVE-7051
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
 Attachments: HIVE-7051.1.patch


 Same as HIVE-7050 but for partitions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION

2014-06-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7051:
---

Status: Patch Available  (was: Open)

 Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
 -

 Key: HIVE-7051
 URL: https://issues.apache.org/jira/browse/HIVE-7051
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7051.1.patch


 Same as HIVE-7050 but for partitions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION

2014-06-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned HIVE-7051:
--

Assignee: Ashutosh Chauhan

 Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
 -

 Key: HIVE-7051
 URL: https://issues.apache.org/jira/browse/HIVE-7051
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7051.1.patch


 Same as HIVE-7050 but for partitions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 22782: Display partition level column stats

2014-06-19 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22782/
---

Review request for hive and Gunther Hagleitner.


Bugs: HIVE-7051
https://issues.apache.org/jira/browse/HIVE-7051


Repository: hive-git


Description
---

Display partition level column stats


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java fad5ed3 
  ql/src/test/queries/clientpositive/columnstats_partlvl.q 8bf6c70 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out a4c4677 

Diff: https://reviews.apache.org/r/22782/diff/


Testing
---

Added new tests


Thanks,

Ashutosh Chauhan



[jira] [Commented] (HIVE-7255) Allow partial partition spec in analyze command

2014-06-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037804#comment-14037804
 ] 

Ashutosh Chauhan commented on HIVE-7255:


To enable further testing of this, HIVE-7051 will be of great help. [~hagleitn] 
Can you take a look at that first?

 Allow partial partition spec in analyze command
 ---

 Key: HIVE-7255
 URL: https://issues.apache.org/jira/browse/HIVE-7255
 Project: Hive
  Issue Type: New Feature
  Components: Statistics
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7255.1.patch


 So that stats collection can happen for multiple partitions through one 
 statement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7098) RecordUpdater should extend RecordWriter

2014-06-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037817#comment-14037817
 ] 

Alan Gates commented on HIVE-7098:
--

Ran the failing tests locally, all of which pass except root_dir_external_table, 
which fails on trunk as well.  So I conclude none of these are issues for this 
patch.

 RecordUpdater should extend RecordWriter
 

 Key: HIVE-7098
 URL: https://issues.apache.org/jira/browse/HIVE-7098
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats, Transactions
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7098.patch


 A new interface, ql.io.RecordUpdater, was added as part of the ACID work in 
 0.13.  This interface should extend RecordWriter because:
 # If it does not, significant portions of FileSinkOperator will have to be 
 reworked to handle both RecordWriter and RecordUpdater.
 # Once a file format accepts transactions, it should not generally be 
 possible to write using RecordWriter.write, as that would write old-style 
 records without transaction information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-06-19 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037822#comment-14037822
 ] 

Sushanth Sowmyan commented on HIVE-7223:


Is there a need for a List<Partition> getPartitions() when there is already an 
Iterator<Partition> being exposed? If we do a PartitionSpec interface such as 
this, I'd like to keep it as lean as possible.

Also, I wonder if Map<Key, Value> isn't a more fundamental interface point, 
instead of a String partition-name, since partition names can be arbitrary, but 
partition key-values define a partition. To wit, what do you think about a 
PartitionSpec interface that looks like this:

{code}
public interface PartitionSpec {
public Iterator<Partition> getPartitionsIter();
public Iterator<Map<String, String>> getPartKeyValuesIter();
}
{code}

Thoughts?

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan

 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short 
 max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String 
 tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (but not expected) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
 ; 
 public int add_partitions( PartitionSpec new_parts ) throws … ;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
 public List<Partition> getPartitions();
 public List<String> getPartNames();
 public Iterator<Partition> getPartitionIter();
 public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore need 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
 than status quo.
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.
 Given that Thrift doesn’t support polymorphism, we’d have to implement the 
 PartitionSpec as a Thrift Union of supported implementations. (We could 
 convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
 sub-class.)
 Thoughts?
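
A rough sketch of what the proposed HDFSDirBasedPartitionSpec might look like (illustrative only — the class does not exist yet, and the single-level "key=value" directory layout is an assumption):

{code}
// Illustrative sketch: one shared StorageDescriptor; Partition objects
// materialized lazily from sub-directory values such as US, UK, IN.
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;

public class HdfsDirBasedPartitionSpecSketch {
  private final String dbName, tableName, partKey, rootLocation;
  private final StorageDescriptor sharedSd;
  private final List<String> partValues;

  public HdfsDirBasedPartitionSpecSketch(String dbName, String tableName,
      String partKey, String rootLocation, StorageDescriptor sharedSd,
      List<String> partValues) {
    this.dbName = dbName;
    this.tableName = tableName;
    this.partKey = partKey;
    this.rootLocation = rootLocation;
    this.sharedSd = sharedSd;
    this.partValues = partValues;
  }

  // Iterator semantics: neither side holds all Partitions at once.
  public Iterator<Partition> getPartitionIter() {
    final Iterator<String> vals = partValues.iterator();
    return new Iterator<Partition>() {
      public boolean hasNext() { return vals.hasNext(); }
      public Partition next() {
        String v = vals.next();
        Partition p = new Partition();
        p.setDbName(dbName);
        p.setTableName(tableName);
        p.setValues(Collections.singletonList(v));
        StorageDescriptor sd = sharedSd.deepCopy(); // only location differs
        sd.setLocation(rootLocation + "/" + partKey + "=" + v);
        p.setSd(sd);
        return p;
      }
      public void remove() { throw new UnsupportedOperationException(); }
    };
  }
}
{code}

Over the wire the shared StorageDescriptor would be sent once; the deepCopy here is only so each materialized Partition carries its own location.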



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: Review Request 22770: Allow partial partition spec in analyze command

2014-06-19 Thread Ashutosh Chauhan


 On June 19, 2014, 3:51 a.m., Szehon Ho wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java, line 73
  https://reviews.apache.org/r/22770/diff/1/?file=612845#file612845line73
 
  Looks like no longer used.

will get rid of it.


 On June 19, 2014, 3:51 a.m., Szehon Ho wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java, line 422
  https://reviews.apache.org/r/22770/diff/1/?file=612845#file612845line422
 
  We can remove 'totalRows' variable as well.

yup. will remove


 On June 19, 2014, 3:51 a.m., Szehon Ho wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java, line 425
  https://reviews.apache.org/r/22770/diff/1/?file=612845#file612845line425
 
  Do we need to cleanup the fetch operator at some point?

Yeah, will do.


 On June 19, 2014, 3:51 a.m., Szehon Ho wrote:
  ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsDesc.java, line 65
  https://reviews.apache.org/r/22770/diff/1/?file=612849#file612849line65
 
  Same question: can't we still display the partial partition spec?

not sure if it's of any use in explain, but will see if I can add it back.


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22770/#review46179
---


On June 19, 2014, 2:35 a.m., Ashutosh Chauhan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22770/
 ---
 
 (Updated June 19, 2014, 2:35 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7255
 https://issues.apache.org/jira/browse/HIVE-7255
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 So that stats collection can happen for multiple partitions through one 
 statement.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8ae1c73 
   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 47a6871 
   
 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
 1270520 
   ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java a4ba4bd 
   ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java b75f78c 
   ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsDesc.java 86e6877 
   ql/src/test/queries/clientnegative/columnstats_partlvl_dp.q b4887c4 
   ql/src/test/queries/clientnegative/columnstats_partlvl_incorrect_num_keys.q 
 2f8e927 
   ql/src/test/queries/clientpositive/columnstats_partlvl_dp.q PRE-CREATION 
   ql/src/test/queries/clientpositive/columnstats_partlvl_incorrect_num_keys.q 
 PRE-CREATION 
   ql/src/test/results/clientnegative/columnstats_partlvl_invalid_values.q.out 
 d48d8cb 
   ql/src/test/results/clientpositive/columnstats_partlvl.q.out a4c4677 
   ql/src/test/results/clientpositive/columnstats_partlvl_dp.q.out 
 PRE-CREATION 
   
 ql/src/test/results/clientpositive/columnstats_partlvl_incorrect_num_keys.q.out
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/22770/diff/
 
 
 Testing
 ---
 
 Moved -ve tests to +ve.
 
 
 Thanks,
 
 Ashutosh Chauhan
 




[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-06-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037837#comment-14037837
 ] 

Alan Gates commented on HIVE-7223:
--

Seems reasonable.  For backwards compatibility you will want to leave the 
existing thrift calls there and create new ones that handle your new union 
type.  We can mark the old calls as deprecated and remove them after 0.14, but 
we should give people at least one release to move.

Your iterator model would seem to imply that you could fetch just some 
partitions from the metastore in a thrift call, and go back for more later.  Is 
that part of your plan?  Part of a later phase?  Or just an unintended side 
effect?

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan

 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short 
 max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String 
 tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (but not expected) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
 ; 
 public int add_partitions( PartitionSpec new_parts ) throws … ;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
 public List<Partition> getPartitions();
 public List<String> getPartNames();
 public Iterator<Partition> getPartitionIter();
 public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore need 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
 than status quo.
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.
 Given that Thrift doesn’t support polymorphism, we’d have to implement the 
 PartitionSpec as a Thrift Union of supported implementations. (We could 
 convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
 sub-class.)
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-06-19 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037835#comment-14037835
 ] 

Sushanth Sowmyan commented on HIVE-7223:


Hm.. I just realized that as far as the metastore db definitions go, we define 
a unique key constraint on partition name, but make no such guarantee on the 
key-values themselves, which are not metastore entities. I don't like it, but I 
see. I'll have to dig a bit more into this.

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan

 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using ListPartition. E.g. 
 {code}
 public ListPartition listPartitions(String db_name, String tbl_name, short 
 max_parts);
 public ListPartition listPartitionsByFilter(String db_name, String 
 tbl_name, String filter, short max_parts);
 public int add_partitions(ListPartition new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (but not expected) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{ListPartition}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
 ; 
 public int add_partitions( PartitionSpec new_parts ) throws … ;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
 public ListPartition getPartitions();
 public ListString getPartNames();
 public IteratorPartition getPartitionIter();
 public IteratorString getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a ListPartitionSpec, where each 
 PartitionSpec corresponds to a set or partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore need 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{ListPartition}}, and is no worse 
 than status quo.
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.
 Given that Thrift doesn’t support polymorphism, we’d have to implement the 
 PartitionSpec as a Thrift Union of supported implementations. (We could 
 convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
 sub-class.)
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)

2014-06-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037843#comment-14037843
 ] 

Gunther Hagleitner commented on HIVE-7220:
--

I think we should move forward with this; it will give us a working build while 
we work out MAPREDUCE-5756. We have HIVE-6401 open to handle the situation when 
we get a fix.

I've reviewed the patch, and it looks good except for the isValidSplit call. Why 
is that needed? You prune in the constructor, so presumably you never get splits 
containing folders. If this is just a sanity check, it should probably throw an 
assertion if there are still paths in there. If not, it seems incorrect to throw 
out splits that don't match (especially since you might throw out combined 
valid locations with it).
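
For context, the constructor-side pruning under discussion amounts to something like this (illustrative only; the real change is in HIVE-7220.patch):

{code}
// Illustrative only: recursively collect file paths and drop bare
// directories, so CombineFileInputFormat never receives a directory.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class SplitPathPruner {
  private SplitPathPruner() {}

  public static List<Path> filesOnly(FileSystem fs, Path root)
      throws IOException {
    List<Path> files = new ArrayList<Path>();
    for (FileStatus st : fs.listStatus(root)) {
      if (st.isDirectory()) {
        files.addAll(filesOnly(fs, st.getPath())); // empty dirs vanish here
      } else {
        files.add(st.getPath());
      }
    }
    return files;
  }
}
{code}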

 Empty dir in external table causes issue (root_dir_external_table.q failure)
 

 Key: HIVE-7220
 URL: https://issues.apache.org/jira/browse/HIVE-7220
 Project: Hive
  Issue Type: Bug
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7220.patch


 While looking at the root_dir_external_table.q failure, which runs a query on 
 an external table located at root ('/'), I noticed that the latest Hadoop2 
 CombineFileInputFormat returns splits representing empty directories (like 
 '/Users'), which leads to a failure in Hive's CombineFileRecordReader as it 
 tries to open the directory for processing.
 Tried with an external table in a normal HDFS directory, and it also returns 
 the same error.  Looks like a real bug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6967) Hive transaction manager fails when SQLServer is used as an RDBMS

2014-06-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037842#comment-14037842
 ] 

Alan Gates commented on HIVE-6967:
--

Failed tests pass when run locally, except for root_dir_external_table, which 
fails against trunk, so I don't think any of these failures relate to this 
patch.

 Hive transaction manager fails when SQLServer is used as an RDBMS
 -

 Key: HIVE-6967
 URL: https://issues.apache.org/jira/browse/HIVE-6967
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-6967.patch


 When using SQLServer as an RDBMS for the metastore, any transaction or 
 DbLockMgr operation fails with:
 {code}
 MetaException(message:Unable to select from transaction database 
 com.microsoft.sqlserver.jdbc.SQLServerException: Line 1: FOR UPDATE clause 
 allowed only for DECLARE CURSOR.
 {code}
 The issue is that SQLServer does not support the FOR UPDATE clause in SELECT.
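
One plausible shape for a fix — a sketch under the assumption that the code branches on the detected database product; the enum and method below are invented for illustration — is that SQLServer expresses the same row-locking intent with a WITH (UPDLOCK) table hint:

{code}
// Sketch only: pick locking syntax per RDBMS instead of hard-coding a
// trailing FOR UPDATE, which SQLServer rejects in a plain SELECT.
enum DatabaseProduct { DERBY, MYSQL, ORACLE, POSTGRES, SQLSERVER }

final class TxnSqlSketch {
  static String selectForUpdate(DatabaseProduct db, String table, String where) {
    if (db == DatabaseProduct.SQLSERVER) {
      // T-SQL takes an UPDLOCK table hint rather than FOR UPDATE.
      return "SELECT * FROM " + table + " WITH (UPDLOCK) WHERE " + where;
    }
    return "SELECT * FROM " + table + " WHERE " + where + " FOR UPDATE";
  }
}
{code}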



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION

2014-06-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037852#comment-14037852
 ] 

Gunther Hagleitner commented on HIVE-7051:
--

+1

 Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
 -

 Key: HIVE-7051
 URL: https://issues.apache.org/jira/browse/HIVE-7051
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7051.1.patch


 Same as HIVE-7050 but for partitions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7255) Allow partial partition spec in analyze command

2014-06-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037857#comment-14037857
 ] 

Gunther Hagleitner commented on HIVE-7255:
--

Sure. HIVE-7051 looks good to me.

 Allow partial partition spec in analyze command
 ---

 Key: HIVE-7255
 URL: https://issues.apache.org/jira/browse/HIVE-7255
 Project: Hive
  Issue Type: New Feature
  Components: Statistics
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7255.1.patch


 So that stats collection can happen for multiple partitions through one 
 statement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7202) DbTxnManager deadlocks in hcatalog.cli.TestSematicAnalysis.testAlterTblFFpart()

2014-06-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037885#comment-14037885
 ] 

Alan Gates commented on HIVE-7202:
--

Tests pass when run locally except root_dir_external_table and 
authorization_ctas, both of which also fail on trunk.  So I do not think these 
failures are related to the patch.

 DbTxnManager deadlocks in 
 hcatalog.cli.TestSematicAnalysis.testAlterTblFFpart()
 ---

 Key: HIVE-7202
 URL: https://issues.apache.org/jira/browse/HIVE-7202
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Alan Gates
 Fix For: 0.14.0

 Attachments: HIVE-7202.patch


 select * from HIVE_LOCKS produces
 {noformat}
 6 | 1 | 0 | default | junit_sem_analysis | NULL         | w | r | 1402354627716 | NULL | unknown | ekoifman.local
 6 | 2 | 0 | default | junit_sem_analysis | b=2010-10-10 | w | e | 1402354627716 | NULL | unknown | ekoifman.local
 2 rows selected
 {noformat}
 easiest way to repro this is to add
 hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, true);
 hiveConf.setVar(HiveConf.ConfVars.HIVE_TXN_MANAGER, 
 "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
 in HCatBaseTest.setUpHiveConf()



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7242) alter table drop partition is acquiring the wrong type of lock

2014-06-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7242:
-

Status: Open  (was: Patch Available)

Forgot to wait for HIVE-7202 to be checked in before I marked patch available.

 alter table drop partition is acquiring the wrong type of lock
 --

 Key: HIVE-7242
 URL: https://issues.apache.org/jira/browse/HIVE-7242
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.14.0

 Attachments: HIVE-7242.patch


 Doing an alter table foo drop partition ('bar=x') acquires a shared-write 
 lock on partition bar=x.  It should be acquiring an exclusive lock in that 
 case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7241) Wrong lock acquired for alter table rename partition

2014-06-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7241:
-

Status: Open  (was: Patch Available)

Need to wait for HIVE-7202 before marking this one patch available.

 Wrong lock acquired for alter table rename partition
 

 Key: HIVE-7241
 URL: https://issues.apache.org/jira/browse/HIVE-7241
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7241.patch


 Doing an alter table foo partition (bar='x') rename to partition (bar='y') 
 acquires a read lock on table foo.  It should instead acquire an exclusive 
 lock on partition bar=x.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer

2014-06-19 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037893#comment-14037893
 ] 

Carl Steinbach commented on HIVE-7094:
--

[~sushanth]: I'm planning to commit this patch tonight. Please let me know if I 
should hold off. Thanks.

 Separate out static/dynamic partitioning code in FileRecordWriterContainer
 --

 Key: HIVE-7094
 URL: https://issues.apache.org/jira/browse/HIVE-7094
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch, HIVE-7094.4.patch, 
 HIVE-7094.5.patch


 There are two major places in FileRecordWriterContainer that have the {{if 
 (dynamicPartitioning)}} condition: the constructor and write().
 This is the approach that I am taking:
 # Move the DP and SP code into two subclasses: 
 DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer.
 # Make FileRecordWriterContainer an abstract class that contains the common 
 code for both implementations. For write(), FileRecordWriterContainer will 
 call an abstract method that will provide the local RecordWriter, 
 ObjectInspector, SerDe, and OutputJobInfo.
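
 The shape of that refactor, with Object placeholders standing in for the real 
 RecordWriter/ObjectInspector/SerDe/OutputJobInfo types (a sketch of the plan 
 above, not the patch itself):

{code}
// Sketch: the base class keeps the shared write path; each subclass
// resolves the per-record context instead of if (dynamicPartitioning).
abstract class FileRecordWriterContainerSketch {
  // Bundle of what write() needs for one record.
  static final class LocalWriterContext {
    final Object recordWriter, objectInspector, serDe, outputJobInfo;
    LocalWriterContext(Object w, Object oi, Object sd, Object oji) {
      recordWriter = w; objectInspector = oi; serDe = sd; outputJobInfo = oji;
    }
  }

  // Static container returns one precomputed context; dynamic container
  // derives a context from the partition values of the incoming record.
  protected abstract LocalWriterContext contextFor(Object record)
      throws Exception;

  public final void write(Object record) throws Exception {
    LocalWriterContext ctx = contextFor(record);
    // ... serialize 'record' with ctx.serDe / ctx.objectInspector and hand
    // the result to ctx.recordWriter -- identical for both subclasses.
  }
}
{code}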



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-06-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037892#comment-14037892
 ] 

Gunther Hagleitner commented on HIVE-7254:
--

Would moving the minimr/tez/etc properties to a separate file and using 
http://mojo.codehaus.org/properties-maven-plugin/ be an option?

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho

 Today, the Hive PTest infrastructure has a test-driver configuration called 
 "directory", so it will run all the qfiles under that directory for that 
 driver.  For example, CLIDriver is configured with directory 
 ql/src/test/queries/clientpositive.
 However, the configurations for the miniXXXDrivers (miniMRDriver, 
 miniMRDriverNegative, miniTezDriver) run only a select number of tests under 
 the directory.  So we have to use the "include" configuration to hard-code a 
 list of tests for each to run.  This duplicates the list of each 
 miniDriver's tests already in the /itests/qtest pom file, and can get out of 
 date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-06-19 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037910#comment-14037910
 ] 

Mithun Radhakrishnan commented on HIVE-7223:


[~sushanth]: Yep, you're right. I was kinda hoping that'd be brought up in the 
review. It'd be trivial to construct a full List<Partition> from just the 
iterator.

[~alangates]: I agree. These ought to be parallel APIs until some point after 
0.14.

bq. Your iterator model would seem to imply that you could fetch just some 
partitions from the metastore in a thrift call, and go back for more later. Is 
that part of your plan?

Yes, that's definitely on the cards. As per [~sershe]'s comments on HIVE-7195, 
it should be possible to capture cursor/transaction-like semantics, allowing us 
to batch the partitions returned from {{listPartitions()}}. I would like to 
first focus on the {{addPartitions(PartitionSpec)}} APIs for Falcon (etc.), and 
have the PartitionSpec-based {{listPartitions()}} API use 
{{DefaultPartitionSpec}}. Or maybe a {{CompressedPartitionSpec}} that shares 
StorageDescriptor instances.
We can put in batched reads in a later JIRA, since it's so much more work.

 Support generic PartitionSpecs in Metastore partition-functions
 ---

 Key: HIVE-7223
 URL: https://issues.apache.org/jira/browse/HIVE-7223
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan

 Currently, the functions in the HiveMetaStore API that handle multiple 
 partitions do so using List<Partition>. E.g. 
 {code}
 public List<Partition> listPartitions(String db_name, String tbl_name, short 
 max_parts);
 public List<Partition> listPartitionsByFilter(String db_name, String 
 tbl_name, String filter, short max_parts);
 public int add_partitions(List<Partition> new_parts);
 {code}
 Partition objects are fairly heavyweight, since each Partition carries its 
 own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
 thousands of partitions take so long to have their partitions listed that the 
 client times out with default hive.metastore.client.socket.timeout. There is 
 the additional expense of serializing and deserializing metadata for large 
 sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
 should help in this regard.
 In a date-partitioned table, all sub-partitions for a particular date are 
 *likely* (but not expected) to have:
 # The same base directory (e.g. {{/feeds/search/20140601/}})
 # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
 # The same SerDe/StorageHandler/IOFormat classes
 # Sorting/Bucketing/SkewInfo settings
 In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
 represent the partition-list (for a date) in a more condensed form: a list of 
 LighterPartition instances, all sharing a common StorageDescriptor whose 
 location points to the root directory. 
 We can go one better for the {{add_partitions()}} case: When adding all 
 partitions for a given date, the “normal” case affords us the ability to 
 specify the top-level date-directory, where sub-partitions can be inferred 
 from the HDFS directory-path.
 These extensions are hard to introduce at the metastore-level, since 
 partition-functions explicitly specify {{List<Partition>}} arguments. I 
 wonder if a {{PartitionSpec}} interface might help:
 {code}
 public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
 ; 
 public int add_partitions( PartitionSpec new_parts ) throws … ;
 {code}
 where the PartitionSpec looks like:
 {code}
 public interface PartitionSpec {
 public List<Partition> getPartitions();
 public List<String> getPartNames();
 public Iterator<Partition> getPartitionIter();
 public Iterator<String> getPartNameIter();
 }
 {code}
 For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
 {{PartitionSpec}}, store a top-level directory, and return Partition 
 instances from sub-directory names, while storing a single StorageDescriptor 
 for all of them.
 Similarly, list_partitions() could return a List<PartitionSpec>, where each 
 PartitionSpec corresponds to a set of partitions that can share a 
 StorageDescriptor.
 By exposing iterator semantics, neither the client nor the metastore need 
 instantiate all partitions at once. That should help with memory requirements.
 In case no smart grouping is possible, we could just fall back on a 
 {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
 than status quo.
 PartitionSpec abstracts away how a set of partitions may be represented. A 
 tighter representation allows us to communicate metadata for a larger number 
 of Partitions, with less Thrift traffic.

[jira] [Created] (HIVE-7258) Move qtest-Driver properties from pom to separate file

2014-06-19 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-7258:
---

 Summary: Move qtest-Driver properties from pom to separate file  
 Key: HIVE-7258
 URL: https://issues.apache.org/jira/browse/HIVE-7258
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Szehon Ho






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-06-19 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14037921#comment-14037921
 ] 

Szehon Ho commented on HIVE-7254:
-

Created a subtask HIVE-7258 on that suggestion.  Anyone is welcome to take it.

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho

 Today, the Hive PTest infrastructure has a test-driver configuration called 
 "directory", so it will run all the qfiles under that directory for that 
 driver.  For example, CLIDriver is configured with directory 
 ql/src/test/queries/clientpositive.
 However, the configurations for the miniXXXDrivers (miniMRDriver, 
 miniMRDriverNegative, miniTezDriver) run only a select number of tests under 
 the directory.  So we have to use the "include" configuration to hard-code a 
 list of tests for each to run.  This duplicates the list of each 
 miniDriver's tests already in the /itests/qtest pom file, and can get out of 
 date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-06-19 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7254:


Issue Type: Test  (was: Bug)

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho

 Today, the Hive PTest infrastructure has a test-driver configuration called 
 "directory", so it will run all the qfiles under that directory for that 
 driver.  For example, CLIDriver is configured with directory 
 ql/src/test/queries/clientpositive.
 However, the configurations for the miniXXXDrivers (miniMRDriver, 
 miniMRDriverNegative, miniTezDriver) run only a select number of tests under 
 the directory.  So we have to use the "include" configuration to hard-code a 
 list of tests for each to run.  This duplicates the list of each 
 miniDriver's tests already in the /itests/qtest pom file, and can get out of 
 date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7257) UDF format_number() does not work on FLOAT types

2014-06-19 Thread Wilbur Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilbur Yang updated HIVE-7257:
--

Attachment: HIVE-7257.1.patch

 UDF format_number() does not work on FLOAT types
 

 Key: HIVE-7257
 URL: https://issues.apache.org/jira/browse/HIVE-7257
 Project: Hive
  Issue Type: Bug
Reporter: Wilbur Yang
Assignee: Wilbur Yang
 Attachments: HIVE-7257.1.patch


 #1 Show the table:
 hive> describe ssga3; 
 OK
 source   string  
 test     float   
 dt       timestamp   
 Time taken: 0.243 seconds
 #2 Run format_number on double and it works:
 hive> select format_number(cast(test as double),2) from ssga3;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201403131616_0009, Tracking URL = 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009
 Kill Command = 
 /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
 -kill job_201403131616_0009
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers:  0
 2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0%
 2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 
 1.47 sec
 MapReduce Total cumulative CPU time: 1 seconds 470 msec
 Ended Job = job_201403131616_0009
 MapReduce Jobs Launched: 
 Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS
 Total MapReduce CPU Time Spent: 1 seconds 470 msec
 OK
 1.00
 2.00
 Time taken: 16.563 seconds
 #3 Run format_number on float and it does not work
 hive> select format_number(test,2) from ssga3; 
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201403131616_0010, Tracking URL = 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
 Kill Command = 
 /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
 -kill job_201403131616_0010
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers:  0
 2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0%
 2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100%
 Ended Job = job_201403131616_0010 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
 Examining task ID: task_201403131616_0010_m_02 (and more) from job 
 job_201403131616_0010
 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid 
 host:port authority: logicaljt
 Task with the most failures(4):
 Task ID:
 task_201403131616_0010_m_00
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {"source":null,"test":1.0,"dt":null}
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
 at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {"source":null,"test":1.0,"dt":null}
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
 ..
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 MapReduce Jobs Launched: 
 Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
 Total MapReduce CPU Time Spent: 0 msec
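
 One way such a fix typically looks — a sketch only, not necessarily what 
 HIVE-7257.1.patch does — is to accept FLOAT wherever DOUBLE is handled and 
 widen the value before formatting:

{code}
// Sketch: widen FLOAT to double instead of failing on the unhandled type.
import java.text.DecimalFormat;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;

final class FormatNumberSketch {
  static String format(Object value, PrimitiveObjectInspector oi,
      DecimalFormat fmt) {
    switch (oi.getPrimitiveCategory()) {
      case FLOAT:
        return fmt.format(
            ((Float) oi.getPrimitiveJavaObject(value)).doubleValue());
      case DOUBLE:
        return fmt.format((Double) oi.getPrimitiveJavaObject(value));
      default:
        throw new IllegalArgumentException(
            "format_number: unsupported type " + oi.getTypeName());
    }
  }
}
{code}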



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7257) UDF format_number() does not work on FLOAT types

2014-06-19 Thread Wilbur Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilbur Yang updated HIVE-7257:
--

Status: Patch Available  (was: Open)

 UDF format_number() does not work on FLOAT types
 

 Key: HIVE-7257
 URL: https://issues.apache.org/jira/browse/HIVE-7257
 Project: Hive
  Issue Type: Bug
Reporter: Wilbur Yang
Assignee: Wilbur Yang
 Attachments: HIVE-7257.1.patch


 #1 Show the table:
 hive> describe ssga3; 
 OK
 source   string  
 test     float   
 dt       timestamp   
 Time taken: 0.243 seconds
 #2 Run format_number on double and it works:
 hive> select format_number(cast(test as double),2) from ssga3;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201403131616_0009, Tracking URL = 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009
 Kill Command = 
 /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
 -kill job_201403131616_0009
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers:  0
 2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0%
 2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 
 sec
 2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 
 1.47 sec
 MapReduce Total cumulative CPU time: 1 seconds 470 msec
 Ended Job = job_201403131616_0009
 MapReduce Jobs Launched: 
 Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS
 Total MapReduce CPU Time Spent: 1 seconds 470 msec
 OK
 1.00
 2.00
 Time taken: 16.563 seconds
 #3 Run format_number on float and it does not work
 hive> select format_number(test,2) from ssga3;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_201403131616_0010, Tracking URL = 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
 Kill Command = 
 /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job 
 -kill job_201403131616_0010
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers:  0
 2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0%
 2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100%
 Ended Job = job_201403131616_0010 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010
 Examining task ID: task_201403131616_0010_m_02 (and more) from job 
 job_201403131616_0010
 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid 
 host:port authority: logicaljt
 Task with the most failures(4):
 Task ID:
 task_201403131616_0010_m_00
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row {source:null,test:1.0,dt:null}
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
 at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row {source:null,test:1.0,dt:null}
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141)
 ..
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 MapReduce Jobs Launched: 
 Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
 Total MapReduce CPU Time Spent: 0 msec
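 The stack trace points at the row-processing path, but the net effect is that
 format_number()'s evaluator has no branch for FLOAT input. A minimal sketch of
 the shape of such a fix, in plain Java with invented names (not the actual
 GenericUDFFormatNumber code): widen float input to double so it shares the
 DOUBLE path.
 {code}
 import java.text.DecimalFormat;

 public class FormatNumberSketch {
   private final DecimalFormat pattern = new DecimalFormat();

   // Format a numeric value with d decimal places, accepting float input.
   public String format(Object value, int d) {
     pattern.applyPattern(buildPattern(d));
     if (value instanceof Float) {
       // The reported bug: FLOAT falls through unhandled; widening the
       // float to double makes it take the same path as DOUBLE.
       return pattern.format(((Float) value).doubleValue());
     } else if (value instanceof Double) {
       return pattern.format((Double) value);
     } else if (value instanceof Long) {
       return pattern.format((Long) value);
     }
     throw new IllegalArgumentException("unsupported type: " + value.getClass());
   }

   private String buildPattern(int d) {
     StringBuilder sb = new StringBuilder("#,###,###,##0");
     if (d > 0) {
       sb.append('.');
       for (int i = 0; i < d; i++) sb.append('0');
     }
     return sb.toString();
   }
 }
 {code}
 The workaround in step #2 above, cast(test as double), does the same widening
 explicitly in the query.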



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.

2014-06-19 Thread Jayesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jayesh updated HIVE-7100:
-

Attachment: (was: HIVE-7100.1.patch)

 Users of hive should be able to specify skipTrash when dropping tables.
 ---

 Key: HIVE-7100
 URL: https://issues.apache.org/jira/browse/HIVE-7100
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Ravi Prakash
Assignee: Jayesh
 Attachments: HIVE-7100.patch


 Users of our clusters are often running up against their quota limits because 
 of Hive tables. When they drop tables, they then have to manually delete the 
 files from HDFS using -skipTrash. This is cumbersome and unnecessary. We 
 should enable users to skip the trash directly when dropping tables.
 We should also be able to provide this functionality without polluting SQL 
 syntax.
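 For reference, the Hadoop FileSystem API already exposes both behaviors, so
 the feature amounts to choosing between the trash and a direct recursive
 delete at drop time. A minimal sketch, assuming the skipTrash choice gets
 plumbed through from the drop-table call (class and method names invented):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.Trash;

 public class DropTableData {
   // Delete a table's directory, optionally bypassing the HDFS trash.
   public static void deleteTableDir(Configuration conf, Path tableDir,
                                     boolean skipTrash) throws Exception {
     FileSystem fs = tableDir.getFileSystem(conf);
     if (skipTrash) {
       fs.delete(tableDir, true);                  // recursive; frees quota immediately
     } else {
       new Trash(fs, conf).moveToTrash(tableDir);  // counts against quota until expunged
     }
   }
 }
 {code}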



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.

2014-06-19 Thread Jayesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jayesh updated HIVE-7100:
-

Attachment: HIVE-7100.1.patch

 Users of hive should be able to specify skipTrash when dropping tables.
 ---

 Key: HIVE-7100
 URL: https://issues.apache.org/jira/browse/HIVE-7100
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Ravi Prakash
Assignee: Jayesh
 Attachments: HIVE-7100.patch


 Users of our clusters are often running up against their quota limits because 
 of Hive tables. When they drop tables, they then have to manually delete the 
 files from HDFS using -skipTrash. This is cumbersome and unnecessary. We 
 should enable users to skip the trash directly when dropping tables.
 We should also be able to provide this functionality without polluting SQL 
 syntax.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.

2014-06-19 Thread Jayesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jayesh updated HIVE-7100:
-

Attachment: HIVE-7100.1.patch

 Users of hive should be able to specify skipTrash when dropping tables.
 ---

 Key: HIVE-7100
 URL: https://issues.apache.org/jira/browse/HIVE-7100
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Ravi Prakash
Assignee: Jayesh
 Attachments: HIVE-7100.1.patch, HIVE-7100.patch


 Users of our clusters are often running up against their quota limits because 
 of Hive tables. When they drop tables, they then have to manually delete the 
 files from HDFS using -skipTrash. This is cumbersome and unnecessary. We 
 should enable users to skip the trash directly when dropping tables.
 We should also be able to provide this functionality without polluting SQL 
 syntax.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7259) Transaction manager fails when Oracle used as metastore RDBMS

2014-06-19 Thread Alan Gates (JIRA)
Alan Gates created HIVE-7259:


 Summary: Transaction manager fails when Oracle used as metastore 
RDBMS
 Key: HIVE-7259
 URL: https://issues.apache.org/jira/browse/HIVE-7259
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates


The schema specification for the transaction tables creates NUMBER(10) columns 
for longs in the Oracle schema.  However, these should be NUMBER(19).  As it 
is, the JDBC calls to get the value as a long fail.
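For context: NUMBER(10) tops out just under 10^10, while a Java long goes up to
about 9.2 * 10^18, so getLong() on a wide value fails; NUMBER(19) covers the
full long range. A hedged sketch of the corresponding schema correction (the
column list is abbreviated and assumed, not copied from the actual fix):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class WidenTxnColumns {
  public static void main(String[] args) throws Exception {
    // Connection details are placeholders for a metastore Oracle instance.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:oracle:thin:@//metastore-host:1521/XE", "hive", "hive");
         Statement st = conn.createStatement()) {
      // Widen long-valued columns from NUMBER(10) to NUMBER(19).
      st.executeUpdate("ALTER TABLE TXNS MODIFY (TXN_ID NUMBER(19))");
      st.executeUpdate("ALTER TABLE TXN_COMPONENTS MODIFY (TC_TXNID NUMBER(19))");
    }
  }
}
{code}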



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7259) Transaction manager fails when Oracle used as metastore RDBMS

2014-06-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-7259.
--

Resolution: Duplicate

 Transaction manager fails when Oracle used as metastore RDBMS
 -

 Key: HIVE-7259
 URL: https://issues.apache.org/jira/browse/HIVE-7259
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates

 The schema specification for the transaction tables creates NUMBER(10) 
 columns for longs in the Oracle schema.  However, these should be NUMBER(19). 
  As it is, the JDBC calls to get the value as a long fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7256) HiveTxnManager should be stateless

2014-06-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-7256:
-

Assignee: Alan Gates  (was: Eugene Koifman)

 HiveTxnManager should be stateless
 --

 Key: HIVE-7256
 URL: https://issues.apache.org/jira/browse/HIVE-7256
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Alan Gates

 In order to integrate HCat with Hive ACID, we should be able to create an 
 instance of HiveTxnManager, use it to acquire locks, and then release those 
 locks from a different instance of HiveTxnManager.
 One use case where this shows up is when a job using HCat is retried, since 
 calls to the TxnManager are made from the job's OutputCommitter.
 Another is HCatReader/Writer.  For example, TestReaderWriter calls 
 setupJob() from one instance of OutputCommitterContainer and commitJob() 
 from another instance.  The 2nd case is perhaps better solved by ensuring 
 there is only one instance of OutputCommitterContainer.
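 A hypothetical illustration of the stateless contract being asked for: all
 lock state lives in the metastore, so any manager instance can release a lock
 given only its id. The interface below is invented for illustration and is
 not the current HiveTxnManager API.
 {code}
 // All state is in the metastore; instances are interchangeable.
 public interface StatelessTxnManager {
   long openTxn(String user);
   long acquireLock(long txnId, String db, String table, String partition);
   void releaseLock(long lockId);  // works from any instance
   void commitTxn(long txnId);
 }

 class RetriedCommitter {
   // A retried job's OutputCommitter gets a fresh manager instance, yet can
   // still release the lock acquired by the failed attempt.
   void cleanup(StatelessTxnManager mgr, long staleLockId) {
     mgr.releaseLock(staleLockId);
   }
 }
 {code}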



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-06-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038022#comment-14038022
 ] 

Gunther Hagleitner commented on HIVE-7254:
--

I can try to factor that info into a separate file and make our build process 
work with it. What do you need on the ptest framework side? Would a property 
file work for you? Is the file that drives the test framework checked into the 
hive repo?

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho

 Today, the Hive PTest infrastructure has a test-driver configuration called 
 "directory", so it will run all the qfiles under that directory for that 
 driver.  For example, CLIDriver is configured with directory 
 ql/src/test/queries/clientpositive.
 However, the configuration for the miniXXXDrivers (miniMRDriver, 
 miniMRDriverNegative, miniTezDriver) runs only a select number of tests under 
 a directory.  So we have to use the "include" configuration to hard-code a 
 list of tests for it to run.  This duplicates the list of each 
 miniDriver's tests already in the /itests/qtest pom file, and can get out of 
 date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-06-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038028#comment-14038028
 ] 

Gunther Hagleitner commented on HIVE-7254:
--

Alternatively if we have something like 
http://xmltwig.org/xmltwig/tools/xml_grep/xml_grep.html on the build nodes 
parsing the xml wouldn't be hard either. But that seems  more brittle than 
maintaining the tests in a separate file.

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho

 Today, the Hive PTest infrastructure has a test-driver configuration called 
 "directory", so it will run all the qfiles under that directory for that 
 driver.  For example, CLIDriver is configured with directory 
 ql/src/test/queries/clientpositive.
 However, the configuration for the miniXXXDrivers (miniMRDriver, 
 miniMRDriverNegative, miniTezDriver) runs only a select number of tests under 
 a directory.  So we have to use the "include" configuration to hard-code a 
 list of tests for it to run.  This duplicates the list of each 
 miniDriver's tests already in the /itests/qtest pom file, and can get out of 
 date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7105) Enable ReduceRecordProcessor to generate VectorizedRowBatches

2014-06-19 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038043#comment-14038043
 ] 

Jitendra Nath Pandey commented on HIVE-7105:


+1

 Enable ReduceRecordProcessor to generate VectorizedRowBatches
 -

 Key: HIVE-7105
 URL: https://issues.apache.org/jira/browse/HIVE-7105
 Project: Hive
  Issue Type: Bug
  Components: Tez, Vectorization
Reporter: Rajesh Balamohan
Assignee: Gopal V
 Fix For: 0.14.0

 Attachments: HIVE-7105.1.patch, HIVE-7105.2.patch


 Currently, ReduceRecordProcessor sends one key,value pair at a time to its 
 operator pipeline.  It would be beneficial to send VectorizedRowBatch to 
 downstream operators. 
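 A rough sketch of the batching this implies, assuming a single long-typed
 value column; forward() stands in for handing the batch to the operator
 pipeline.
 {code}
 import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
 import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

 class ReduceBatchingSketch {
   private final VectorizedRowBatch batch = new VectorizedRowBatch(1);
   { batch.cols[0] = new LongColumnVector(VectorizedRowBatch.DEFAULT_SIZE); }

   // Accumulate one value; flush a full batch downstream instead of
   // forwarding one row at a time.
   void add(long value) {
     LongColumnVector col = (LongColumnVector) batch.cols[0];
     col.vector[batch.size++] = value;
     if (batch.size == VectorizedRowBatch.DEFAULT_SIZE) {
       forward(batch);
       batch.reset();  // reuse the same batch for the next chunk of rows
     }
   }

   void forward(VectorizedRowBatch b) { /* hand off to downstream operators */ }
 }
 {code}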



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization

2014-06-19 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-7188:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks to [~hsubramaniyan]!

 sum(if()) returns wrong results with vectorization
 --

 Key: HIVE-7188
 URL: https://issues.apache.org/jira/browse/HIVE-7188
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, 
 hike-vector-sum-bug.tgz


 1. The tgz file containing the setup is attached.
 2. Run the following query
 select
 sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
 from hike_error.ttr_day0;
 returns 0 rows with vectorization turned on whereas it return 131 rows with 
 vectorization turned off.
 hive> source insert.sql;
 OK
 Time taken: 0.359 seconds
 OK
 Time taken: 0.015 seconds
 OK
 Time taken: 0.069 seconds
 OK
 Time taken: 0.176 seconds
 Loading data to table hike_error.ttr_day0
 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, 
 rawDataSize=0]
 OK
 Time taken: 0.33 seconds
 hive> select
  sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
  from hike_error.ttr_day0;
 Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed
 Total jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
 Execution log at: 
 /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log
 Job running in-process (local Hadoop)
 Hadoop job information for null: number of mappers: 0; number of reducers: 0
 2014-06-06 13:47:02,043 null map = 0%,  reduce = 100%
 Ended Job = job_local773704964_0001
 Execution completed successfully
 MapredLocal task succeeded
 OK
 131
 Time taken: 5.325 seconds, Fetched: 1 row(s)
 hive> set hive.vectorized.execution.enabled=true;

 hive> select
  sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
  from hike_error.ttr_day0;
 Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38
 Total jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
 Execution log at: 
 /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log
 Job running in-process (local Hadoop)
 Hadoop job information for null: number of mappers: 0; number of reducers: 0
 2014-06-06 13:47:18,604 null map = 0%,  reduce = 100%
 Ended Job = job_local701415676_0001
 Execution completed successfully
 MapredLocal task succeeded
 OK
 0
 Time taken: 5.52 seconds, Fetched: 1 row(s)
 hive> explain select
  sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
  from hike_error.ttr_day0;
 OK
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 depends on stages: Stage-1
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Map Operator Tree:
   TableScan
 alias: ttr_day0
 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE 
 Column stats: NONE
 Select Operator
   expressions: is_returning (type: boolean), is_free (type: 
 boolean)
   outputColumnNames: is_returning, is_free
   Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE 
 Column stats: NONE
   Group By Operator
 aggregations: sum(if(((is_returning = true) and (is_free = 
 false)), 1, 0))
 mode: hash
 outputColumnNames: _col0
 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
 Column stats: NONE
 Reduce Output Operator
   sort order: 
   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
 Column stats: NONE
   value expressions: _col0 (type: bigint)
   Execution mode: vectorized
   Reduce Operator Tree:
 Group By Operator
   

[jira] [Commented] (HIVE-5155) Support secure proxy user access to HiveServer2

2014-06-19 Thread Mani Rajash (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038076#comment-14038076
 ] 

Mani Rajash commented on HIVE-5155:
---

The bug is marked as fixed. Are there release notes or a guide to using this 
feature?

 Support secure proxy user access to HiveServer2
 ---

 Key: HIVE-5155
 URL: https://issues.apache.org/jira/browse/HIVE-5155
 Project: Hive
  Issue Type: Improvement
  Components: Authentication, HiveServer2, JDBC
Affects Versions: 0.12.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.13.0

 Attachments: HIVE-5155-1-nothrift.patch, HIVE-5155-noThrift.2.patch, 
 HIVE-5155-noThrift.4.patch, HIVE-5155-noThrift.5.patch, 
 HIVE-5155-noThrift.6.patch, HIVE-5155-noThrift.7.patch, 
 HIVE-5155-noThrift.8.patch, HIVE-5155.1.patch, HIVE-5155.2.patch, 
 HIVE-5155.3.patch, HIVE-5155.4.patch, HIVE-5155.5.patch, ProxyAuth.java, 
 ProxyAuth.out, TestKERBEROS_Hive_JDBC.java


 The HiveServer2 can authenticate a client using via Kerberos and impersonate 
 the connecting user with underlying secure hadoop. This becomes a gateway for 
 a remote client to access secure hadoop cluster. Now this works fine for when 
 the client obtains Kerberos ticket and directly connects to HiveServer2. 
 There's another big use case for middleware tools where the end user wants to 
 access Hive via another server. For example Oozie action or Hue submitting 
 queries or a BI tool server accessing to HiveServer2. In these cases, the 
 third party server doesn't have end user's Kerberos credentials and hence it 
 can't submit queries to HiveServer2 on behalf of the end user.
 This ticket is for enabling proxy access to HiveServer2 for third party tools 
 on behalf of end users. There are two parts of the solution proposed in this 
 ticket:
 1) Delegation token based connection for Oozie (OOZIE-1457)
 This is the common mechanism for Hadoop ecosystem components. Hive Remote 
 Metastore and HCatalog already support this. This is suitable for tool like 
 Oozie that submits the MR jobs as actions on behalf of its client. Oozie 
 already uses similar mechanism for Metastore/HCatalog access.
 2) Direct proxy access for privileged hadoop users
 The delegation token implementation can be a challenge for non-hadoop 
 (especially non-java) components. This second part enables a privileged user 
 to directly specify an alternate session user during the connection. If the 
 connecting user has hadoop level privilege to impersonate the requested 
 userid, then HiveServer2 will run the session as that requested user. For 
 example, user Hue is allowed to impersonate user Bob (via core-site.xml proxy 
 user configuration). Then user Hue can connect to HiveServer2 and specify Bob 
 as session user via a session property. HiveServer2 will verify Hue's proxy 
 user privilege and then impersonate user Bob instead of Hue. This will enable 
 any third party tool to impersonate alternate userid without having to 
 implement delegation token connection.
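 For part 2, the client side reduces to one extra session property on the JDBC
 URL. A sketch under the assumption that the privileged user (hue here) has
 already done its Kerberos login; host and principal values are placeholders.
 {code}
 import java.sql.Connection;
 import java.sql.DriverManager;

 public class ProxyConnect {
   public static void main(String[] args) throws Exception {
     String url = "jdbc:hive2://hs2-host:10000/default;"
         + "principal=hive/_HOST@EXAMPLE.COM;"
         + "hive.server2.proxy.user=bob";  // run the session as user bob
     try (Connection conn = DriverManager.getConnection(url)) {
       // HiveServer2 checks hue's hadoop proxyuser privilege for bob, then
       // impersonates bob for the lifetime of this session.
       System.out.println("connected: " + !conn.isClosed());
     }
   }
 }
 {code}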



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7260) between operator for vectorization should support non-constant expressions as inputs

2014-06-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-7260:
---

 Summary: between operator for vectorization should support 
non-constant expressions as inputs
 Key: HIVE-7260
 URL: https://issues.apache.org/jira/browse/HIVE-7260
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Follow-up jira for HIVE-7166. 
Eg query where vectorization is disabled:
select x from T where T.y between T.a and T.d;
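What the follow-up asks for, in miniature: a vectorized BETWEEN whose bounds
are per-row columns rather than scalars. A sketch over plain arrays (real
vectorized expressions operate on ColumnVectors inside a VectorizedRowBatch):
{code}
class FilterBetweenColumnsSketch {
  // Keep only selected rows where a[row] <= y[row] <= d[row]; returns the new
  // selected count, mirroring how vectorized filters compact the selection.
  static int filter(long[] y, long[] a, long[] d, int[] selected, int n) {
    int newSize = 0;
    for (int i = 0; i < n; i++) {
      int row = selected[i];
      if (y[row] >= a[row] && y[row] <= d[row]) {
        selected[newSize++] = row;
      }
    }
    return newSize;
  }
}
{code}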





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results

2014-06-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038081#comment-14038081
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-7166:
-

Create HIVE-7260 as follow up jira.

Thanks
Hari

 Vectorization with UDFs returns incorrect results
 -

 Key: HIVE-7166
 URL: https://issues.apache.org/jira/browse/HIVE-7166
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.0
 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor
 Attachments: HIVE-7166.1.patch, HIVE-7166.2.patch


 Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
 query results. 
 Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
 X) and UDF_1
 The following test scenario will reproduce the problem:
 TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):  
 package com.test;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import java.lang.String;
 import java.lang.*;
 public class tenThousand extends UDF {
   private final LongWritable result = new LongWritable();
   public LongWritable evaluate() {
 result.set(1);
 return result;
   }
 }
 TEST DATA (test.input):
 1|CBCABC|12
 2|DBCABC|13
 3|EBCABC|14
 4|ABCABC|15
 5|BBCABC|16
 6|CBCABC|17
 CREATING ORC TABLE:
 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
 second varchar(20), third int) partitioned by (range int) clustered by 
 (first) sorted by (first) into 8 buckets stored as orc tblproperties 
 ("orc.compress" = "SNAPPY", "orc.index" = "true");
 CREATE LOADING TABLE:
 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
 second varchar(20), third int) partitioned by (range int) row format 
 delimited fields terminated by '|' stored as textfile;
 COPY IN DATA:
 [root@server]#  hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
 ORC DATA:
 [root@server]#  beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
 hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
 hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
 select * from loadingDir;"
 LOAD TEST FUNCTION:
 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar;
 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 
 'com.test.tenThousand';
 TURN OFF VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;
 QUERY (RESULTS AS EXPECTED):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 | 1      |
 | 2      |
 | 3      |
 +--------+
 3 rows selected (15.286 seconds)
 TURN ON VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;
 QUERY AGAIN (WRONG RESULTS):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 +--------+
 No rows selected (17.763 seconds)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7234) Select on decimal column throws NPE

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038085#comment-14038085
 ] 

Hive QA commented on HIVE-7234:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651316/HIVE-7234.3.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5640 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/518/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/518/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-518/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651316

 Select on decimal column throws NPE
 ---

 Key: HIVE-7234
 URL: https://issues.apache.org/jira/browse/HIVE-7234
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7234.2.patch, HIVE-7234.3.patch, HIVE-7234.patch


 Select on decimal column throws NPE for values greater than maximum 
 permissible value (99)
 Steps to repro:
 DROP TABLE IF EXISTS DECIMAL;
 CREATE TABLE DECIMAL (dec decimal);
 // Content of decimal_10_0.txt = 99.999
 LOAD DATA LOCAL INPATH '../../data/files/decimal_10_0.txt' OVERWRITE INTO 
 TABLE DECIMAL;
 SELECT dec FROM DECIMAL; => throws NPE
 DROP TABLE DECIMAL;
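 The likely mechanics of the NPE, sketched below: HiveDecimal.enforcePrecisionScale()
 returns null when a value cannot fit the declared precision/scale, and a caller
 that dereferences that result blindly crashes. Treating the null as SQL NULL is
 the defensive shape of a fix (the numeric literal is an arbitrary out-of-range
 example; this is not the actual patch):
 {code}
 import org.apache.hadoop.hive.common.type.HiveDecimal;

 class DecimalOverflowSketch {
   static HiveDecimal readDecimal(String raw, int precision, int scale) {
     HiveDecimal d = HiveDecimal.create(raw);  // null if unparseable
     if (d == null) {
       return null;
     }
     // null when the value overflows decimal(precision, scale)
     return HiveDecimal.enforcePrecisionScale(d, precision, scale);
   }

   public static void main(String[] args) {
     // decimal(10,0) cannot hold this, so the result is NULL rather than an NPE.
     System.out.println(readDecimal("99999999999999999999.999", 10, 0));
   }
 }
 {code}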



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-06-19 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038086#comment-14038086
 ] 

Szehon Ho commented on HIVE-7254:
-

Thanks, a property file would work fine for the Ptest framework.  The refactoring of 
the normal build process can be done as a first change.  The PTest framework today 
generates commands like "mvn test -Dtest=TestMiniTezCliDriver -Dqfile=$batch1" 
using its own property file, and will continue to work.

The actual Ptest framework class is checked into the hive repo at:  
[TestParser.java|https://github.com/apache/hive/blob/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestParser.java].
  It runs against the Ptest conf file trunk-mr2.properties (attached).  As a 
second step, we can replace the hard-coded test names there with references to 
the new property file, and have TestParser read those.

I can take a look at this, but if anyone is interested, build instructions for 
Ptest framework are at : 
[README|https://github.com/apache/hive/blob/trunk/testutils/ptest2/README.md].  
In development, I usually run TestParser locally against the file to verify the 
generated-batches, for example see 
[TestTestParser.java|https://github.com/apache/hive/blob/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/conf/TestTestParser.java].
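A sketch of the proposed flow, with the file name and property key invented for
illustration: both the qtest pom and TestParser would read the miniXXXDriver
lists from one shared property file, so neither copy can drift.
{code}
import java.io.FileReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class SharedTestList {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.load(new FileReader("itests/src/test/resources/testconfiguration.properties"));
    List<String> miniTezTests =
        Arrays.asList(props.getProperty("minitez.query.files", "").split(","));
    // Ptest batches these into -Dqfile=... arguments; the qtest pom feeds the
    // same list to its test generator.
    System.out.println(miniTezTests);
  }
}
{code}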
 


 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: trunk-mr2.properties


 Today, the Hive PTest infrastructure has a test-driver configuration called 
 "directory", so it will run all the qfiles under that directory for that 
 driver.  For example, CLIDriver is configured with directory 
 ql/src/test/queries/clientpositive.
 However, the configuration for the miniXXXDrivers (miniMRDriver, 
 miniMRDriverNegative, miniTezDriver) runs only a select number of tests under 
 a directory.  So we have to use the "include" configuration to hard-code a 
 list of tests for it to run.  This duplicates the list of each 
 miniDriver's tests already in the /itests/qtest pom file, and can get out of 
 date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7254) Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test

2014-06-19 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7254:


Attachment: trunk-mr2.properties

 Enhance Ptest framework config to auto-pick up list of MiniXXXDriver's test
 ---

 Key: HIVE-7254
 URL: https://issues.apache.org/jira/browse/HIVE-7254
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: trunk-mr2.properties


 Today, the Hive PTest infrastructure has a test-driver configuration called 
 "directory", so it will run all the qfiles under that directory for that 
 driver.  For example, CLIDriver is configured with directory 
 ql/src/test/queries/clientpositive.
 However, the configuration for the miniXXXDrivers (miniMRDriver, 
 miniMRDriverNegative, miniTezDriver) runs only a select number of tests under 
 a directory.  So we have to use the "include" configuration to hard-code a 
 list of tests for it to run.  This duplicates the list of each 
 miniDriver's tests already in the /itests/qtest pom file, and can get out of 
 date.
 It would be nice if both got their information the same way.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7249) HiveTxnManager.closeTxnManger() throws if called after commitTxn()

2014-06-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038092#comment-14038092
 ] 

Eugene Koifman commented on HIVE-7249:
--

[~alangates] org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned 
gets wedged with this patch

 HiveTxnManager.closeTxnManger() throws if called after commitTxn()
 --

 Key: HIVE-7249
 URL: https://issues.apache.org/jira/browse/HIVE-7249
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Alan Gates
 Attachments: HIVE-7249.patch


  I openTxn() and acquireLocks() for a query that looks like "INSERT INTO T 
 PARTITION(p) SELECT * FROM T".
 Then I call commitTxn().  Then, when I call closeTxnManger(), I get an exception 
 saying "lock not found" (the only lock in this txn).  So it seems the TxnMgr 
 doesn't know that commit released the locks.
 Here is the stack trace and some log output which maybe useful:
 {noformat}
 2014-06-17 15:54:40,771 DEBUG mapreduce.TransactionContext 
 (TransactionContext.java:onCommitJob(128)) - 
 onCommitJob(job_local557130041_0001). this=46719652
 2014-06-17 15:54:40,771 DEBUG lockmgr.DbTxnManager 
 (DbTxnManager.java:commitTxn(205)) - Committing txn 1
 2014-06-17 15:54:40,771 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) 
 - Going to execute query values current_timestamp
 2014-06-17 15:54:40,772 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatTxn(1423)) - Going to execute query select 
 txn_state from TXNS where txn_id = 1 for\
  update
 2014-06-17 15:54:40,773 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatTxn(1438)) - Going to execute update update TXNS 
 set txn_last_heartbeat = 140304568\
 0772 where txn_id = 1
 2014-06-17 15:54:40,778 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatTxn(1440)) - Going to commit
 2014-06-17 15:54:40,779 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(344)) 
 - Going to execute insert insert into COMPLETED_TXN_COMPONENTS select tc_txn\
 id, tc_database, tc_table, tc_partition from TXN_COMPONENTS where tc_txnid = 
 1
 2014-06-17 15:54:40,784 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(352)) 
 - Going to execute update delete from TXN_COMPONENTS where tc_txnid = 1
 2014-06-17 15:54:40,788 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(356)) 
 - Going to execute update delete from HIVE_LOCKS where hl_txnid = 1
 2014-06-17 15:54:40,791 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(359)) 
 - Going to execute update delete from TXNS where txn_id = 1
 2014-06-17 15:54:40,794 DEBUG txn.TxnHandler (TxnHandler.java:commitTxn(361)) 
 - Going to commit
 2014-06-17 15:54:40,795 WARN  mapreduce.TransactionContext 
 (TransactionContext.java:cleanup(317)) - 
 cleanupJob(JobID=job_local557130041_0001)this=46719652
 2014-06-17 15:54:40,795 DEBUG lockmgr.DbLockManager 
 (DbLockManager.java:unlock(109)) - Unlocking id:1
 2014-06-17 15:54:40,796 DEBUG txn.TxnHandler (TxnHandler.java:getDbTime(872)) 
 - Going to execute query values current_timestamp
 2014-06-17 15:54:40,796 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatLock(1402)) - Going to execute update update 
 HIVE_LOCKS set hl_last_heartbeat = 140\
 3045680796 where hl_lock_ext_id = 1
 2014-06-17 15:54:40,800 DEBUG txn.TxnHandler 
 (TxnHandler.java:heartbeatLock(1405)) - Going to rollback
 2014-06-17 15:54:40,804 ERROR metastore.RetryingHMSHandler 
 (RetryingHMSHandler.java:invoke(143)) - NoSuchLockException(message:No such 
 lock: 1)
 at 
 org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeatLock(TxnHandler.java:1407)
 at 
 org.apache.hadoop.hive.metastore.txn.TxnHandler.unlock(TxnHandler.java:477)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.unlock(HiveMetaStore.java:4817)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
 at com.sun.proxy.$Proxy14.unlock(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.unlock(HiveMetaStoreClient.java:1598)
 at 
 org.apache.hadoop.hive.ql.lockmgr.DbLockManager.unlock(DbLockManager.java:110)
 at 
 org.apache.hadoop.hive.ql.lockmgr.DbLockManager.close(DbLockManager.java:162)
 at 
 org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.destruct(DbTxnManager.java:300)
 at 
 org.apache.hadoop.hive.ql.lockmgr.HiveTxnManagerImpl.closeTxnManager(HiveTxnManagerImpl.java:39)
 at 
 

[jira] [Commented] (HIVE-7234) Select on decimal column throws NPE

2014-06-19 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038111#comment-14038111
 ] 

Ashish Kumar Singh commented on HIVE-7234:
--

The test errors do not look related to the patch.

 Select on decimal column throws NPE
 ---

 Key: HIVE-7234
 URL: https://issues.apache.org/jira/browse/HIVE-7234
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7234.2.patch, HIVE-7234.3.patch, HIVE-7234.patch


 Select on decimal column throws NPE for values greater than maximum 
 permissible value (99)
 Steps to repro:
 DROP TABLE IF EXISTS DECIMAL;
 CREATE TABLE DECIMAL (dec decimal);
 // Content of decimal_10_0.txt = 99.999
 LOAD DATA LOCAL INPATH '../../data/files/decimal_10_0.txt' OVERWRITE INTO 
 TABLE DECIMAL;
 SELECT dec FROM DECIMAL; => throws NPE
 DROP TABLE DECIMAL;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7249) HiveTxnManager.closeTxnManger() throws if called after commitTxn()

2014-06-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038119#comment-14038119
 ] 

Eugene Koifman commented on HIVE-7249:
--

Here is the thread dump, though there doesn't appear to be anything interesting 
in it:
{noformat}
Picked up JAVA_TOOL_OPTIONS:  -Djava.awt.headless=true 
-Dapple.awt.UIElement=true
57554 
87066 
/Users/ekoifman/dev/hive/hcatalog/core/target/surefire/surefirebooter3727332902234772866.jar
87243 sun.tools.jps.Jps
87056 org.codehaus.plexus.classworlds.launcher.Launcher
ekoifman:hcatalog ekoifman$ jstack 87066
Picked up JAVA_TOOL_OPTIONS:  -Djava.awt.headless=true 
-Dapple.awt.UIElement=true
2014-06-19 16:38:27
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.51-b01-457 mixed mode):

Attach Listener daemon prio=9 tid=7ffded8c7800 nid=0x10c84 waiting on 
condition []
   java.lang.Thread.State: RUNNABLE

BoneCP-pool-watch-thread daemon prio=5 tid=7ffde9e89000 nid=0x10defb000 
waiting on condition [10defa000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  7b8e93d10 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at 
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:322)
at com.jolbox.bonecp.PoolWatchThread.run(PoolWatchThread.java:75)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)

BoneCP-keep-alive-scheduler daemon prio=5 tid=7ffde9e88000 nid=0x10ddf8000 
waiting on condition [10ddf7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  7b8fde4d8 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
at java.lang.Thread.run(Thread.java:680)

com.google.common.base.internal.Finalizer daemon prio=5 tid=7ffde9e9a000 
nid=0x10dcf5000 in Object.wait() [10dcf4000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 7b906a3a8 (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
- locked 7b906a3a8 (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at com.google.common.base.internal.Finalizer.run(Finalizer.java:127)

BoneCP-pool-watch-thread daemon prio=5 tid=7ffde91c6800 nid=0x10d068000 
waiting on condition [10d067000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  7b870b118 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at 
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:322)
at com.jolbox.bonecp.PoolWatchThread.run(PoolWatchThread.java:75)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)

BoneCP-keep-alive-scheduler daemon prio=5 tid=7ffdec031800 nid=0x10cf65000 
waiting on condition [10cf64000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  7b86fd7c0 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at 

[jira] [Updated] (HIVE-6207) Integrate HCatalog with locking

2014-06-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-6207:
-

Attachment: HIVE-6207.4.patch

preliminary patch

 Integrate HCatalog with locking
 ---

 Key: HIVE-6207
 URL: https://issues.apache.org/jira/browse/HIVE-6207
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Eugene Koifman
 Fix For: 0.14.0

 Attachments: HIVE-6207.4.patch


 HCatalog currently ignores any locks created by Hive users.  It should 
 respect the locks Hive creates as well as create locks itself when locking is 
 configured.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22772: HIVE-6637: UDF in_file() doesn't take CHAR or VARCHAR as input

2014-06-19 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22772/#review46246
---



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInFile.java
https://reviews.apache.org/r/22772/#comment81557

Rather than having to compare the actual class/class name of the type, call 
PrimitiveObjectInspector.getPrimitiveCategory(), which returns an enum 
corresponding to the type.  Take a look at GenericUDFPrintf, where Xuefu made 
similar changes to the printf() function to support char/varchar.
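A sketch of the suggested check (not the final patch): branch on the primitive
category enum rather than on ObjectInspector classes.
{code}
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;

class InFileArgCheck {
  // Accept any member of the string family: STRING, CHAR, VARCHAR.
  static boolean isStringFamily(ObjectInspector oi) {
    if (!(oi instanceof PrimitiveObjectInspector)) {
      return false;
    }
    switch (((PrimitiveObjectInspector) oi).getPrimitiveCategory()) {
      case STRING:
      case CHAR:
      case VARCHAR:
        return true;
      default:
        return false;
    }
  }
}
{code}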


- Jason Dere


On June 19, 2014, 6:55 p.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22772/
 ---
 
 (Updated June 19, 2014, 6:55 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-6637
 https://issues.apache.org/jira/browse/HIVE-6637
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-6637: UDF in_file() doesn't take CHAR or VARCHAR as input
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInFile.java 
 ea52537d0b85191f0b633a29aa3f7ddb556c288d 
   ql/src/test/queries/clientpositive/udf_in_file.q 
 9d9efe8e23d6e73429ee5cd2c8470359ba2b3498 
   ql/src/test/results/clientpositive/udf_in_file.q.out 
 b63143760d80f3f6a8ba0a23c0d87e8bb86fce66 
 
 Diff: https://reviews.apache.org/r/22772/diff/
 
 
 Testing
 ---
 
 Tested with qtest.
 
 
 Thanks,
 
 Ashish Singh
 




[jira] [Commented] (HIVE-7063) Optimize for the Top N within a Group use case

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038234#comment-14038234
 ] 

Hive QA commented on HIVE-7063:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651239/HIVE-7063.2.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5656 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/519/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-519/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651239

 Optimize for the Top N within a Group use case
 --

 Key: HIVE-7063
 URL: https://issues.apache.org/jira/browse/HIVE-7063
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7063.1.patch, HIVE-7063.2.patch


 It is common to rank within a Group/Partition and then only return the Top N 
 entries within each Group.
 With Streaming mode for Windowing, we should push the post filter on the rank 
 into the Windowing processing as a Limit expression.
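 A minimal model of the pushed-down limit, with invented names: once rows are
 streamed in partition order, the rank <= N post-filter collapses into a
 per-group counter that cuts off output early.
 {code}
 class TopNPerGroup {
   private final int n;
   private Object currentGroup;
   private int emitted;

   TopNPerGroup(int n) { this.n = n; }

   // Rows arrive sorted within each group; returns true while the group's
   // first n rows are being emitted, acting as a Limit inside windowing.
   boolean accept(Object groupKey) {
     if (!groupKey.equals(currentGroup)) {
       currentGroup = groupKey;  // new partition: reset the counter
       emitted = 0;
     }
     return emitted++ < n;
   }
 }
 {code}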



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7231) Improve ORC padding

2014-06-19 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7231:
-

Attachment: HIVE-7231.3.patch

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch


 Current ORC padding is not optimal because of fixed stripe sizes within a 
 block; the padding overhead will be significant in some cases.  Also, the 
 padding percentage relative to stripe size is not configurable.
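 One way to picture the configurable tolerance (a rough model; the config name
 and defaults in the patch may differ): pad to the block boundary only when
 the leftover space is a small fraction of the stripe size, otherwise shrink
 the stripe to fit.
 {code}
 class OrcPaddingSketch {
   // Returns how many padding bytes to write before the next stripe.
   static long padBytes(long posInBlock, long blockSize, long stripeSize,
                        double paddingTolerance) {
     long remaining = blockSize - (posInBlock % blockSize);
     if (remaining >= stripeSize) {
       return 0;                          // stripe fits; no padding needed
     }
     if (remaining < paddingTolerance * stripeSize) {
       return remaining;                  // cheap to pad: align to the block
     }
     return 0;                            // expensive to pad: shrink the stripe instead
   }
 }
 {code}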



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-19 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7250:
-

Fix Version/s: 0.14.0

 Adaptive compression buffer size for wide tables in ORC
 ---

 Key: HIVE-7250
 URL: https://issues.apache.org/jira/browse/HIVE-7250
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, 
 HIVE-7250.4.patch, HIVE-7250.5.patch


 If the input table is wide (on the order of 1000s of columns), ORC compression 
 buffer size overhead becomes significant, causing OOM issues. To overcome this, 
 the buffer size should be chosen adaptively based on the available memory and 
 the number of columns.
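 An illustrative sizing heuristic only; the constants and the streams-per-column
 factor below are assumptions, not the values from the patch.
 {code}
 class AdaptiveBufferSize {
   static int chooseBufferSize(long availableMem, int columnCount) {
     final int defaultSize = 256 * 1024;  // typical default compression buffer
     final int minSize = 4 * 1024;        // floor so buffers stay useful
     final int streamsPerColumn = 4;      // present/data/length/etc., roughly
     long perStream = availableMem
         / Math.max(1L, (long) columnCount * streamsPerColumn);
     long size = Math.min(defaultSize, Math.max(minSize, perStream));
     return Integer.highestOneBit((int) size);  // round down to a power of two
   }
 }
 {code}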



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038286#comment-14038286
 ] 

Prasanth J commented on HIVE-7250:
--

Patch committed to trunk.

 Adaptive compression buffer size for wide tables in ORC
 ---

 Key: HIVE-7250
 URL: https://issues.apache.org/jira/browse/HIVE-7250
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, 
 HIVE-7250.4.patch, HIVE-7250.5.patch


 If the input table is wide (on the order of 1000s of columns), ORC compression 
 buffer size overhead becomes significant, causing OOM issues. To overcome this, 
 the buffer size should be chosen adaptively based on the available memory and 
 the number of columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7250) Adaptive compression buffer size for wide tables in ORC

2014-06-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038284#comment-14038284
 ] 

Prasanth J commented on HIVE-7250:
--

The recent patch does not change the outcome of unit tests.


 Adaptive compression buffer size for wide tables in ORC
 ---

 Key: HIVE-7250
 URL: https://issues.apache.org/jira/browse/HIVE-7250
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.14.0

 Attachments: HIVE-7250.1.patch, HIVE-7250.2.patch, HIVE-7250.3.patch, 
 HIVE-7250.4.patch, HIVE-7250.5.patch


 If the input table is wide (on the order of 1000s of columns), ORC compression 
 buffer size overhead becomes significant, causing OOM issues. To overcome this, 
 the buffer size should be chosen adaptively based on the available memory and 
 the number of columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6394) Implement Timestamp in ParquetSerde

2014-06-19 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-6394:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you for the contribution! I have committed this to trunk.

 Implement Timestamp in ParquetSerde
 ---

 Key: HIVE-6394
 URL: https://issues.apache.org/jira/browse/HIVE-6394
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Jarek Jarcec Cecho
Assignee: Szehon Ho
  Labels: Parquet
 Fix For: 0.14.0

 Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, 
 HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.6.patch, HIVE-6394.7.patch, 
 HIVE-6394.patch


 This JIRA is to implement timestamp support in Parquet SerDe.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7262) Partitioned Table Function (PTF) query fails on ORC table when attempting to vectorize

2014-06-19 Thread Matt McCline (JIRA)
Matt McCline created HIVE-7262:
--

 Summary: Partitioned Table Function (PTF) query fails on ORC table 
when attempting to vectorize
 Key: HIVE-7262
 URL: https://issues.apache.org/jira/browse/HIVE-7262
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline



In ptf.q, create the part table with STORED AS ORC and SET 
hive.vectorized.execution.enabled=true;

Queries fail to find the BLOCK__OFFSET__INSIDE__FILE virtual column during 
vectorization and suffer an exception.

ERROR vector.VectorizationContext 
(VectorizationContext.java:getInputColumnIndex(186)) - The column 
BLOCK__OFFSET__INSIDE__FILE is not in the vectorization context column map.

Jitendra pointed out that the routine that returns the VectorizationContext in 
Vectorize.java needs to add virtual columns to the map, too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038405#comment-14038405
 ] 

Brock Noland commented on HIVE-7230:


[~davidzchen] thank you very much for the contribution! This LGTM... I think we 
should change two things before commit:

1) add apache license to the formatter file
2) move it into a new dir dev-support

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7029) Vectorize ReduceWork

2014-06-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038450#comment-14038450
 ] 

Hive QA commented on HIVE-7029:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12651238/HIVE-7029.2.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 5639 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/520/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/520/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-520/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12651238

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7234) Select on decimal column throws NPE

2014-06-19 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038456#comment-14038456
 ] 

Ashish Kumar Singh commented on HIVE-7234:
--

Thanks [~xuefuz] and [~swarnim] for reviewing.

 Select on decimal column throws NPE
 ---

 Key: HIVE-7234
 URL: https://issues.apache.org/jira/browse/HIVE-7234
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7234.2.patch, HIVE-7234.3.patch, HIVE-7234.patch


 Select on decimal column throws NPE for values greater than maximum 
 permissible value (99)
 Steps to repro:
 DROP TABLE IF EXISTS DECIMAL;
 CREATE TABLE DECIMAL (dec decimal);
 // Content of decimal_10_0.txt = 99.999
 LOAD DATA LOCAL INPATH '../../data/files/decimal_10_0.txt' OVERWRITE INTO 
 TABLE DECIMAL;
 SELECT dec FROM DECIMAL; => throws NPE
 DROP TABLE DECIMAL;
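
A minimal sketch of the suspected failure mode, assuming the NPE stems from 
HiveDecimal.create() returning null for values that exceed the supported 
precision and a caller dereferencing that null; the guard below is illustrative 
only, not the fix in the attached patches:

{code}
import java.math.BigDecimal;

import org.apache.hadoop.hive.common.type.HiveDecimal;

public class DecimalOverflowSketch {
  public static void main(String[] args) {
    // HiveDecimal.create() returns null when the value cannot be
    // represented within the maximum supported precision, which is
    // what an out-of-range literal loaded into a decimal column hits.
    HiveDecimal dec = HiveDecimal.create(new BigDecimal("1e100"));

    // Without this null check, calling a method on the result throws
    // the NullPointerException described in this issue.
    if (dec == null) {
      System.out.println("out-of-range value, treating as NULL");
    } else {
      System.out.println(dec);
    }
  }
}
{code}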



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-19 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7230:
-

Attachment: HIVE-7230.3.patch

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch, HIVE-7230.3.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-19 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038472#comment-14038472
 ] 

David Chen commented on HIVE-7230:
--

Thanks, [~brocknoland]! I have added the license header to the formatter file 
and moved it into a new dev-support directory.

I have attached a new patch and updated RB.

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch, HIVE-7230.3.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


branch for cbo work

2014-06-19 Thread Ashutosh Chauhan
Hi all,

Some of you may have noticed that cost-based optimizer work is going on at
HIVE-5775. John has put up an initial patch there as well, but there is a
lot more work that needs to be done. Following our tradition of doing large
feature work in a branch, I propose that we create a branch, commit this
patch in it, and then continue to work on it there to improve it.
Hopefully we can get it in shape so that we can merge it into trunk once
it's ready. Unless I hear otherwise, I plan to create the branch and commit
this initial patch by early next week.


Design doc is located here :
https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive


Thanks,

Ashutosh


[jira] [Created] (HIVE-7263) Missing fixes from review of parquet-timestamp

2014-06-19 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-7263:
---

 Summary: Missing fixes from review of parquet-timestamp
 Key: HIVE-7263
 URL: https://issues.apache.org/jira/browse/HIVE-7263
 Project: Hive
  Issue Type: Bug
Reporter: Szehon Ho
Assignee: Szehon Ho






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7251) Fix StorageDescriptor usage in unit tests

2014-06-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7251:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Pankit!

 Fix StorageDescriptor usage in unit tests 
 --

 Key: HIVE-7251
 URL: https://issues.apache.org/jira/browse/HIVE-7251
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.1
Reporter: Pankit Thapar
Assignee: Pankit Thapar
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7251.patch


 Current Approach : 
 The StorageDescriptor class is used to describe parameters like InputFormat, 
 OutputFormat, SerDeInfo, etc. for a Hive table.
 Some of the class variables, like InputFormat, OutputFormat, and 
 SerDeInfo.serializationLib, are required fields when
 creating a StorageDescriptor object.
 For example, the createTable command in the metastore client creates the table 
 with the default values of such variables defined in HiveConf or 
 hive-default.xml.
 But in unit tests, the table is created in a slightly different way, so these 
 values need to be set explicitly.
 Thus, when creating tables in tests, the required fields of the 
 StorageDescriptor object need to be set.
 Issue with current approach :
 From some of the current usages of this class in unit tests, I noticed that 
 when a test case tried to clean up the database and found a table created by a 
 previously executed test case, the cleanup process fetches the Table object 
 and performs sanity checks, which include checking required fields like 
 InputFormat, OutputFormat, and SerDeInfo.serializationLib of the table. The 
 sanity checks fail, which results in failure of the test case.
 Fix :
 In unit tests, the StorageDescriptor object should be created with the fields 
 that are sanity-checked when fetching the table; see the sketch after this 
 description.
 NOTE : This fix addresses 6 test cases in itests/hive-unit/
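
A minimal sketch of the idea, assuming typical text-format defaults for the 
class names (the actual values used by the patch may differ); it populates 
exactly the fields the metastore sanity checks look for:

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.SerDeInfo;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;

public class StorageDescriptorSketch {
  // Builds a StorageDescriptor with the required fields set, so that
  // later fetches of the table pass the metastore sanity checks.
  public static StorageDescriptor newTestStorageDescriptor() {
    StorageDescriptor sd = new StorageDescriptor();

    // Columns of the test table.
    List<FieldSchema> cols = new ArrayList<FieldSchema>();
    cols.add(new FieldSchema("id", "int", "test column"));
    sd.setCols(cols);

    // Required format classes; these are common text-table defaults,
    // assumed here for illustration.
    sd.setInputFormat("org.apache.hadoop.mapred.TextInputFormat");
    sd.setOutputFormat(
        "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat");

    // Required SerDe information.
    SerDeInfo serde = new SerDeInfo();
    serde.setName("test_serde");
    serde.setSerializationLib(
        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
    sd.setSerdeInfo(serde);

    return sd;
  }
}
{code}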



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038491#comment-14038491
 ] 

Brock Noland commented on HIVE-7230:


+1 pending tests

Thank you!!

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch, HIVE-7230.2.patch, HIVE-7230.3.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 22804: HIVE-7263 - Missing fixes from review of parquet-timestamp

2014-06-19 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22804/
---

Review request for hive and Brock Noland.


Repository: hive-git


Description
---

This is for HIVE-6394 (parquet timestamp). There had been a review comment about 
not relying on the example Parquet classes, which are just a suggestion of how to 
implement timestamps. The change is trivial: implement that sample class in the 
Hive code base. I had addressed it in one of the patches, but by mistake the next 
patch did not carry it forward. Addressing it again now.
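
For context, a minimal sketch of what such a helper does, assuming the standard 
Parquet INT96 timestamp layout (eight little-endian bytes of nanoseconds within 
the day followed by a four-byte Julian day); the names here are illustrative, 
not the actual classes in the diff below:

{code}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NanoTimeSketch {
  private final int julianDay;
  private final long timeOfDayNanos;

  public NanoTimeSketch(int julianDay, long timeOfDayNanos) {
    this.julianDay = julianDay;
    this.timeOfDayNanos = timeOfDayNanos;
  }

  // Pack into the 12-byte INT96 layout: nanos of day first, then the
  // Julian day, both little-endian.
  public byte[] toBytes() {
    ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
    buf.putLong(timeOfDayNanos);
    buf.putInt(julianDay);
    return buf.array();
  }

  // Unpack from the same layout.
  public static NanoTimeSketch fromBytes(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    long nanos = buf.getLong();
    int day = buf.getInt();
    return new NanoTimeSketch(day, nanos);
  }
}
{code}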


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
73cf0f5 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTime.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/timestamp/NanoTimeUtils.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java 
06987ad 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
8bb9cb1 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
 f56a643 

Diff: https://reviews.apache.org/r/22804/diff/


Testing
---

Ran affected parquet timestamp tests.


Thanks,

Szehon Ho