[jira] [Created] (HIVE-6020) Support UUIDs and versioning for DBs/Tables/Partitions/Columns

2013-12-12 Thread Carl Steinbach (JIRA)
Carl Steinbach created HIVE-6020:


 Summary: Support UUIDs and versioning for 
DBs/Tables/Partitions/Columns
 Key: HIVE-6020
 URL: https://issues.apache.org/jira/browse/HIVE-6020
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema, Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-6018) FetchTask should not reference metastore classes

2013-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846152#comment-13846152
 ] 

Hive QA commented on HIVE-6018:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618358/HIVE-6018.1.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4763 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/621/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/621/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618358

 FetchTask should not reference metastore classes
 

 Key: HIVE-6018
 URL: https://issues.apache.org/jira/browse/HIVE-6018
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-6018.1.patch.txt


 The code below in PartitionDesc sometimes throws a NoClassDefFoundError during 
 execution.
 {noformat}
 public Deserializer getDeserializer() {
   try {
     return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties());
   } catch (Exception e) {
     return null;
   }
 }
 {noformat}
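One way to break that kind of compile-time dependency (a hypothetical sketch, not the actual HIVE-6018 patch; the property name and helper class are assumptions) is to resolve the SerDe class directly from the table properties via reflection, so execution-side code never links against metastore classes:

```java
import java.util.Properties;

// Illustrative sketch: instantiate a deserializer from table properties
// via reflection, so no metastore classes are referenced at compile time.
public class DeserializerResolver {

    // "serialization.lib" is the table property Hive uses to record the
    // SerDe class name; treat that as an assumption here.
    static Object resolve(Properties tableProps, ClassLoader loader) {
        String serdeClass = tableProps.getProperty("serialization.lib");
        if (serdeClass == null) {
            throw new IllegalArgumentException("serialization.lib not set");
        }
        try {
            // Reflection keeps the compile-time dependency out of this module.
            return Class.forName(serdeClass, true, loader)
                        .getDeclaredConstructor()
                        .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + serdeClass, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // A JDK class stands in for a real SerDe implementation.
        props.setProperty("serialization.lib", "java.util.ArrayList");
        Object d = resolve(props, DeserializerResolver.class.getClassLoader());
        System.out.println(d.getClass().getName()); // java.util.ArrayList
    }
}
```

Either way, swallowing the exception and returning null (as the snippet above does) hides the real failure; surfacing it makes the missing-class case diagnosable.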





[jira] [Commented] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification

2013-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846191#comment-13846191
 ] 

Hive QA commented on HIVE-1466:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618376/HIVE-1466.1.patch

{color:green}SUCCESS:{color} +1 4765 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/622/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/622/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618376

 Add NULL DEFINED AS to ROW FORMAT specification
 ---

 Key: HIVE-1466
 URL: https://issues.apache.org/jira/browse/HIVE-1466
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer
Assignee: Prasad Mujumdar
 Attachments: HIVE-1466.1.patch


 NULL values are passed to transformers as a literal backslash and a literal 
 N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. 
 This is inconsistent.
 The ROW FORMAT specification of tables should be able to specify the manner 
 in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or 
 '\003' or whatever should apply to all instances of table export and saving.
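What a configurable null marker buys you can be sketched in a few lines (illustrative only; the serialize helper is an assumption, not Hive's SerDe API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: the same row serialized with a default-style "\N"
// null marker and with a custom one, as a NULL DEFINED AS clause would
// configure for a table's ROW FORMAT.
public class NullSerialization {

    // Replace nulls with the configured marker, then join on the separator.
    static String serialize(List<String> row, char sep, String nullString) {
        return row.stream()
                  .map(v -> v == null ? nullString : v)
                  .collect(Collectors.joining(String.valueOf(sep)));
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("1", null, "x");
        System.out.println(serialize(row, ',', "\\N"));     // default-style marker
        System.out.println(serialize(row, ',', "\u0003"));  // custom '\003' marker
    }
}
```

The point of the proposal is that the marker becomes a per-table property applied consistently on both export and save, instead of differing between the two paths.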





[jira] [Updated] (HIVE-5973) SMB joins produce incorrect results with multiple partitions and buckets

2013-12-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5973:
-

Attachment: HIVE-5973.1.patch

Attached is the test and a fix. The problem occurs when the small table is 
bucketed and partitioned and has a select sub-query. The select operator that 
is introduced as part of the sub-query causes the issue described.

Thanks to [~rhbutani] for helping with the solution and test case. It looks 
like the right way to run these types of tests is via the MinimrCliDriver, as 
the CliDriver tests mask the issue by using a single reducer, resulting in 
incorrect bucketing.

 SMB joins produce incorrect results with multiple partitions and buckets
 

 Key: HIVE-5973
 URL: https://issues.apache.org/jira/browse/HIVE-5973
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.13.0

 Attachments: HIVE-5973.1.patch


 It looks like there is an issue with re-using the output object array in the 
 select operator. When we read rows of the non-big tables, we hold on to the 
 output object in the priority queue. This causes Hive to produce incorrect 
 results because all the elements in the priority queue refer to the same 
 object, and the join happens on only one of the buckets.
 {noformat}
 output[i] = eval[i].evaluate(row);
 {noformat}
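The aliasing problem described above is easy to reproduce outside Hive. A minimal sketch (not Hive code; the class and method names are made up): enqueueing one shared, mutated buffer gives a queue where every entry shows the last row written, while copying before enqueueing fixes it:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative sketch of the aliasing bug: a single output buffer reused
// for every row, enqueued without a copy, leaves every queue entry
// referring to the same (last-written) array.
public class RowBufferAliasing {

    static final Comparator<Object[]> BY_KEY =
            Comparator.comparingInt((Object[] r) -> (Integer) r[0]);

    // Buggy variant: enqueue the shared buffer itself.
    static PriorityQueue<Object[]> enqueueShared(int[] keys) {
        PriorityQueue<Object[]> q = new PriorityQueue<>(BY_KEY);
        Object[] shared = new Object[1];
        for (int k : keys) {
            shared[0] = k;   // overwrite in place
            q.add(shared);   // every entry aliases the same array
        }
        return q;
    }

    // Fixed variant: copy the buffer before handing it to the queue.
    static PriorityQueue<Object[]> enqueueCopies(int[] keys) {
        PriorityQueue<Object[]> q = new PriorityQueue<>(BY_KEY);
        Object[] shared = new Object[1];
        for (int k : keys) {
            shared[0] = k;
            q.add(Arrays.copyOf(shared, shared.length));
        }
        return q;
    }

    public static void main(String[] args) {
        int[] keys = {3, 1, 2};
        // Buggy: all three entries show the last key written, 2.
        for (Object[] row : enqueueShared(keys)) {
            if ((Integer) row[0] != 2) throw new AssertionError();
        }
        // Fixed: three distinct rows; the smallest key comes out first.
        System.out.println(enqueueCopies(keys).poll()[0]); // 1
    }
}
```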





[jira] [Updated] (HIVE-5973) SMB joins produce incorrect results with multiple partitions and buckets

2013-12-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5973:
-

Status: Patch Available  (was: Open)

 SMB joins produce incorrect results with multiple partitions and buckets
 

 Key: HIVE-5973
 URL: https://issues.apache.org/jira/browse/HIVE-5973
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.13.0

 Attachments: HIVE-5973.1.patch


 It looks like there is an issue with re-using the output object array in the 
 select operator. When we read rows of the non-big tables, we hold on to the 
 output object in the priority queue. This causes hive to produce incorrect 
 results because all the elements in the priority queue refer to the same 
 object and the join happens on only one of the buckets.
 {noformat}
 output[i] = eval[i].evaluate(row);
 {noformat}





Review Request 16213: HIVE-5973: SMB joins produce incorrect results with multiple partitions and buckets

2013-12-12 Thread Vikram Dixit Kumaraswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16213/
---

Review request for hive, Navis Ryu and Harish Butani.


Bugs: HIVE-5973
https://issues.apache.org/jira/browse/HIVE-5973


Repository: hive-git


Description
---

SMB joins produce incorrect results with multiple partitions and buckets


Diffs
-

  itests/hive-unit/pom.xml dae4e50 
  itests/qtest/pom.xml 8c249a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DummyStoreOperator.java acdb040 
  ql/src/test/queries/clientpositive/auto_sortmerge_join_16.q PRE-CREATION 
  ql/src/test/results/clientpositive/auto_sortmerge_join_16.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/16213/diff/


Testing
---

New test case added.


Thanks,

Vikram Dixit Kumaraswamy



[jira] [Commented] (HIVE-5973) SMB joins produce incorrect results with multiple partitions and buckets

2013-12-12 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846211#comment-13846211
 ] 

Vikram Dixit K commented on HIVE-5973:
--

https://reviews.apache.org/r/16213/

 SMB joins produce incorrect results with multiple partitions and buckets
 

 Key: HIVE-5973
 URL: https://issues.apache.org/jira/browse/HIVE-5973
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.13.0

 Attachments: HIVE-5973.1.patch


 It looks like there is an issue with re-using the output object array in the 
 select operator. When we read rows of the non-big tables, we hold on to the 
 output object in the priority queue. This causes hive to produce incorrect 
 results because all the elements in the priority queue refer to the same 
 object and the join happens on only one of the buckets.
 {noformat}
 output[i] = eval[i].evaluate(row);
 {noformat}





[jira] [Commented] (HIVE-5973) SMB joins produce incorrect results with multiple partitions and buckets

2013-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846235#comment-13846235
 ] 

Hive QA commented on HIVE-5973:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618384/HIVE-5973.1.patch

{color:green}SUCCESS:{color} +1 4763 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/623/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/623/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618384

 SMB joins produce incorrect results with multiple partitions and buckets
 

 Key: HIVE-5973
 URL: https://issues.apache.org/jira/browse/HIVE-5973
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.13.0

 Attachments: HIVE-5973.1.patch


 It looks like there is an issue with re-using the output object array in the 
 select operator. When we read rows of the non-big tables, we hold on to the 
 output object in the priority queue. This causes hive to produce incorrect 
 results because all the elements in the priority queue refer to the same 
 object and the join happens on only one of the buckets.
 {noformat}
 output[i] = eval[i].evaluate(row);
 {noformat}





[jira] [Created] (HIVE-6021) Problem in GroupByOperator for handling distinct aggregations

2013-12-12 Thread Sun Rui (JIRA)
Sun Rui created HIVE-6021:
-

 Summary: Problem in GroupByOperator for handling distinct 
aggregations
 Key: HIVE-6021
 URL: https://issues.apache.org/jira/browse/HIVE-6021
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui
Assignee: Sun Rui


Use the following test case with HIVE 0.12:

{code:sql}
create table src(key int, value string);
load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
set hive.map.aggr=false; 
select count(key),count(distinct value) from src group by key;
{code}

We will get an ArrayIndexOutOfBoundsException from GroupByOperator:
{code}
java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 5 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159)
... 10 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152)
... 10 more
{code}

explain select count(key),count(distinct value) from src group by key;
{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Alias -> Map Operator Tree:
src 
  TableScan
alias: src
Select Operator
  expressions:
expr: key
type: int
expr: value
type: string
  outputColumnNames: key, value
  Reduce Output Operator
key expressions:
  expr: key
  type: int
  expr: value
  type: string
sort order: ++
Map-reduce partition columns:
  expr: key
  type: int
tag: -1
  Reduce Operator Tree:
Group By Operator
  aggregations:
expr: count(KEY._col0)   // The parameter causes this problem
   ^^^
expr: count(DISTINCT KEY._col1:0._col0)
  bucketGroup: false
  keys:
expr: KEY._col0
type: int
  mode: complete
  outputColumnNames: _col0, _col1, _col2
  Select Operator
expressions:
  expr: _col1
  type: bigint
  expr: _col2
  type: bigint
outputColumnNames: _col0, _col1
File Output Operator
  compressed: false
  GlobalTableId: 0
  table:
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
Fetch Operator
  limit: -1
{code}

The root cause is within GroupByOperator.initializeOp(). The method fails to 
handle the case where a query with distinct aggregations has an aggregation 
function whose parameter is a group-by key column but not a distinct key column.

{code}
if (unionExprEval != null) {
  String[] names = parameters.get(j).getExprString().split("\\.");
  // parameters of the form : KEY.colx:t.coly
  if (Utilities.ReduceField.KEY.name().equals(names[0])) {
    String name = names[names.length - 2];
    int tag = Integer.parseInt(name.split("\\:")[1]);

    ...

  } else {
    // will be VALUE._COLx
    if (!nonDistinctAggrs.contains(i)) {
      nonDistinctAggrs.add(i);
    }
  }
{code}
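The failure mode can be isolated in a few lines (an illustrative sketch, not the HIVE-6021 patch): a distinct parameter such as KEY._col1:0._col0 carries a :tag segment, but a plain group-by key parameter such as KEY._col0 does not, so indexing [1] after the colon split throws ArrayIndexOutOfBoundsException unless it is guarded:

```java
// Illustrative sketch of the parameter-name parsing described above.
// parseTag is a made-up helper, not a GroupByOperator method.
public class KeyNameParsing {

    // Returns the distinct tag for parameters of the form KEY._colx:t._coly,
    // or -1 when the parameter is a plain group-by key column (no ":tag").
    static int parseTag(String exprString) {
        String[] names = exprString.split("\\.");
        String name = names[names.length - 2];   // e.g. "_col1:0" or "KEY"
        String[] parts = name.split("\\:");
        // The unguarded original indexes parts[1] directly, which throws
        // ArrayIndexOutOfBoundsException for a plain key like "KEY._col0".
        return parts.length > 1 ? Integer.parseInt(parts[1]) : -1;
    }

    public static void main(String[] args) {
        System.out.println(parseTag("KEY._col1:0._col0")); // 0
        System.out.println(parseTag("KEY._col0"));         // -1
    }
}
```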





[jira] [Updated] (HIVE-6021) Problem in GroupByOperator for handling distinct aggregations

2013-12-12 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui updated HIVE-6021:
--

Attachment: HIVE-6021.1.patch

 Problem in GroupByOperator for handling distinct aggregations
 

 Key: HIVE-6021
 URL: https://issues.apache.org/jira/browse/HIVE-6021
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui
Assignee: Sun Rui
 Attachments: HIVE-6021.1.patch


 Use the following test case with HIVE 0.12:
 {code:sql}
 create table src(key int, value string);
 load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
 set hive.map.aggr=false; 
 select count(key),count(distinct value) from src group by key;
 {code}
 We will get an ArrayIndexOutOfBoundsException from GroupByOperator:
 {code}
 java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
   ... 5 more
 Caused by: java.lang.RuntimeException: Reduce operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159)
   ... 10 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152)
   ... 10 more
 {code}
 explain select count(key),count(distinct value) from src group by key;
 {code}
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
    Alias -> Map Operator Tree:
 src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: int
 expr: value
 type: string
   outputColumnNames: key, value
   Reduce Output Operator
 key expressions:
   expr: key
   type: int
   expr: value
   type: string
 sort order: ++
 Map-reduce partition columns:
   expr: key
   type: int
 tag: -1
   Reduce Operator Tree:
 Group By Operator
   aggregations:
 expr: count(KEY._col0)   // The parameter causes this problem
^^^
 expr: count(DISTINCT KEY._col1:0._col0)
   bucketGroup: false
   keys:
 expr: KEY._col0
 type: int
   mode: complete
   outputColumnNames: _col0, _col1, _col2
   Select Operator
 expressions:
   expr: _col1
   type: bigint
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Stage: Stage-0
 Fetch Operator
   limit: -1
 {code}
 The root cause is within GroupByOperator.initializeOp(). The method fails 
 to handle the case where a query with distinct aggregations has an aggregation 
 function whose parameter is a group-by key column but not a distinct key column.
 {code}
 if (unionExprEval != null) {
   String[] names = parameters.get(j).getExprString().split("\\.");
   // parameters of the form : KEY.colx:t.coly
   if (Utilities.ReduceField.KEY.name().equals(names[0])) {
 String name = names[names.length - 2];
   

[jira] [Created] (HIVE-6022) Load statements with incorrect order of partitions put input files to unreadable places

2013-12-12 Thread Teruyoshi Zenmyo (JIRA)
Teruyoshi Zenmyo created HIVE-6022:
--

 Summary: Load statements with incorrect order of partitions put 
input files to unreadable places
 Key: HIVE-6022
 URL: https://issues.apache.org/jira/browse/HIVE-6022
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Teruyoshi Zenmyo


Load statements with incorrect order of partitions put input files to incorrect 
paths. 

{code}
CREATE TABLE test_parts (c1 string, c2 int) PARTITIONED BY (p1 string,p2 
string);
LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE 
test_parts PARTITION (p2='p1', p1='p2')
{code}

The input file is located as below and the data is not readable.
{code}
% find /user/hive/warehouse/test_parts/
/user/hive/warehouse/test_parts/
/user/hive/warehouse/test_parts//p1=p2
/user/hive/warehouse/test_parts//p1=p2/p2=p1
/user/hive/warehouse/test_parts//p2=p1
/user/hive/warehouse/test_parts//p2=p1/p1=p2
/user/hive/warehouse/test_parts//p2=p1/p1=p2/.kv1.txt.crc
/user/hive/warehouse/test_parts//p2=p1/p1=p2/kv1.txt
{code}
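A straightforward guard against this (a hypothetical sketch, not the attached patch; class and method names are assumptions) is to build the partition path in the table's declared partition-column order, regardless of the order used in the PARTITION (...) clause:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: normalize a user-supplied partition spec against
// the table's declared partition-column order before building the path.
public class PartitionPath {

    static String path(List<String> declaredCols, Map<String, String> spec) {
        StringBuilder sb = new StringBuilder();
        // Iterate in declared order, not in the order the user wrote.
        for (String col : declaredCols) {
            String val = spec.get(col);
            if (val == null) {
                throw new IllegalArgumentException("missing partition column: " + col);
            }
            if (sb.length() > 0) sb.append('/');
            sb.append(col).append('=').append(val);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The user wrote PARTITION (p2='p1', p1='p2'), out of declared order.
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("p2", "p1");
        spec.put("p1", "p2");
        System.out.println(path(List.of("p1", "p2"), spec)); // p1=p2/p2=p1
    }
}
```

With the spec normalized this way, the file lands under p1=p2/p2=p1 as the metastore expects, instead of the reversed, unreadable location shown in the find output above.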







[jira] [Updated] (HIVE-6022) Load statements with incorrect order of partitions put input files to unreadable places

2013-12-12 Thread Teruyoshi Zenmyo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teruyoshi Zenmyo updated HIVE-6022:
---

Attachment: HIVE-6022.1.patch.txt

 Load statements with incorrect order of partitions put input files to 
 unreadable places
 ---

 Key: HIVE-6022
 URL: https://issues.apache.org/jira/browse/HIVE-6022
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Teruyoshi Zenmyo
 Attachments: HIVE-6022.1.patch.txt


 Load statements with incorrect order of partitions put input files to 
 incorrect paths. 
 {code}
 CREATE TABLE test_parts (c1 string, c2 int) PARTITIONED BY (p1 string,p2 
 string);
 LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO 
 TABLE test_parts PARTITION (p2='p1', p1='p2')
 {code}
 The input file is located as below and the data is not readable.
 {code}
 % find /user/hive/warehouse/test_parts/
 /user/hive/warehouse/test_parts/
 /user/hive/warehouse/test_parts//p1=p2
 /user/hive/warehouse/test_parts//p1=p2/p2=p1
 /user/hive/warehouse/test_parts//p2=p1
 /user/hive/warehouse/test_parts//p2=p1/p1=p2
 /user/hive/warehouse/test_parts//p2=p1/p1=p2/.kv1.txt.crc
 /user/hive/warehouse/test_parts//p2=p1/p1=p2/kv1.txt
 {code}





[jira] [Commented] (HIVE-6022) Load statements with incorrect order of partitions put input files to unreadable places

2013-12-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846397#comment-13846397
 ] 

Xuefu Zhang commented on HIVE-6022:
---

[~tzenmyo] Thanks for your contribution. Could you please put a review board 
entry here?

 Load statements with incorrect order of partitions put input files to 
 unreadable places
 ---

 Key: HIVE-6022
 URL: https://issues.apache.org/jira/browse/HIVE-6022
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Teruyoshi Zenmyo
 Attachments: HIVE-6022.1.patch.txt


 Load statements with incorrect order of partitions put input files to 
 incorrect paths. 
 {code}
 CREATE TABLE test_parts (c1 string, c2 int) PARTITIONED BY (p1 string,p2 
 string);
 LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO 
 TABLE test_parts PARTITION (p2='p1', p1='p2')
 {code}
 The input file is located as below and the data is not readable.
 {code}
 % find /user/hive/warehouse/test_parts/
 /user/hive/warehouse/test_parts/
 /user/hive/warehouse/test_parts//p1=p2
 /user/hive/warehouse/test_parts//p1=p2/p2=p1
 /user/hive/warehouse/test_parts//p2=p1
 /user/hive/warehouse/test_parts//p2=p1/p1=p2
 /user/hive/warehouse/test_parts//p2=p1/p1=p2/.kv1.txt.crc
 /user/hive/warehouse/test_parts//p2=p1/p1=p2/kv1.txt
 {code}





[jira] [Commented] (HIVE-6021) Problem in GroupByOperator for handling distinct aggregations

2013-12-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846400#comment-13846400
 ] 

Xuefu Zhang commented on HIVE-6021:
---

[~sunrui] Thanks for your contribution. Do you mind providing the following?

1. A test case similar to what you constructed to produce the problem?
2. A review board entry.

 Problem in GroupByOperator for handling distinct aggregations
 

 Key: HIVE-6021
 URL: https://issues.apache.org/jira/browse/HIVE-6021
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Sun Rui
Assignee: Sun Rui
 Attachments: HIVE-6021.1.patch


 Use the following test case with HIVE 0.12:
 {code:sql}
 create table src(key int, value string);
 load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
 set hive.map.aggr=false; 
 select count(key),count(distinct value) from src group by key;
 {code}
 We will get an ArrayIndexOutOfBoundsException from GroupByOperator:
 {code}
 java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
   ... 5 more
 Caused by: java.lang.RuntimeException: Reduce operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159)
   ... 10 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152)
   ... 10 more
 {code}
 explain select count(key),count(distinct value) from src group by key;
 {code}
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
    Alias -> Map Operator Tree:
 src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: int
 expr: value
 type: string
   outputColumnNames: key, value
   Reduce Output Operator
 key expressions:
   expr: key
   type: int
   expr: value
   type: string
 sort order: ++
 Map-reduce partition columns:
   expr: key
   type: int
 tag: -1
   Reduce Operator Tree:
 Group By Operator
   aggregations:
 expr: count(KEY._col0)   // The parameter causes this problem
^^^
 expr: count(DISTINCT KEY._col1:0._col0)
   bucketGroup: false
   keys:
 expr: KEY._col0
 type: int
   mode: complete
   outputColumnNames: _col0, _col1, _col2
   Select Operator
 expressions:
   expr: _col1
   type: bigint
   expr: _col2
   type: bigint
 outputColumnNames: _col0, _col1
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Stage: Stage-0
 Fetch Operator
   limit: -1
 {code}
 The root cause is within GroupByOperator.initializeOp(). The method fails 
 to handle the case where a query with distinct aggregations has an aggregation 
 function whose parameter is a group-by key column but not a distinct key column.
 {code}
 if (unionExprEval != null) {
   String[] names = 

[jira] [Commented] (HIVE-5824) Support generation of html test reports in maven

2013-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846414#comment-13846414
 ] 

Ashutosh Chauhan commented on HIVE-5824:


I am able to generate the test report even without this patch using the 
commands above. It seems the patch is no longer required. [~prasanth_j] Can you 
close this out if that's indeed the case?

 Support generation of html test reports in maven
 

 Key: HIVE-5824
 URL: https://issues.apache.org/jira/browse/HIVE-5824
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: build, maven, test
 Attachments: HIVE-5824.2.patch.txt, HIVE-5824.patch.txt


 {code}ant testreport{code} generated test results in HTML format. It would be 
 useful to support the same in Maven. The default test report generated by 
 Maven is in XML format, which is hard to read.





[jira] [Created] (HIVE-6023) Numeric Data Type Support

2013-12-12 Thread Deepak Raj (JIRA)
Deepak Raj created HIVE-6023:


 Summary: Numeric Data Type Support
 Key: HIVE-6023
 URL: https://issues.apache.org/jira/browse/HIVE-6023
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema, File Formats
Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.9.0
 Environment: Hive 0.90, Linux, Hadoop, Data Type Extension
Reporter: Deepak Raj


Many companies are rethinking their strategies for adopting Hive into their ETL 
for the simple reason that it does not support basic data types like 
Numeric(a,b). I believe this should be improved in upcoming versions of Hive. 
Can we extend Hive with a custom UDF for a Numeric(a,b) data type?





[jira] [Commented] (HIVE-6023) Numeric Data Type Support

2013-12-12 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846477#comment-13846477
 ] 

Eric Hanson commented on HIVE-6023:
---

This'd be a nice addition. Also, the code in the Hive trunk now has support for 
decimal(p, s) which is functionally equivalent to numeric(p, s).
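For readers mapping NUMERIC(p, s) onto existing tools, the precision/scale contract can be modeled with java.math.BigDecimal (an illustrative sketch; the helper name is an assumption, not a Hive API):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative sketch of what a NUMERIC(5,2)/DECIMAL(5,2) column enforces:
// precision = total significant digits, scale = digits after the point.
public class DecimalCheck {

    static BigDecimal toDecimal(String v, int precision, int scale) {
        // Round to the column's scale first, then check total digits.
        BigDecimal d = new BigDecimal(v).setScale(scale, RoundingMode.HALF_UP);
        if (d.precision() > precision) {
            throw new ArithmeticException(
                "overflows DECIMAL(" + precision + "," + scale + ")");
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("123.456", 5, 2)); // 123.46
        try {
            toDecimal("12345.6", 5, 2); // 12345.60 has 7 digits > 5
        } catch (ArithmeticException expected) {
            System.out.println("overflow rejected");
        }
    }
}
```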

 Numeric Data Type Support
 -

 Key: HIVE-6023
 URL: https://issues.apache.org/jira/browse/HIVE-6023
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema, File Formats
Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0
 Environment: Hive 0.90, Linux, Hadoop, Data Type Extension
Reporter: Deepak Raj
  Labels: Data, Hive, Numeric, Type1
   Original Estimate: 2h
  Remaining Estimate: 2h

 Many companies are rethinking their strategies for adopting Hive into their ETL 
 for the simple reason that it does not support basic data types like 
 Numeric(a,b). I believe this should be improved in upcoming versions of Hive. 
 Can we extend Hive with a custom UDF for a Numeric(a,b) data type?





[jira] [Created] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2013-12-12 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-6024:
--

 Summary: Load data local inpath unnecessarily creates a copy task
 Key: HIVE-6024
 URL: https://issues.apache.org/jira/browse/HIVE-6024
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ashutosh Chauhan


The load data command creates an additional copy task only when it is loading 
from {{local}}. It doesn't create this additional copy task while loading from 
DFS.





[jira] [Commented] (HIVE-5555) Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions

2013-12-12 Thread Matt Tucker (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846481#comment-13846481
 ] 

Matt Tucker commented on HIVE-5555:
---

Does this also allow for SQL-89 style joins?

{noformat}
explain select *
from part p1, part p2, part p3 
where p1.p_name = p2.p_name and p2.p_name = p3.p_name;
{noformat}

 Support alternate join syntax: joining conditions in where clause; also 
 pushdown qualifying join conditions 
 

 Key: HIVE-5555
 URL: https://issues.apache.org/jira/browse/HIVE-5555
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: AlternativeJoinSyntax.pdf


 Certain tools still generate `old style' Join queries where the join
 condition is in the Where clause. A related set of issues that can
 be addressed is that of pushing forward joining conditions;
 in a manner similar to the Predicate Pushdown feature of Hive.
 For e.g. these queries can have join conditions pushed down:
 {noformat}
 - query 1, push join predicate from 2nd join to 1st
 explain select *
 from part p1 join part p2 join part p3 on p1.p_name = p2.p_name and p2.p_name 
 = p3.p_name;
 - query 2
 explain select *
 from part p1 join part p2 join part p3 
 where p1.p_name = p2.p_name and p2.p_name = p3.p_name;
 {noformat}





[jira] [Commented] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2013-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846485#comment-13846485
 ] 

Ashutosh Chauhan commented on HIVE-6024:


This results in inconsistent semantics: while loading from a local source, 
files are *not* moved but copied, whereas while loading from DFS sources, files 
are moved and thus deleted at the source location after the operation. Ideally 
the same move semantics (delete at source) should be provided whether loading 
from DFS or local. What exactly the semantics should be can be debated; however, 
the scope of this jira is limited to not creating an additional copy task while 
loading from local, but rather doing the copy (instead of the move) in MoveTask 
itself, thus saving unnecessary task execution and FS operations.

 Load data local inpath unnecessarily creates a copy task
 

 Key: HIVE-6024
 URL: https://issues.apache.org/jira/browse/HIVE-6024
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ashutosh Chauhan

 Load data command creates an additional copy task only when its loading from 
 {{local}} It doesn't create this additional copy task while loading from DFS 
 though.





[jira] [Comment Edited] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2013-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846485#comment-13846485
 ] 

Ashutosh Chauhan edited comment on HIVE-6024 at 12/12/13 5:41 PM:
--

This results in inconsistent semantics: while loading from a local source, 
files are *not* moved but copied, whereas while loading from DFS sources, files 
are moved and thus deleted at the source location after the operation. Ideally 
the same load semantics (delete at source) should be provided whether loading 
from DFS or local. What exactly the semantics should be can be debated; however, 
the scope of this jira is limited to not creating an additional copy task while 
loading from local, but rather doing the copy (instead of the move) in MoveTask 
itself, thus saving unnecessary task execution and FS operations.


was (Author: ashutoshc):
This results in inconsistent semantic that while loading from local source 
files are *not* moved but copied, but while loading from DFS sources files are 
moved and thus are deleted at source location after operation. Ideally same 
semantic of move (delete at source) should be provided while loading either 
from DFS or local. What exactly should be semantic can be debated, however 
scope for this jira is limited to not a create an additional copy task while 
loading from local, but rather do the copy (instead of move) in MoveTask itself 
and thus saving on unnecessary task execution and FS operations.

 Load data local inpath unnecessarily creates a copy task
 

 Key: HIVE-6024
 URL: https://issues.apache.org/jira/browse/HIVE-6024
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ashutosh Chauhan

 Load data command creates an additional copy task only when its loading from 
 {{local}} It doesn't create this additional copy task while loading from DFS 
 though.





[jira] [Updated] (HIVE-6018) FetchTask should not reference metastore classes

2013-12-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6018:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 FetchTask should not reference metastore classes
 

 Key: HIVE-6018
 URL: https://issues.apache.org/jira/browse/HIVE-6018
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.13.0

 Attachments: HIVE-6018.1.patch.txt


 The code below in PartitionDesc sometimes throws a NoClassDefFoundError 
 in execution.
 {noformat}
 public Deserializer getDeserializer() {
   try {
     return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties());
   } catch (Exception e) {
     return null;
   }
 }
 {noformat}





[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.

2013-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846533#comment-13846533
 ] 

Ashutosh Chauhan commented on HIVE-6016:


Instead of doing the filter.accept() logic twice (after this patch), it seems like 
it's enough to just do it once in the outer loop (as introduced in this patch). Shall 
we remove the existing filter.accept() from the inner loop?

 Hadoop23Shims has a bug in listLocatedStatus impl.
 --

 Key: HIVE-6016
 URL: https://issues.apache.org/jira/browse/HIVE-6016
 Project: Hive
  Issue Type: Bug
  Components: Shims
Affects Versions: 0.13.0
Reporter: Sushanth Sowmyan
Assignee: Prasanth J
 Attachments: HIVE-6016.1.patch


 Prasanth and I discovered that the implementation of the wrapping Iterator in 
 listLocatedStatus at 
 https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393
  is broken.
 Basically, if you had files (a,b,_s) , with a filter that is supposed to 
 filter out _s, we expect an output result of (a,b). Instead, we get 
 (a,b,null), with hasNext looking at the next value to see if it's null, and 
 using that to decide if it has any more entries, and thus, (a,b,_s) becomes 
 (a,b).
 There's a boundary condition on the very first pick, which causes a (_s,a,b) 
 to result in (_s,a,b), bypassing the filter, and thus, we wind up with a 
 resultant unfiltered (_s,a,b) which orc breaks on.
 The effect of this bug is that Orc will not be able to read directories where 
 there is a _SUCCESS file, say, as the first entry returned by the FileStatus.
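A standalone sketch of the look-ahead pattern a correct wrapping iterator needs (this is a hypothetical simplified model, not the actual Hadoop23Shims code): the filter is applied while looking ahead, so a filtered-out *first* entry such as _SUCCESS can never leak through, and no nulls are ever handed to the caller.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Simplified model of a filtering look-ahead iterator.
public class FilteringIterator<T> implements Iterator<T> {
  private final Iterator<T> inner;
  private final Predicate<T> filter;
  private T lookAhead; // next accepted element, or null when exhausted

  public FilteringIterator(Iterator<T> inner, Predicate<T> filter) {
    this.inner = inner;
    this.filter = filter;
    advance(); // prime the look-ahead before the first hasNext() call
  }

  private void advance() {
    lookAhead = null;
    while (inner.hasNext()) {
      T candidate = inner.next();
      if (filter.test(candidate)) { // filter *before* exposing the element
        lookAhead = candidate;
        return;
      }
    }
  }

  @Override public boolean hasNext() { return lookAhead != null; }

  @Override public T next() {
    if (lookAhead == null) throw new NoSuchElementException();
    T result = lookAhead;
    advance();
    return result;
  }

  public static void main(String[] args) {
    // _SUCCESS first: the first-pick boundary case described above
    Iterator<String> it = new FilteringIterator<>(
        Arrays.asList("_SUCCESS", "a", "b").iterator(), s -> !s.startsWith("_"));
    StringBuilder sb = new StringBuilder();
    while (it.hasNext()) sb.append(it.next());
    System.out.println(sb); // ab
  }
}
```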





[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-12 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846553#comment-13846553
 ] 

Eric Hanson commented on HIVE-5996:
---

Xuefu,

I'm all for new, useful functionality and better performance for Hive. And I'm 
all for getting correct results. I appreciate your contributions and your 
passion.

But I strongly believe changing behavior from one reasonable alternative to 
another in a way that breaks backward compatibility is not the way to go. I 
have a lot of experience with evolving a database (SQL Server) over a decade, 
and have talked to many people who've been evolving the product longer than 
that. From this experience, I can say that breaking backward compatibility (for 
either functionality or performance, but especially functionality) even in 
subtle ways can anger customers/users. 

Any changes to semantics like this should first of all be avoided, and if they 
can't be avoided, they need to be rolled out carefully, with a switch to enable 
backward compatibility. SQL Server has compatibility levels and SET options 
as switches, and a defined deprecation schedule. This is kind of process-heavy 
in the engineering effort, and also causes explosion of the test matrix. So I 
am not recommending necessarily that Hive go there, though maybe we need to 
have that discussion. I think we're better off being strict about not breaking 
backward compatibility unless really needed.

So, I ask that you please close this JIRA without making a patch.

There are a couple of other areas where there is an issue of ANSI SQL 
compatibility (result type of int/int and avg(int)). We could have a further 
discussion on those, though you know my preference would be to leave the 
semantics as-is on those since I think backward compatibility trumps ANSI SQL 
compatibility for those. If there is no issue of ANSI compatibility, and the 
current Hive behavior is reasonable, I'd like us to leave things as they are. I 
don't think there is a need to be across-the-board compatible with another 
system (MySQL or anything else).

Best regards,
Eric

P.S. While technically accurate, your specific argument that you can overflow a 
bigint sum is, I think, not a significant user issue. I've never heard a 
complaint about it with SQL Server, or PDW, our scale-out data warehouse 
appliance. Really big numbers, like the national debt in pennies, fit in a 
bigint, just to put it in perspective. Users can cast the input to decimal or 
double if they need more magnitude.

 Query for sum of a long column of a table with only two rows produces wrong 
 result
 --

 Key: HIVE-5996
 URL: https://issues.apache.org/jira/browse/HIVE-5996
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-5996.patch


 {code}
 hive> desc test2;
 OK
 l                      bigint                  None
 hive> select * from test2;
 OK
 6666666666666666666
 5555555555555555555
 hive> select sum(l) from test2;
 OK
 -6224521851487329395
 {code}
 It's believed that a wrap-around error occurred. It's surprising that it 
 happens only with two rows. Same query in MySQL returns:
 {code}
 mysql> select sum(l) from test;
 +----------------------+
 | sum(l)               |
 +----------------------+
 | 12222222222222222221 |
 +----------------------+
 1 row in set (0.00 sec)
 {code}
 Hive should accommodate a large number of rows. Overflowing with only two rows 
 is very unusable.
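The wrap-around can be reproduced directly in Java, since Hive's bigint maps to a 64-bit two's-complement long. This is a standalone illustration, not Hive code, and it assumes the scrubbed column values above were 6666666666666666666 and 5555555555555555555, a pair of repeated-digit values consistent with the reported result:

```java
// Illustration of 64-bit wrap-around: the true sum 12222222222222222221
// exceeds Long.MAX_VALUE (9223372036854775807), so the addition wraps.
public class SumOverflowDemo {
  public static void main(String[] args) {
    long a = 6666666666666666666L;
    long b = 5555555555555555555L;
    long wrapped = a + b;          // silently wraps past Long.MAX_VALUE
    System.out.println(wrapped);   // -6224521851487329395, as in the report
    try {
      Math.addExact(a, b);         // would surface the overflow instead
      throw new AssertionError("expected overflow");
    } catch (ArithmeticException expected) {
      System.out.println("overflow detected");
    }
  }
}
```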





[jira] [Commented] (HIVE-5555) Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions

2013-12-12 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846557#comment-13846557
 ] 

Harish Butani commented on HIVE-5555:
-

Had filed HIVE-5558 to address this. Haven't gotten around to doing it yet. 
Do you need this urgently?


 Support alternate join syntax: joining conditions in where clause; also 
 pushdown qualifying join conditions 
 

 Key: HIVE-5555
 URL: https://issues.apache.org/jira/browse/HIVE-5555
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: AlternativeJoinSyntax.pdf


 Certain tools still generate `old style' Join queries where the join
 condition is in the Where clause. A related set of issues that can
 be addressed is that of pushing forward joining conditions;
 in a manner similar to the Predicate Pushdown feature of Hive.
 For e.g. these queries can have join conditions pushed down:
 {noformat}
 - query 1, push join predicate from 2nd join to 1st
 explain select *
 from part p1 join part p2 join part p3 on p1.p_name = p2.p_name and p2.p_name 
 = p3.p_name;
 - query 2
 explain select *
 from part p1 join part p2 join part p3 
 where p1.p_name = p2.p_name and p2.p_name = p3.p_name;
 {noformat}





Re: Review Request 15663: Hive should be able to skip header and footer rows when reading data file for a table

2013-12-12 Thread Shuaishuai Nie

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15663/#review30269
---



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/15663/#comment57938

this is the maximum number of footer rows a user can define. This prevents a 
user from defining too many footers, which would consume memory.



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/15663/#comment57939

fixed



itests/qtest/pom.xml
https://reviews.apache.org/r/15663/#comment57940

this is necessary if the test is in the MinimrCliDriver test class



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/15663/#comment57941

comment added



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/15663/#comment57942

Hi Eric, do you mean I need a blank line before each comment? I didn't see 
this in other parts of the code. Or do you mean a space after //?



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/15663/#comment57943

fixed



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/15663/#comment57944

name changed



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/15663/#comment57945

Since the return behavior is different in both places, it is hard to reuse 
the code because of the minor differences.



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/15663/#comment57946

fixed



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/15663/#comment57947

fixed



ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
https://reviews.apache.org/r/15663/#comment57948

fixed



ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java
https://reviews.apache.org/r/15663/#comment57949

fixed the comment. Since I need a deep copy of the key and value fields 
through ReflectionUtils, this new class is necessary



ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java
https://reviews.apache.org/r/15663/#comment57950

fixed



ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java
https://reviews.apache.org/r/15663/#comment57951

fixed



ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java
https://reviews.apache.org/r/15663/#comment57952

fixed



ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java
https://reviews.apache.org/r/15663/#comment57953

fixed



ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
https://reviews.apache.org/r/15663/#comment57954

since the header and footer are removed per file, I think it 
should be fine if multiple splits are combined, since each file will have its 
own path



ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java
https://reviews.apache.org/r/15663/#comment57955

yes, otherwise an exception will be thrown when accessing 
pathToPartitionInfo info during the test since the job context is incomplete in 
the unit test



ql/src/test/queries/clientpositive/file_with_header_footer.q
https://reviews.apache.org/r/15663/#comment57956

negative tests added for this scenario


- Shuaishuai Nie


On Nov. 19, 2013, 1:31 a.m., Eric Hanson wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15663/
 ---
 
 (Updated Nov. 19, 2013, 1:31 a.m.)
 
 
 Review request for hive and Thejas Nair.
 
 
 Bugs: HIVE-5795
 https://issues.apache.org/jira/browse/HIVE-5795
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Hive should be able to skip header and footer rows when reading data file for 
 a table
 
 (I am uploading this on behalf of Shuaishuai Nie since he's not in the office)
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 32ab3d8 
   data/files/header_footer_table_1/0001.txt PRE-CREATION 
   data/files/header_footer_table_1/0002.txt PRE-CREATION 
   data/files/header_footer_table_1/0003.txt PRE-CREATION 
   data/files/header_footer_table_2/2012/01/01/0001.txt PRE-CREATION 
   data/files/header_footer_table_2/2012/01/02/0002.txt PRE-CREATION 
   data/files/header_footer_table_2/2012/01/03/0003.txt PRE-CREATION 
   itests/qtest/pom.xml a453d8a 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 5abcfc1 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 
 dd5cb6b 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 0ec6e63 
   
 

Re: Review Request 15663: Hive should be able to skip header and footer rows when reading data file for a table

2013-12-12 Thread Shuaishuai Nie

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15663/#review30270
---



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/15663/#comment57957

Hi Eric, I uploaded the new diff with the fixes here 
https://reviews.apache.org/r/16184/diff/#index_header



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/15663/#comment57958

Hi Thejas, I uploaded the new diff file with the fixes in the review board 
here: https://reviews.apache.org/r/16184/diff/#index_header


- Shuaishuai Nie


On Nov. 19, 2013, 1:31 a.m., Eric Hanson wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/15663/
 ---
 
 (Updated Nov. 19, 2013, 1:31 a.m.)
 
 
 Review request for hive and Thejas Nair.
 
 
 Bugs: HIVE-5795
 https://issues.apache.org/jira/browse/HIVE-5795
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Hive should be able to skip header and footer rows when reading data file for 
 a table
 
 (I am uploading this on behalf of Shuaishuai Nie since he's not in the office)
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 32ab3d8 
   data/files/header_footer_table_1/0001.txt PRE-CREATION 
   data/files/header_footer_table_1/0002.txt PRE-CREATION 
   data/files/header_footer_table_1/0003.txt PRE-CREATION 
   data/files/header_footer_table_2/2012/01/01/0001.txt PRE-CREATION 
   data/files/header_footer_table_2/2012/01/02/0002.txt PRE-CREATION 
   data/files/header_footer_table_2/2012/01/03/0003.txt PRE-CREATION 
   itests/qtest/pom.xml a453d8a 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java 5abcfc1 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 
 dd5cb6b 
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 0ec6e63 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java
  85dd975 
   ql/src/test/org/apache/hadoop/hive/ql/io/TestSymlinkTextInputFormat.java 
 0686d9b 
   ql/src/test/queries/clientpositive/file_with_header_footer.q PRE-CREATION 
   ql/src/test/results/clientpositive/file_with_header_footer.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/15663/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Eric Hanson
 




[jira] [Updated] (HIVE-5756) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs

2013-12-12 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-5756:
--

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

 Implement vectorization support for IF conditional expression for long, 
 double, timestamp, boolean and string inputs
 

 Key: HIVE-5756
 URL: https://issues.apache.org/jira/browse/HIVE-5756
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Fix For: 0.13.0

 Attachments: HIVE-5756.1.patch, HIVE-5756.2.patch, HIVE-5756.3.patch, 
 HIVE-5756.4.patch, HIVE-5756.5.patch, HIVE-5756.6.patch.txt, 
 HIVE-5756.7.patch, HIVE-5756.8.patch


 Implement full, end-to-end support for IF in vectorized mode, including new 
 VectorExpression class(es), VectorizationContext translation to a 
 VectorExpression, and unit tests for these, as well as end-to-end ad hoc 
 testing. An end-to-end .q test is recommended but optional.
 This is high priority because IF is the most popular conditional expression.
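A minimal sketch of the column-at-a-time idea behind vectorized IF (hypothetical, not the actual Hive VectorExpression classes): evaluate IF(cond, a, b) over primitive arrays in one tight loop instead of interpreting the expression row by row.

```java
// Hypothetical sketch of a vectorized conditional over a column batch.
public class VectorizedIfSketch {
  // out[i] = cond[i] ? thenCol[i] : elseCol[i] for the first n rows of the batch
  public static void ifExpr(boolean[] cond, long[] thenCol, long[] elseCol,
                            long[] out, int n) {
    for (int i = 0; i < n; i++) {
      out[i] = cond[i] ? thenCol[i] : elseCol[i];
    }
  }

  public static void main(String[] args) {
    boolean[] cond = {true, false, true};
    long[] a = {1, 2, 3};
    long[] b = {10, 20, 30};
    long[] out = new long[3];
    ifExpr(cond, a, b, out, 3);
    System.out.println(java.util.Arrays.toString(out)); // [1, 20, 3]
  }
}
```

The per-element branch here is only illustrative; the real implementation also has to handle null vectors and repeating (constant) columns.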





[jira] [Commented] (HIVE-5756) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs

2013-12-12 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846596#comment-13846596
 ] 

Eric Hanson commented on HIVE-5756:
---

Thanks for the review, Jitendra!

Committed to trunk. 



 Implement vectorization support for IF conditional expression for long, 
 double, timestamp, boolean and string inputs
 

 Key: HIVE-5756
 URL: https://issues.apache.org/jira/browse/HIVE-5756
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Fix For: 0.13.0

 Attachments: HIVE-5756.1.patch, HIVE-5756.2.patch, HIVE-5756.3.patch, 
 HIVE-5756.4.patch, HIVE-5756.5.patch, HIVE-5756.6.patch.txt, 
 HIVE-5756.7.patch, HIVE-5756.8.patch


 Implement full, end-to-end support for IF in vectorized mode, including new 
 VectorExpression class(es), VectorizationContext translation to a 
 VectorExpression, and unit tests for these, as well as end-to-end ad hoc 
 testing. An end-to-end .q test is recommended but optional.
 This is high priority because IF is the most popular conditional expression.





[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-12 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846635#comment-13846635
 ] 

Eric Hanson commented on HIVE-5996:
---

Xuefu,

I see you want to make changes to have Hive be more in line with MySQL 
semantics. Can you explain why you're making these changes or considering them?

Thanks,
Eric

 Query for sum of a long column of a table with only two rows produces wrong 
 result
 --

 Key: HIVE-5996
 URL: https://issues.apache.org/jira/browse/HIVE-5996
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-5996.patch


 {code}
 hive desc test2;
 OK
 l bigint  None
 hive select * from test2; 
 OK
 666
 555
 hive select sum(l) from test2;
 OK
 -6224521851487329395
 {code}
 It's believed that a wrap-around error occurred. It's surprising that it 
 happens only with two rows. Same query in MySQL returns:
 {code}
 mysql select sum(l) from test;
 +--+
 | sum(l)   |
 +--+
 | 1221 |
 +--+
 1 row in set (0.00 sec)
 {code}
 Hive should accommodate large number of rows. Overflowing with only two rows 
 is very unusable.





[jira] [Commented] (HIVE-6023) Numeric Data Type Support

2013-12-12 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846647#comment-13846647
 ] 

Eric Hanson commented on HIVE-6023:
---

Not that I know of. You could implement UDFs to do operations on some other 
type like a string to get the semantics you want but that sounds like too much 
work for this situation.

 Numeric Data Type Support
 -

 Key: HIVE-6023
 URL: https://issues.apache.org/jira/browse/HIVE-6023
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema, File Formats
Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0
 Environment: Hive 0.90, Linux, Hadoop, Data Type Extension
Reporter: Deepak Raj
  Labels: Data, Hive, Numeric, Type1
   Original Estimate: 2h
  Remaining Estimate: 2h

 Many companies are rethinking their strategies to adapt Hive into their ETL 
 just for the reason that it does not support the most basic data types like 
 Numeric(a,b). I believe there should be an improvement with upcoming versions 
 of hive. Can we extend the hive with custom UDF for Numeric(a,b) data type?





[jira] [Commented] (HIVE-6023) Numeric Data Type Support

2013-12-12 Thread Deepak Raj (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846640#comment-13846640
 ] 

Deepak Raj commented on HIVE-6023:
--

Well, the latest version does come with decimal(p,s), but is there a way we can 
add a custom data type to an older Hive version? I don't think we can.

 Numeric Data Type Support
 -

 Key: HIVE-6023
 URL: https://issues.apache.org/jira/browse/HIVE-6023
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema, File Formats
Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0
 Environment: Hive 0.90, Linux, Hadoop, Data Type Extension
Reporter: Deepak Raj
  Labels: Data, Hive, Numeric, Type1
   Original Estimate: 2h
  Remaining Estimate: 2h

 Many companies are rethinking their strategies to adapt Hive into their ETL 
 just for the reason that it does not support the most basic data types like 
 Numeric(a,b). I believe there should be an improvement with upcoming versions 
 of hive. Can we extend the hive with custom UDF for Numeric(a,b) data type?





[jira] [Commented] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one

2013-12-12 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846675#comment-13846675
 ] 

Eric Hanson commented on HIVE-6015:
---

+1

 vectorized logarithm produces results for 0 that are different from a 
 non-vectorized one
 

 Key: HIVE-6015
 URL: https://issues.apache.org/jira/browse/HIVE-6015
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
  Labels: vectorization
 Attachments: HIVE-6015.patch








[jira] [Created] (HIVE-6025) Add Prasad to committer list

2013-12-12 Thread Prasad Mujumdar (JIRA)
Prasad Mujumdar created HIVE-6025:
-

 Summary: Add Prasad  to committer list
 Key: HIVE-6025
 URL: https://issues.apache.org/jira/browse/HIVE-6025
 Project: Hive
  Issue Type: Test
  Components: Documentation
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
Priority: Minor








[jira] [Updated] (HIVE-6025) Add Prasad to committer list

2013-12-12 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-6025:
--

Attachment: HIVE-6025.1.patch

 Add Prasad  to committer list
 -

 Key: HIVE-6025
 URL: https://issues.apache.org/jira/browse/HIVE-6025
 Project: Hive
  Issue Type: Test
  Components: Documentation
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
Priority: Minor
 Attachments: HIVE-6025.1.patch








Re: Review Request 16207: HIVE-1466: Add NULL DEFINED AS to ROW FORMAT specification

2013-12-12 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16207/
---

(Updated Dec. 12, 2013, 8:20 p.m.)


Review request for hive and Xuefu Zhang.


Bugs: HIVE-1466
https://issues.apache.org/jira/browse/HIVE-1466


Repository: hive-git


Description
---

Support configurable null format for tables and writing out to directory.
Using a non-default null format is a bit cumbersome when creating a table, and 
pretty much impossible when exporting data to the local filesystem using insert 
overwrite directory.
The patch enhances the SQL syntax to support 'NULL DEFINED AS' construct for 
create table as well as insert overwrite directory.
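A usage sketch of the enhanced syntax (table name, path, and null markers are illustrative):

```sql
-- Declare the table's on-disk null marker at create time...
CREATE TABLE null_fmt (a INT, b STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  NULL DEFINED AS '\\N';

-- ...and use the same construct when exporting to a local directory.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/null_fmt_out'
ROW FORMAT DELIMITED NULL DEFINED AS 'NULL'
SELECT a, b FROM null_fmt;
```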


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java d32be59 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 4b7fc73 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 366b714 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 5e5b8cf 
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 8cf5ad6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d0a0ec7 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 93b4181 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java b6097b1 
  ql/src/test/queries/clientpositive/nullformat.q PRE-CREATION 
  ql/src/test/queries/clientpositive/nullformatdir.q PRE-CREATION 
  ql/src/test/results/clientpositive/nullformat.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/nullformatdir.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/16207/diff/


Testing
---

Added new tests.


Thanks,

Prasad Mujumdar



[jira] [Commented] (HIVE-6025) Add Prasad to committer list

2013-12-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846699#comment-13846699
 ] 

Xuefu Zhang commented on HIVE-6025:
---

+1

 Add Prasad  to committer list
 -

 Key: HIVE-6025
 URL: https://issues.apache.org/jira/browse/HIVE-6025
 Project: Hive
  Issue Type: Test
  Components: Documentation
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
Priority: Minor
 Attachments: HIVE-6025.1.patch








[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846705#comment-13846705
 ] 

Xuefu Zhang commented on HIVE-5996:
---

[~ehans] Thanks for sharing your thoughts and your inquiry. For your 
information, I'm not trying to make MySQL the model. My first line of 
consideration is the SQL standard. Where there is no SQL standard for a piece 
of functionality, Hive doesn't have to invent everything; thus, I do reference 
MySQL for ideas, mostly because MySQL and its technical documentation are 
readily available. However, this doesn't preclude me from following other DBs' 
practice. For instance, precision/scale determination for arithmetic operations 
in Hive follows SQL Server's formula. I'm neither anti- nor pro-MySQL, nor SQL 
Server, but I strongly believe that following well-established practices 
benefits Hive more than doing something in a unique, unfortunate way. An 
example would be int/int in Hive.

However, a lot of existing functionality in Hive was put into place when Hive 
was positioned as a tool rather than a DB, and before all the necessary data 
types were introduced. Take int/int again as an example: the early developers 
probably didn't even think about SQL compliance, and even if they did, there 
wasn't a decimal data type to consider. As Hive shifts to a DB-on-bigdata 
positioning, I believe that we should start thinking in a perspective other 
than performance or backward compatibility. If we restrict ourselves based on 
unconscious decisions made in the past, we may lose a lot of opportunities to 
do the right things.

As I worked on decimal precision/scale support, I found a lot of problems in 
Hive around data types and their conversions and promotions. In many cases, Hive 
is not consistent with itself. Let me ask you a question to see if you know the 
answer: what's the return type of 35 + '3.14', where 35 is from an int column 
and '3.14' from a string column? Before I made the changes, you probably would 
say: wait, let me read the code first. And your answer might be different if my 
question were 35/'3.14'. Now, to answer the same questions, I can give the 
answer right away, and I have a theory to tell why. In summary, it has taken a 
lot of effort to clean up the mess and inconsistency in Hive since the beginning 
of my work on decimal.

Now if we use either performance or backward compatibility to shut down what we 
have achieved, I don't see how Hive can shift from a tool to a DB, or how Hive 
can become adopted as an enterprise-grade product.

Hive is still evolving, and that's why I think we have a certain luxury of 
breaking backward compatibility to do the right thing. As Ashutosh once 
mentioned, we don't want to be backward compatible with a bug. Once Hive is 
stabilized, it becomes much harder to make backward-incompatible changes, as 
you know from your experience with SQL Server.

I understand your concern about backward compatibility, especially your 
possible frustration over vectorization code breaking or needing rework. On the 
other hand, I think we are here to help Hive become more useful. A blunt 
rejection without much consideration and communication doesn't seem as helpful 
and constructive as it could be.

 Query for sum of a long column of a table with only two rows produces wrong 
 result
 --

 Key: HIVE-5996
 URL: https://issues.apache.org/jira/browse/HIVE-5996
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-5996.patch


 {code}
 hive> desc test2;
 OK
 l	bigint	None
 hive> select * from test2;
 OK
 666
 555
 hive> select sum(l) from test2;
 OK
 -6224521851487329395
 {code}
 It's believed that a wrap-around error occurred. It's surprising that it 
 happens with only two rows. The same query in MySQL returns:
 {code}
 mysql> select sum(l) from test;
 +--------+
 | sum(l) |
 +--------+
 |   1221 |
 +--------+
 1 row in set (0.00 sec)
 {code}
 Hive should accommodate a large number of rows; overflowing with only two 
 rows makes it unusable.
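The suspected wrap-around is the default behavior of Java long arithmetic, which overflows silently. A minimal sketch (illustrative only, not Hive's actual sum UDAF code) contrasting plain addition with Math.addExact, which raises an exception on overflow:

```java
// Illustrative sketch of silent long wrap-around vs. checked addition.
public class SumOverflow {
    // Plain addition: wraps silently past Long.MAX_VALUE.
    static long wrappingAdd(long a, long b) {
        return a + b;
    }

    // Checked addition: reports whether the sum would overflow a long.
    static boolean overflows(long a, long b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrappingAdd(Long.MAX_VALUE, 1L)); // -9223372036854775808
        System.out.println(overflows(Long.MAX_VALUE, 1L));   // true
        System.out.println(overflows(666L, 555L));           // false
    }
}
```

Note that the two reported rows (666 and 555) cannot overflow a long by themselves, which is part of what makes the reported result surprising: the wrap-around, if that is what it is, must be happening somewhere other than the final addition.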





[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846718#comment-13846718
 ] 

Thejas M Nair commented on HIVE-5996:
-

bq. Now, to answer the same questions, I can give the answer right away, and I 
have a theory to tell why.
It would be great if you could document the theory; otherwise I would still need 
to look at the code to understand it! :)

I really appreciate the code cleanup you have been doing. But we have to be 
careful about backward compatibility. I also agree that we should not burden 
new users with historic problems.
Regarding "Once Hive is stabilized", how do we define that? Maybe, once we 
create a list of non-backward-compatible changes that are important to make, we 
can make a major release version (1.x) in which we break the backward 
compatibility of certain things and document it very well. Hopefully, that 
list of non-backward-compatible changes can be kept small.

I discuss this in the context of config defaults in HIVE-5875.



 Query for sum of a long column of a table with only two rows produces wrong 
 result
 --

 Key: HIVE-5996
 URL: https://issues.apache.org/jira/browse/HIVE-5996
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-5996.patch


 {code}
 hive> desc test2;
 OK
 l	bigint	None
 hive> select * from test2;
 OK
 666
 555
 hive> select sum(l) from test2;
 OK
 -6224521851487329395
 {code}
 It's believed that a wrap-around error occurred. It's surprising that it 
 happens with only two rows. The same query in MySQL returns:
 {code}
 mysql> select sum(l) from test;
 +--------+
 | sum(l) |
 +--------+
 |   1221 |
 +--------+
 1 row in set (0.00 sec)
 {code}
 Hive should accommodate a large number of rows; overflowing with only two 
 rows makes it unusable.





[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846721#comment-13846721
 ] 

Thejas M Nair commented on HIVE-5996:
-

Regarding the specific change in this jira, I am not convinced that it is an 
important non-backward-compatible change to make. You can have an overflow even 
with the decimal type, if the values are large enough, with just two rows.
On the other hand, the int division returning double is arguably a change to 
consider for a 1.0 candidate, as that is a SQL compliance issue.


 Query for sum of a long column of a table with only two rows produces wrong 
 result
 --

 Key: HIVE-5996
 URL: https://issues.apache.org/jira/browse/HIVE-5996
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-5996.patch


 {code}
 hive> desc test2;
 OK
 l	bigint	None
 hive> select * from test2;
 OK
 666
 555
 hive> select sum(l) from test2;
 OK
 -6224521851487329395
 {code}
 It's believed that a wrap-around error occurred. It's surprising that it 
 happens with only two rows. The same query in MySQL returns:
 {code}
 mysql> select sum(l) from test;
 +--------+
 | sum(l) |
 +--------+
 |   1221 |
 +--------+
 1 row in set (0.00 sec)
 {code}
 Hive should accommodate a large number of rows; overflowing with only two 
 rows makes it unusable.





[jira] [Resolved] (HIVE-5824) Support generation of html test reports in maven

2013-12-12 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J resolved HIVE-5824.
--

Resolution: Not A Problem

As [~ashutoshc] pointed out, this patch is not required because the surefire 
reporting plugin was already in pom.xml. Closing it as Not a Problem.

 Support generation of html test reports in maven
 

 Key: HIVE-5824
 URL: https://issues.apache.org/jira/browse/HIVE-5824
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Minor
  Labels: build, maven, test
 Attachments: HIVE-5824.2.patch.txt, HIVE-5824.patch.txt


 {code}ant testreport{code} generated test results in HTML format. It would be 
 useful to support the same in maven. The default test report generated by maven 
 is in XML format, which is hard to read.





[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-12-12 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4395:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.

 Support TFetchOrientation.FIRST for HiveServer2 FetchResults
 

 Key: HIVE-4395
 URL: https://issues.apache.org/jira/browse/HIVE-4395
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch, HIVE-4395.2.patch, 
 HIVE-4395.3.patch, HIVE-4395.4.patch, HIVE-4395.5.patch, HIVE-4395.6.patch


 Currently HiveServer2 only supports fetching the next row 
 (TFetchOrientation.NEXT). This ticket is to implement support for 
 TFetchOrientation.FIRST, which resets the fetch position to the beginning of 
 the resultset.





[jira] [Created] (HIVE-6026) Ldap Authenticator should be more generic with BindDN

2013-12-12 Thread Johndee Burks (JIRA)
Johndee Burks created HIVE-6026:
---

 Summary: Ldap Authenticator should be more generic with BindDN
 Key: HIVE-6026
 URL: https://issues.apache.org/jira/browse/HIVE-6026
 Project: Hive
  Issue Type: Bug
  Components: Authentication
Affects Versions: 0.10.0
 Environment: CDH4.4, Fedora Directory Service
Reporter: Johndee Burks
Priority: Minor


The bindDN implementation should be more generic for the LDAP authenticator. 
Currently it looks like this: 

{code}
// setup the security principal
String bindDN;
if (baseDN != null) {
  bindDN = "uid=" + user + "," + baseDN;
} else {
  bindDN = user;
}
{code}

This causes problems for LDAP implementations that expect cn= first. 
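One way to make this generic is to take the naming attribute ("uid", "cn", ...) as a parameter rather than hard-coding the prefix. A hypothetical sketch, with illustrative names, not Hive's actual LDAP authenticator code:

```java
// Hypothetical generic bind-DN builder: the naming attribute is configurable.
public class BindDnBuilder {
    static String bindDN(String user, String baseDN, String namingAttribute) {
        if (baseDN != null) {
            return namingAttribute + "=" + user + "," + baseDN;
        }
        // No base DN configured: assume the caller supplied a full DN already.
        return user;
    }

    public static void main(String[] args) {
        System.out.println(bindDN("alice", "dc=example,dc=com", "uid")); // uid=alice,dc=example,dc=com
        System.out.println(bindDN("alice", "dc=example,dc=com", "cn"));  // cn=alice,dc=example,dc=com
    }
}
```

The attribute would presumably come from a configuration property, so a Fedora Directory Service deployment could set it to "cn" while the default stays "uid".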





[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2013-12-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-5783:
-

Component/s: Serializers/Deserializers

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, hive-0.11-parquet.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.





[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2013-12-12 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-5783:
-

Fix Version/s: (was: 0.11.0)

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, hive-0.11-parquet.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.





[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-12 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846797#comment-13846797
 ] 

Eric Hanson commented on HIVE-5783:
---

Could somebody put the patch on ReviewBoard? That would make it easier to look at.

 Native Parquet Support in Hive
 --

 Key: HIVE-5783
 URL: https://issues.apache.org/jira/browse/HIVE-5783
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Justin Coffey
Assignee: Justin Coffey
Priority: Minor
 Attachments: HIVE-5783.patch, hive-0.11-parquet.patch


 Problem Statement:
 Hive would be easier to use if it had native Parquet support. Our 
 organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
 Hive integration and would like to now contribute that integration to Hive.
 About Parquet:
 Parquet is a columnar storage format for Hadoop and integrates with many 
 Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
 Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
 Parquet integration.
 Changes Details:
 Parquet was built with dependency management in mind and therefore only a 
 single Parquet jar will be added as a dependency.





[jira] [Updated] (HIVE-6014) Stage ids differ in the tez branch

2013-12-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6014:
-

Fix Version/s: tez-branch

 Stage ids differ in the tez branch
 --

 Key: HIVE-6014
 URL: https://issues.apache.org/jira/browse/HIVE-6014
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-6014.1.patch








[jira] [Updated] (HIVE-5991) ORC RLEv2 fails with ArrayIndexOutOfBounds exception for PATCHED_BLOB encoding

2013-12-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5991:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Prasanth!

 ORC RLEv2 fails with ArrayIndexOutOfBounds exception for PATCHED_BLOB 
 encoding 
 ---

 Key: HIVE-5991
 URL: https://issues.apache.org/jira/browse/HIVE-5991
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.13.0

 Attachments: HIVE-5991.1.patch


 PATCHED_BLOB encoding creates a mask with the number of bits required for the 
 95th percentile value. If the 95th percentile value requires 32 bits, then the 
 mask creation will result in integer overflow.
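The overflow mode can be reproduced in isolation. In Java, int shift distances are reduced modulo 32, so building a mask as (1 << bits) - 1 collapses to 0 when bits is 32; promoting to long arithmetic avoids it. A standalone sketch, not the ORC writer code itself:

```java
// Demonstrates the 32-bit mask overflow: 1 << 32 wraps to 1 in int arithmetic.
public class MaskOverflow {
    static int intMask(int bits)   { return (1 << bits) - 1; }   // broken for bits == 32
    static long longMask(int bits) { return (1L << bits) - 1; }  // correct up to 63 bits

    public static void main(String[] args) {
        System.out.println(intMask(32));   // 0: the mask collapses
        System.out.println(longMask(32));  // 4294967295: the intended 32-bit mask
    }
}
```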





[jira] [Updated] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )

2013-12-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5994:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Prasanth!

 ORC RLEv2 encodes wrongly for large negative BIGINTs  (64 bits )
 

 Key: HIVE-5994
 URL: https://issues.apache.org/jira/browse/HIVE-5994
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.13.0

 Attachments: HIVE-5994.1.patch


 For large negative BIGINTs, zigzag encoding will yield a large (64-bit) value 
 with the MSB set to 1. This value is interpreted as a negative value in the 
 SerializationUtils.findClosestNumBits(long value) function. This resulted in a 
 wrong computation of the total number of bits required, which in turn results 
 in wrong encoding/decoding of values.
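The failure mode can be sketched in isolation (illustrative code, not SerializationUtils itself): zigzag-encoding a large negative long produces a value with the MSB set, and an unsigned-safe bit count via Long.numberOfLeadingZeros handles such values correctly:

```java
// Zigzag encoding plus an unsigned-safe bit-width computation.
public class ZigZagBits {
    static long zigzag(long n) {
        return (n << 1) ^ (n >> 63); // standard zigzag: small magnitudes stay small
    }

    static int bitsRequired(long v) {
        // numberOfLeadingZeros treats v as a 64-bit pattern, so an MSB-set
        // value correctly needs 64 bits instead of being misread as negative.
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    public static void main(String[] args) {
        long z = zigzag(Long.MIN_VALUE);
        System.out.println(z);                // -1 as a signed long: all 64 bits set
        System.out.println(bitsRequired(z));  // 64
    }
}
```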





[jira] [Updated] (HIVE-6004) Fix statistics annotation related test failures in hadoop2

2013-12-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6004:
---

Status: Patch Available  (was: Open)

Marking Patch Available to get a Hive QA run.

 Fix statistics annotation related test failures in hadoop2
 --

 Key: HIVE-6004
 URL: https://issues.apache.org/jira/browse/HIVE-6004
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.13.0

 Attachments: HIVE-6004.1.patch


 Fix test failures that are related to HIVE-5369 and its subtask changes.





[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2

2013-12-12 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846836#comment-13846836
 ] 

Prasanth J commented on HIVE-6004:
--

[~ashutoshc] these are hadoop2 tests. Will Hive QA run hadoop2 tests as well?

 Fix statistics annotation related test failures in hadoop2
 --

 Key: HIVE-6004
 URL: https://issues.apache.org/jira/browse/HIVE-6004
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.13.0

 Attachments: HIVE-6004.1.patch


 Fix test failures that are related to HIVE-5369 and its subtask changes.





[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.

2013-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846837#comment-13846837
 ] 

Ashutosh Chauhan commented on HIVE-6016:


Sorry, I was confused; those are not 2 loops, but a constructor and an 
overloaded method. Patch looks good. +1

 Hadoop23Shims has a bug in listLocatedStatus impl.
 --

 Key: HIVE-6016
 URL: https://issues.apache.org/jira/browse/HIVE-6016
 Project: Hive
  Issue Type: Bug
  Components: Shims
Affects Versions: 0.13.0
Reporter: Sushanth Sowmyan
Assignee: Prasanth J
 Attachments: HIVE-6016.1.patch


 Prashant and I discovered that the implementation of the wrapping Iterator in 
 listLocatedStatus at 
 https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393
  is broken.
 Basically, if you had files (a,b,_s), with a filter that is supposed to 
 filter out _s, we expect an output result of (a,b). Instead, the iterator 
 yields (a,b,null), with hasNext looking at the next value to see if it's null 
 and using that to decide if there are any more entries; thus, (a,b,_s) happens 
 to become (a,b).
 There's a boundary condition on the very first pick, which causes (_s,a,b) to 
 come back as (_s,a,b), bypassing the filter; thus, we wind up with an 
 unfiltered (_s,a,b), which ORC breaks on.
 The effect of this bug is that ORC will not be able to read directories where 
 there is a _SUCCESS file, say, as the first entry returned by the listing.
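The lookahead pattern such a wrapping iterator needs can be sketched generically (illustrative code, not the Hadoop23Shims implementation): hasNext() advances past filtered-out entries and caches the next accepted one, so a leading _SUCCESS-style entry is skipped rather than leaked through:

```java
import java.util.*;

// Generic filtering iterator with correct lookahead semantics.
public class FilteringIterator implements Iterator<String> {
    private final Iterator<String> inner;
    private String next; // cached next accepted element, or null

    FilteringIterator(Iterator<String> inner) { this.inner = inner; }

    public boolean hasNext() {
        // Advance past rejected entries; cache the first accepted one.
        while (next == null && inner.hasNext()) {
            String candidate = inner.next();
            if (!candidate.startsWith("_")) { // the filter: drop hidden entries
                next = candidate;
            }
        }
        return next != null;
    }

    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String result = next;
        next = null;
        return result;
    }

    static List<String> drain(String... files) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new FilteringIterator(Arrays.asList(files).iterator());
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain("_s", "a", "b")); // [a, b]: the boundary case from the report
        System.out.println(drain("a", "b", "_s")); // [a, b]
    }
}
```

Because all skipping happens inside hasNext(), the position of the filtered entry in the listing no longer matters.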





[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2

2013-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846843#comment-13846843
 ] 

Ashutosh Chauhan commented on HIVE-6004:


No, but there is stats_partialscan_autogather, for example, which looks like 
it'll run for hadoop-1 also. There might be others too. So, I wanted to make sure.

 Fix statistics annotation related test failures in hadoop2
 --

 Key: HIVE-6004
 URL: https://issues.apache.org/jira/browse/HIVE-6004
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.13.0

 Attachments: HIVE-6004.1.patch


 Fix test failures that are related to HIVE-5369 and its subtask changes.





[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2

2013-12-12 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846848#comment-13846848
 ] 

Prasanth J commented on HIVE-6004:
--

Makes sense.

 Fix statistics annotation related test failures in hadoop2
 --

 Key: HIVE-6004
 URL: https://issues.apache.org/jira/browse/HIVE-6004
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.13.0

 Attachments: HIVE-6004.1.patch


 Fix test failures that are related to HIVE-5369 and its subtask changes.





[jira] [Commented] (HIVE-5555) Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions

2013-12-12 Thread Matt Tucker (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846888#comment-13846888
 ] 

Matt Tucker commented on HIVE-5555:
---

It's a nice to have.  Just wanted to make sure that syntax was covered since 
it's similar to the other examples.

 Support alternate join syntax: joining conditions in where clause; also 
 pushdown qualifying join conditions 
 

 Key: HIVE-5555
 URL: https://issues.apache.org/jira/browse/HIVE-5555
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: AlternativeJoinSyntax.pdf


 Certain tools still generate 'old style' Join queries where the join
 condition is in the Where clause. A related set of issues that can
 be addressed is that of pushing forward joining conditions;
 in a manner similar to the Predicate Pushdown feature of Hive.
 For e.g. these queries can have join conditions pushed down:
 {noformat}
 - query 1, push join predicate from 2nd join to 1st
 explain select *
 from part p1 join part p2 join part p3 on p1.p_name = p2.p_name and p2.p_name 
 = p3.p_name;
 - query 2
 explain select *
 from part p1 join part p2 join part p3 
 where p1.p_name = p2.p_name and p2.p_name = p3.p_name;
 {noformat}





[jira] [Created] (HIVE-6027) non-vectorized log10 has rounding issue

2013-12-12 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-6027:
--

 Summary: non-vectorized log10 has rounding issue
 Key: HIVE-6027
 URL: https://issues.apache.org/jira/browse/HIVE-6027
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial


In HIVE-6010, I found that vectorized and non-vectorized log10 may produce 
different results in the last digit of the mantissa (e.g. 7 vs 8). It turns out 
that vectorized one uses Math.log10, but non-vectorized uses log/log(10). Both 
should use Math.log10.
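The discrepancy is easy to demonstrate: dividing Math.log(x) by Math.log(10) performs two roundings where Math.log10(x) performs one, so the results can differ in the last bit of the mantissa for some inputs. A small standalone check, not Hive's UDF code:

```java
// Counts inputs where log(x)/log(10) is not bit-identical to Math.log10(x).
public class Log10Check {
    static long mismatches(int n) {
        long count = 0;
        for (int i = 1; i <= n; i++) {
            double x = i / 7.0;
            if (Math.log10(x) != Math.log(x) / Math.log(10.0)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The two formulas agree to ~15 significant digits but not always bit-for-bit.
        System.out.println("disagreeing inputs out of 100000: " + mismatches(100000));
    }
}
```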





[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.

2013-12-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6016:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Prasanth!

 Hadoop23Shims has a bug in listLocatedStatus impl.
 --

 Key: HIVE-6016
 URL: https://issues.apache.org/jira/browse/HIVE-6016
 Project: Hive
  Issue Type: Bug
  Components: Shims
Affects Versions: 0.13.0
Reporter: Sushanth Sowmyan
Assignee: Prasanth J
 Fix For: 0.13.0

 Attachments: HIVE-6016.1.patch


 Prashant and I discovered that the implementation of the wrapping Iterator in 
 listLocatedStatus at 
 https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393
  is broken.
 Basically, if you had files (a,b,_s), with a filter that is supposed to 
 filter out _s, we expect an output result of (a,b). Instead, the iterator 
 yields (a,b,null), with hasNext looking at the next value to see if it's null 
 and using that to decide if there are any more entries; thus, (a,b,_s) happens 
 to become (a,b).
 There's a boundary condition on the very first pick, which causes (_s,a,b) to 
 come back as (_s,a,b), bypassing the filter; thus, we wind up with an 
 unfiltered (_s,a,b), which ORC breaks on.
 The effect of this bug is that ORC will not be able to read directories where 
 there is a _SUCCESS file, say, as the first entry returned by the listing.





[jira] [Updated] (HIVE-6027) non-vectorized log10 has rounding issue

2013-12-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6027:
---

Attachment: HIVE-6027.patch

trivial patch

 non-vectorized log10 has rounding issue
 ---

 Key: HIVE-6027
 URL: https://issues.apache.org/jira/browse/HIVE-6027
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HIVE-6027.patch


 In HIVE-6010, I found that vectorized and non-vectorized log10 may produce 
 different results in the last digit of the mantissa (e.g. 7 vs 8). It turns 
 out that vectorized one uses Math.log10, but non-vectorized uses log/log(10). 
 Both should use Math.log10.





[jira] [Updated] (HIVE-6027) non-vectorized log10 has rounding issue

2013-12-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6027:
---

Status: Patch Available  (was: Open)

 non-vectorized log10 has rounding issue
 ---

 Key: HIVE-6027
 URL: https://issues.apache.org/jira/browse/HIVE-6027
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HIVE-6027.patch


 In HIVE-6010, I found that vectorized and non-vectorized log10 may produce 
 different results in the last digit of the mantissa (e.g. 7 vs 8). It turns 
 out that vectorized one uses Math.log10, but non-vectorized uses log/log(10). 
 Both should use Math.log10.





[jira] [Created] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-12 Thread Pala M Muthaia (JIRA)
Pala M Muthaia created HIVE-6028:


 Summary: Partition predicate literals are not interpreted 
correctly.
 Key: HIVE-6028
 URL: https://issues.apache.org/jira/browse/HIVE-6028
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Pala M Muthaia


When parsing/analyzing a query, Hive treats the partition predicate value as an 
int instead of a string. This breaks down and leads to incorrect results when 
the partition predicate value starts with the digit 0, e.g.: hour=00, hour=05, etc.

The following repro illustrates the bug:
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM  
some_table limit 1;

-- this query returns incorrect results, i.e. just empty set.
select * from test_partition_pred where hour=00;
OK

-- this query returns correct result. Note predicate value is string literal
select * from test_partition_pred where hour='00';
OK
21  00

-- explain plan illustrates how the query was interpreted. Particularly the 
partition predicate is pushed down as regular filter clause, with hour=0 as 
predicate. 
explain select * from test_partition_pred where hour=00;

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) 
(TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) 00

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: test_partition_pred
  Filter Operator
predicate:
expr: (hour = 0)
type: boolean
Select Operator
  expressions:
expr: col1
type: int
expr: hour
type: string
  outputColumnNames: _col0, _col1
  ListSink

-- comparing plan for query with correct result
explain select * from test_partition_pred where hour='00';

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) 
(TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) '00'

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: test_partition_pred
  Select Operator
expressions:
  expr: col1
  type: int
  expr: hour
  type: string
outputColumnNames: _col0, _col1
ListSink

Note:
1. The type of the partition column is defined as string, not int.
2. This is a regression in Hive 0.12. This used to work in Hive 0.11
3. Not an issue when the partition value starts with integer other than 0, e.g 
hour=10, hour=11 etc.
4. As seen above, workaround is to use string literal hour='00' etc.

This would not be too bad if, in the failing case, Hive complained that 
partition hour=0 is not found, or complained that the literal type doesn't match 
the column type. Instead Hive silently pushes it down as a filter clause, and 
the query succeeds with an empty set as the result.

We found this out in our production tables partitioned by hour, only a few days 
after it started occurring, when there were empty data sets for partitions 
hour=00 to hour=09.
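The type mismatch at the core of the bug can be sketched in plain Java (illustrative only, not Hive's partition-pruning code): the stored partition value is the string "00", so a numeric comparison succeeds exactly where a string-keyed lookup fails:

```java
// "00" equals 0 numerically, but the partition key string "0" is not "00".
public class PartitionLiteral {
    public static void main(String[] args) {
        String partitionValue = "00";
        System.out.println(Integer.parseInt(partitionValue) == 0);    // true: numerically equal
        System.out.println(String.valueOf(0).equals(partitionValue)); // false: "0" != "00"
    }
}
```

This is consistent with the explain plans above: once the literal is narrowed to the int 0, the predicate hour = 0 can no longer name the partition whose string value is "00".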









[jira] [Updated] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-12 Thread Pala M Muthaia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pala M Muthaia updated HIVE-6028:
-

Attachment: Hive-6028-explain-plan.txt

 Partition predicate literals are not interpreted correctly.
 ---

 Key: HIVE-6028
 URL: https://issues.apache.org/jira/browse/HIVE-6028
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Pala M Muthaia
 Attachments: Hive-6028-explain-plan.txt


 When parsing/analyzing a query, Hive treats the partition predicate value as an 
 int instead of a string. This breaks down and leads to incorrect results when 
 the partition predicate value starts with the digit 0, e.g.: hour=00, hour=05, etc.
 The following repro illustrates the bug:
 -- create test table and partition, populate with some data
 create table test_partition_pred(col1 int) partitioned by (hour STRING);
 insert into table test_partition_pred partition (hour=00) select 21 FROM  
 some_table limit 1;
 -- this query returns incorrect results, i.e. just empty set.
 select * from test_partition_pred where hour=00;
 OK
 -- this query returns correct result. Note predicate value is string literal
 select * from test_partition_pred where hour='00';
 OK
 21  00
 Note:
 1. The type of the partition column is defined as string, not int.
 2. This is a regression in Hive 0.12. This used to work in Hive 0.11
 3. Not an issue when the partition value starts with integer other than 0, 
 e.g hour=10, hour=11 etc.
 4. As seen above, workaround is to use string literal hour='00' etc.
 This should not be too bad if in the failing case hive complains that 
 partition hour=0 is not found, or complains literal type doesn't match column 
 type. Instead hive silently pushes it down as filter clause, and query 
 succeeds with empty set as result.
 We found this out in our production tables partitioned by hour, only a few 
 days after it started occurring, when there were empty data sets for 
 partitions hour=00 to hour=09.
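The failure mode described above can be modeled with a toy sketch (plain Python, not Hive code; the dictionary, the `prune` helper, and the literal normalization are all illustrative assumptions): once the unquoted literal 00 is parsed as the integer 0, pruning by partition-name equality can no longer match the partition directory named "00".

```python
# Toy model of partition pruning, NOT Hive's implementation.
partitions = {"00": [(21, "00")], "10": [(7, "10")]}

def prune(literal):
    # Compare each partition name against the literal rendered as a string.
    key = str(literal)
    return [row for name, rows in partitions.items() if name == key
            for row in rows]

# Unquoted 00 is parsed as the integer 0; str(0) == "0" matches nothing.
assert prune(0) == []
# Quoted '00' stays a string and matches the partition.
assert prune("00") == [(21, "00")]
```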



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-12 Thread Pala M Muthaia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pala M Muthaia updated HIVE-6028:
-

Description: 
When parsing/analyzing a query, Hive treats the partition predicate value as an int 
instead of a string. This breaks down and leads to incorrect results when the 
partition predicate value starts with the digit 0, e.g. hour=00, hour=05, etc.

The following repro illustrates the bug:
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM  
some_table limit 1;

-- this query returns incorrect results, i.e. just empty set.
select * from test_partition_pred where hour=00;
OK

-- this query returns correct result. Note predicate value is string literal
select * from test_partition_pred where hour='00';
OK
21  00


Note:
1. The type of the partition column is defined as string, not int.
2. This is a regression in Hive 0.12. This used to work in Hive 0.11
3. Not an issue when the partition value starts with integer other than 0, e.g 
hour=10, hour=11 etc.
4. As seen above, workaround is to use string literal hour='00' etc.

This should not be too bad if in the failing case hive complains that partition 
hour=0 is not found, or complains literal type doesn't match column type. 
Instead hive silently pushes it down as filter clause, and query succeeds with 
empty set as result.

We found this out in our production tables partitioned by hour, only a few days 
after it started occurring, when there were empty data sets for partitions 
hour=00 to hour=09.





  was:
When parsing/analyzing query, hive treats partition predicate value as int 
instead of string. This breaks down and leads to incorrect result when the 
partition predicate value starts with int 0, e.g: hour=00, hour=05 etc.

The following repro illustrates the bug:
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM  
some_table limit 1;

-- this query returns incorrect results, i.e. just empty set.
select * from test_partition_pred where hour=00;
OK

-- this query returns correct result. Note predicate value is string literal
select * from test_partition_pred where hour='00';
OK
21  00

-- explain plan illustrates how the query was interpreted. Particularly the 
partition predicate is pushed down as regular filter clause, with hour=0 as 
predicate. 
explain select * from test_partition_pred where hour=00;

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) 
(TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) 00))))

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: test_partition_pred
  Filter Operator
predicate:
expr: (hour = 0)
type: boolean
Select Operator
  expressions:
expr: col1
type: int
expr: hour
type: string
  outputColumnNames: _col0, _col1
  ListSink

-- comparing plan for query with correct result
explain select * from test_partition_pred where hour='00';

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME test_partition_pred))) 
(TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
TOK_ALLCOLREF)) (TOK_WHERE (= (TOK_TABLE_OR_COL hour) '00'))))

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: test_partition_pred
  Select Operator
expressions:
  expr: col1
  type: int
  expr: hour
  type: string
outputColumnNames: _col0, _col1
ListSink

Note:
1. The type of the partition column is defined as string, not int.
2. This is a regression in Hive 0.12. This used to work in Hive 0.11
3. Not an issue when the partition value starts with integer other than 0, e.g 
hour=10, hour=11 etc.
4. As seen above, workaround is to use string literal hour='00' etc.

This should not be too bad if in the failing case hive complains that partition 
hour=0 is not found, or complains literal type doesn't match column type. 
Instead hive silently pushes it down as filter clause, and query succeeds with 
empty set as result.

We found this out in our production tables partitioned by hour, only a few days 
after it started occurring, when there were empty data sets for partitions 
hour=00 

[jira] [Commented] (HIVE-5966) Fix eclipse:eclipse post shim aggregation changes

2013-12-12 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846919#comment-13846919
 ] 

Szehon Ho commented on HIVE-5966:
-

Hi [~brocknoland], wondering if this can be committed as it was reviewed on 
Review Board, or if it needs more thought? Not very urgent, but it would help 
productivity by getting rid of Eclipse errors.

 Fix eclipse:eclipse post shim aggregation changes
 -

 Key: HIVE-5966
 URL: https://issues.apache.org/jira/browse/HIVE-5966
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.0
Reporter: Brock Noland
Assignee: Szehon Ho
 Attachments: HIVE-5966.1.patch, HIVE-5966.patch


 The shim bundle module marks its deps as provided so users of the bundle won't 
 pull in the child dependencies. This causes the Eclipse workspace generated 
 by eclipse:eclipse to fail because it only includes the source from the 
 bundle source directory, which is empty.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-6028) Partition predicate literals are not interpreted correctly.

2013-12-12 Thread Pala M Muthaia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pala M Muthaia updated HIVE-6028:
-

Description: 
When parsing/analyzing a query, Hive treats the partition predicate value as an int 
instead of a string. This breaks down and leads to incorrect results when the 
partition predicate value starts with the digit 0, e.g. hour=00, hour=05, etc.

The following repro illustrates the bug:
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM  
some_table limit 1;

-- this query returns incorrect results, i.e. just empty set.
select * from test_partition_pred where hour=00;
OK

-- this query returns correct result. Note predicate value is string literal
select * from test_partition_pred where hour='00';
OK
21  00

explain plan illustrates how the query was interpreted. Particularly the 
partition predicate is pushed down as regular filter clause, with hour=0 as 
predicate. See attached explain plan file.

Note:
1. The type of the partition column is defined as string, not int.
2. This is a regression in Hive 0.12. This used to work in Hive 0.11
3. Not an issue when the partition value starts with integer other than 0, e.g 
hour=10, hour=11 etc.
4. As seen above, workaround is to use string literal hour='00' etc.

This should not be too bad if in the failing case hive complains that partition 
hour=0 is not found, or complains literal type doesn't match column type. 
Instead hive silently pushes it down as filter clause, and query succeeds with 
empty set as result.

We found this out in our production tables partitioned by hour, only a few days 
after it started occurring, when there were empty data sets for partitions 
hour=00 to hour=09.





  was:
When parsing/analyzing query, hive treats partition predicate value as int 
instead of string. This breaks down and leads to incorrect result when the 
partition predicate value starts with int 0, e.g: hour=00, hour=05 etc.

The following repro illustrates the bug:
-- create test table and partition, populate with some data
create table test_partition_pred(col1 int) partitioned by (hour STRING);
insert into table test_partition_pred partition (hour=00) select 21 FROM  
some_table limit 1;

-- this query returns incorrect results, i.e. just empty set.
select * from test_partition_pred where hour=00;
OK

-- this query returns correct result. Note predicate value is string literal
select * from test_partition_pred where hour='00';
OK
21  00


Note:
1. The type of the partition column is defined as string, not int.
2. This is a regression in Hive 0.12. This used to work in Hive 0.11
3. Not an issue when the partition value starts with integer other than 0, e.g 
hour=10, hour=11 etc.
4. As seen above, workaround is to use string literal hour='00' etc.

This should not be too bad if in the failing case hive complains that partition 
hour=0 is not found, or complains literal type doesn't match column type. 
Instead hive silently pushes it down as filter clause, and query succeeds with 
empty set as result.

We found this out in our production tables partitioned by hour, only a few days 
after it started occurring, when there were empty data sets for partitions 
hour=00 to hour=09.






 Partition predicate literals are not interpreted correctly.
 ---

 Key: HIVE-6028
 URL: https://issues.apache.org/jira/browse/HIVE-6028
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Pala M Muthaia
 Attachments: Hive-6028-explain-plan.txt


 When parsing/analyzing query, hive treats partition predicate value as int 
 instead of string. This breaks down and leads to incorrect result when the 
 partition predicate value starts with int 0, e.g: hour=00, hour=05 etc.
 The following repro illustrates the bug:
 -- create test table and partition, populate with some data
 create table test_partition_pred(col1 int) partitioned by (hour STRING);
 insert into table test_partition_pred partition (hour=00) select 21 FROM  
 some_table limit 1;
 -- this query returns incorrect results, i.e. just empty set.
 select * from test_partition_pred where hour=00;
 OK
 -- this query returns correct result. Note predicate value is string literal
 select * from test_partition_pred where hour='00';
 OK
 21  00
 explain plan illustrates how the query was interpreted. Particularly the 
 partition predicate is pushed down as regular filter clause, with hour=0 as 
 predicate. See attached explain plan file.
 Note:
 1. The type of the partition column is defined as string, not int.
 2. This is a regression in Hive 0.12. This used to work in Hive 0.11
 3. Not an issue when the partition value starts with integer other 

[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2

2013-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846936#comment-13846936
 ] 

Hive QA commented on HIVE-6004:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618195/HIVE-6004.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4779 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/624/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/624/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618195

 Fix statistics annotation related test failures in hadoop2
 --

 Key: HIVE-6004
 URL: https://issues.apache.org/jira/browse/HIVE-6004
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.13.0

 Attachments: HIVE-6004.1.patch


 Fix test failures that are related to HIVE-5369 and its subtask changes.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: Review Request 16207: HIVE-1466: Add NULL DEFINED AS to ROW FORMAT specification

2013-12-12 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16207/#review30300
---



ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
https://reviews.apache.org/r/16207/#comment58009

It would be nice to remove the leading tabs.


- Xuefu Zhang


On Dec. 12, 2013, 8:20 p.m., Prasad Mujumdar wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/16207/
 ---
 
 (Updated Dec. 12, 2013, 8:20 p.m.)
 
 
 Review request for hive and Xuefu Zhang.
 
 
 Bugs: HIVE-1466
 https://issues.apache.org/jira/browse/HIVE-1466
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Support configurable null format for tables and writing out to directory.
 Using a non-default null format is a bit cumbersome while creating a table, and 
 pretty much impossible when exporting the data to the local filesystem using 
 insert overwrite directory.
 The patch enhances the SQL syntax to support 'NULL DEFINED AS' construct for 
 create table as well as insert overwrite directory.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java d32be59 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 4b7fc73 
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 366b714 
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 5e5b8cf 
   ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 8cf5ad6 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d0a0ec7 
   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 93b4181 
   ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java b6097b1 
   ql/src/test/queries/clientpositive/nullformat.q PRE-CREATION 
   ql/src/test/queries/clientpositive/nullformatdir.q PRE-CREATION 
   ql/src/test/results/clientpositive/nullformat.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/nullformatdir.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/16207/diff/
 
 
 Testing
 ---
 
 Added new tests.
 
 
 Thanks,
 
 Prasad Mujumdar
 




[jira] [Commented] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification

2013-12-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846939#comment-13846939
 ] 

Xuefu Zhang commented on HIVE-1466:
---

Patch looks good. Minor comment on RB.

 Add NULL DEFINED AS to ROW FORMAT specification
 ---

 Key: HIVE-1466
 URL: https://issues.apache.org/jira/browse/HIVE-1466
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer
Assignee: Prasad Mujumdar
 Attachments: HIVE-1466.1.patch


 NULL values are passed to transformers as a literal backslash and a literal 
 N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. 
 This is inconsistent.
 The ROW FORMAT specification of tables should be able to specify the manner 
 in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or 
 '\003' or whatever should apply to all instances of table export and saving.
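The idea behind a configurable null representation can be sketched as follows (a hypothetical helper in Python, not Hive's actual SerDe; `format_row` and its parameter names are made up for illustration):

```python
def format_row(row, field_delim="\t", null_format="\\N"):
    """Render one row as delimited text, writing None as null_format."""
    return field_delim.join(null_format if v is None else str(v) for v in row)

# Default: nulls come out as the two characters backslash-N.
assert format_row([1, None, "x"]) == "1\t\\N\tx"
# NULL DEFINED AS '\003' would swap in a control character instead.
assert format_row([1, None, "x"], null_format="\x03") == "1\t\x03\tx"
```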



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution

2013-12-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6010:
---

Attachment: HIVE-6010.patch

Here's the patch, with one example test. More tests can be added in other JIRAs 
(incl. for the metastore stuff I mentioned, maybe).
Depending on whether this or the logarithm fix goes first, I will uncomment the 
logarithms in this test here or there, or in another JIRA.

Already found one bug using this :) HIVE-6027.
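The comparison idea itself is simple and can be sketched generically (plain Python; `run_query` and the config names are made-up stand-ins for running the same .q file under different execution settings):

```python
def compare_runs(run_query, configs):
    """Run the same query under each config and check all results agree.

    Returns (ok, results); a mismatch points at a likely execution bug.
    """
    results = [run_query(cfg) for cfg in configs]
    return all(r == results[0] for r in results[1:]), results

# Stand-in for executing one query with vectorization on and off.
fake = {"vectorized": [1.0, 2.0], "row-mode": [1.0, 2.0]}
ok, _ = compare_runs(lambda cfg: fake[cfg], ["vectorized", "row-mode"])
assert ok
```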

 create a test that would ensure vectorization produces same results as 
 non-vectorized execution
 ---

 Key: HIVE-6010
 URL: https://issues.apache.org/jira/browse/HIVE-6010
 Project: Hive
  Issue Type: Test
  Components: Tests, Vectorization
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6010.patch


 So as to ensure that vectorization is not forgotten when changes are made to 
 things. Obviously it would not be viable to have a bulletproof test, but at 
 least a subset of operations can be verified.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution

2013-12-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6010:
---

Status: Patch Available  (was: Open)

 create a test that would ensure vectorization produces same results as 
 non-vectorized execution
 ---

 Key: HIVE-6010
 URL: https://issues.apache.org/jira/browse/HIVE-6010
 Project: Hive
  Issue Type: Test
  Components: Tests, Vectorization
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6010.patch


 So as to ensure that vectorization is not forgotten when changes are made to 
 things. Obviously it would not be viable to have a bulletproof test, but at 
 least a subset of operations can be verified.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2

2013-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846949#comment-13846949
 ] 

Ashutosh Chauhan commented on HIVE-6004:


As suspected :) Let's take out stats_partialscan_autogather from this patch, 
get this one committed, and analyze auto_stats_partialscan in a different JIRA.

 Fix statistics annotation related test failures in hadoop2
 --

 Key: HIVE-6004
 URL: https://issues.apache.org/jira/browse/HIVE-6004
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.13.0

 Attachments: HIVE-6004.1.patch


 Fix test failures that are related to HIVE-5369 and its subtask changes.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Review Request 16229: HIVE-6010 create a test that would ensure vectorization produces same results as non-vectorized execution

2013-12-12 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16229/
---

Review request for hive and Jitendra Pandey.


Bugs: HIVE-6010
https://issues.apache.org/jira/browse/HIVE-6010


Repository: hive-git


Description
---

See jira.


Diffs
-

  ant/src/org/apache/hadoop/hive/ant/QTestGenTask.java 85351aa 
  itests/qtest/pom.xml 8c249a0 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java c16e82d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLog10.java 4b6dc6a 
  ql/src/test/queries/clientcompare/vectorized_math_funcs.q PRE-CREATION 
  ql/src/test/queries/clientcompare/vectorized_math_funcs_00.qv PRE-CREATION 
  ql/src/test/queries/clientcompare/vectorized_math_funcs_01.qv PRE-CREATION 
  ql/src/test/templates/TestCompareCliDriver.vm PRE-CREATION 

Diff: https://reviews.apache.org/r/16229/diff/


Testing
---


Thanks,

Sergey Shelukhin



Re: Review Request 16229: HIVE-6010 create a test that would ensure vectorization produces same results as non-vectorized execution

2013-12-12 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16229/
---

(Updated Dec. 13, 2013, 12:09 a.m.)


Review request for hive and Jitendra Pandey.


Bugs: HIVE-6010
https://issues.apache.org/jira/browse/HIVE-6010


Repository: hive-git


Description
---

See jira.


Diffs
-

  ant/src/org/apache/hadoop/hive/ant/QTestGenTask.java 85351aa 
  itests/qtest/pom.xml 8c249a0 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java c16e82d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLog10.java 4b6dc6a 
  ql/src/test/queries/clientcompare/vectorized_math_funcs.q PRE-CREATION 
  ql/src/test/queries/clientcompare/vectorized_math_funcs_00.qv PRE-CREATION 
  ql/src/test/queries/clientcompare/vectorized_math_funcs_01.qv PRE-CREATION 
  ql/src/test/templates/TestCompareCliDriver.vm PRE-CREATION 

Diff: https://reviews.apache.org/r/16229/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution

2013-12-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846954#comment-13846954
 ] 

Sergey Shelukhin commented on HIVE-6010:


https://reviews.apache.org/r/16229/

 create a test that would ensure vectorization produces same results as 
 non-vectorized execution
 ---

 Key: HIVE-6010
 URL: https://issues.apache.org/jira/browse/HIVE-6010
 Project: Hive
  Issue Type: Test
  Components: Tests, Vectorization
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6010.patch


 So as to ensure that vectorization is not forgotten when changes are made to 
 things. Obviously it would not be viable to have a bulletproof test, but at 
 least a subset of operations can be verified.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-2093:


   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks for the contribution, [~navis]!
Can you please update the release note section so that we can add that to the wiki 
docs? (If you prefer, you can also update the wiki docs directly.)


 create/drop database should populate inputs/outputs and check concurrency and 
 user permission
 -

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: Bug
  Components: Authorization, Locking, Metastore, Security
Reporter: Namit Jain
Assignee: Navis
 Fix For: 0.13.0

 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
 HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
 HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
 HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch


 concurrency and authorization are needed for create/drop table. Also to make 
 concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
 DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-2093:


Issue Type: New Feature  (was: Bug)

 create/drop database should populate inputs/outputs and check concurrency and 
 user permission
 -

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, Locking, Metastore, Security
Reporter: Namit Jain
Assignee: Navis
 Fix For: 0.13.0

 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
 HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
 HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
 HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch


 concurrency and authorization are needed for create/drop table. Also to make 
 concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
 DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5555) Support alternate join syntax: joining conditions in where clause; also pushdown qualifying join conditions

2013-12-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5555:


Issue Type: New Feature  (was: Bug)

 Support alternate join syntax: joining conditions in where clause; also 
 pushdown qualifying join conditions 
 

 Key: HIVE-5555
 URL: https://issues.apache.org/jira/browse/HIVE-5555
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: AlternativeJoinSyntax.pdf


 Certain tools still generate `old style' join queries where the join
 condition is in the WHERE clause. A related set of issues that can
 be addressed is that of pushing forward joining conditions,
 in a manner similar to the Predicate Pushdown feature of Hive.
 For e.g. these queries can have join conditions pushed down:
 {noformat}
 - query 1, push join predicate from 2nd join to 1st
 explain select *
 from part p1 join part p2 join part p3 on p1.p_name = p2.p_name and p2.p_name 
 = p3.p_name;
 - query 2
 explain select *
 from part p1 join part p2 join part p3 
 where p1.p_name = p2.p_name and p2.p_name = p3.p_name;
 {noformat}
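 For inner joins the two spellings are equivalent, which is what makes the pushdown safe; a toy illustration (plain Python nested-loop joins over single-column tuples, not Hive's planner):

```python
p1 = [("a",), ("b",)]
p2 = [("a",), ("c",)]
p3 = [("a",), ("b",)]

# Join condition in the ON clause: filter while joining.
on_style = [(x, y, z) for x in p1 for y in p2 if x[0] == y[0]
            for z in p3 if y[0] == z[0]]

# `Old style': cross product, condition in the WHERE clause.
where_style = [(x, y, z) for x in p1 for y in p2 for z in p3
               if x[0] == y[0] and y[0] == z[0]]

# Same rows either way, so the planner may push the WHERE
# predicates into the joins.
assert on_style == where_style == [(("a",), ("a",), ("a",))]
```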



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-6027) non-vectorized log10 has rounding issue

2013-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846996#comment-13846996
 ] 

Hive QA commented on HIVE-6027:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618494/HIVE-6027.patch

{color:green}SUCCESS:{color} +1 4779 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/625/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/625/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618494

 non-vectorized log10 has rounding issue
 ---

 Key: HIVE-6027
 URL: https://issues.apache.org/jira/browse/HIVE-6027
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HIVE-6027.patch


 In HIVE-6010, I found that vectorized and non-vectorized log10 may produce 
 different results in the last digit of the mantissa (e.g. 7 vs 8). It turns 
 out that the vectorized one uses Math.log10, but the non-vectorized one uses 
 log/log(10). Both should use Math.log10.
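 The discrepancy is easy to reproduce outside Hive (shown here in Python, whose math.log/math.log10 mirror Java's Math.log/Math.log10 under IEEE-754 doubles; the value 1000 is just one input where the two paths typically disagree):

```python
import math

x = 1000.0
via_ratio = math.log(x) / math.log(10.0)  # non-vectorized path: log(x)/log(10)
via_log10 = math.log10(x)                 # vectorized path: Math.log10(x)
# On typical platforms via_log10 is exactly 3.0 while via_ratio is
# 2.9999999999999996 -- off in the last bit of the mantissa.

# The two paths disagree on at least some inputs in any IEEE-754 libm,
# because the ratio form adds an extra rounding step.
diffs = sum(1 for i in range(1, 10000)
            if math.log(float(i)) / math.log(10.0) != math.log10(float(i)))
assert diffs > 0
```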



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: Review Request 16167: HIVE-5595 Implement Vectorized SMB Join

2013-12-12 Thread Eric Hanson

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16167/#review30289
---



ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57976

Please remove trailing white space in all your code. You can set the 
eclipse editor to do this.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57980

Can you add a comment about the purpose of this class and the major 
differences from regular SMB Join?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57985

Excellent variable names



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57984

Please add a few comments in the body explaining the major sections 



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57983

Does the fact that this is a map from Byte mean there is a limit of 127 
ANDed filter expressions? I guess that is enough for most purposes, but it seems 
like a low internal limit. Not sure if this is a limitation inherited from 
someplace else.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57987

need blank after =



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57989

Sun Java coding standards say to put blanks around operators such as =, <, >



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57988

and replacing them



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57993

please put blanks around :



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment57998

good comment!

spell out atm



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment58020

I don't understand this because the body of the loop does not change for 
each trip through the loop. It looks like you are doing the same thing 
inBatch.size times. Is this right? If so, please explain.

Should tag be batchIndex?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment58021

Please add comment before method



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
https://reviews.apache.org/r/16167/#comment58022

add blanks around operators



ql/src/test/org/apache/hadoop/hive/ql/optimizer/physical/TestVectorizer.java
https://reviews.apache.org/r/16167/#comment58023

Please comment the tests to explain what you are checking


- Eric Hanson


On Dec. 11, 2013, 7:26 a.m., Remus Rusanu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/16167/
 ---
 
 (Updated Dec. 11, 2013, 7:26 a.m.)
 
 
 Review request for hive, Ashutosh Chauhan, Eric Hanson, and Jitendra Pandey.
 
 
 Bugs: HIVE-5595
 https://issues.apache.org/jira/browse/HIVE-5595
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 See HIVE-5595 I will post description
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 24a812d 
   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 81a1232 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 19f7d79 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/CommonRCFileInputFormat.java 
 4bfeb20 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java abdc165 
   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
 7859e56 
   
 ql/src/test/org/apache/hadoop/hive/ql/optimizer/physical/TestVectorizer.java 
 02031ea 
   ql/src/test/queries/clientpositive/vectorized_bucketmapjoin1.q PRE-CREATION 
   ql/src/test/results/clientpositive/vectorized_bucketmapjoin1.q.out 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/16167/diff/
 
 
 Testing
 ---
 
 New .q file, manually tested several cases
 
 
 Thanks,
 
 Remus Rusanu
 




[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution

2013-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847002#comment-13847002
 ] 

Hive QA commented on HIVE-6010:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618503/HIVE-6010.patch

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/626/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/626/console

Messages:
{noformat}
 This message was trimmed, see log for full details 
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2399,49]
 cannot find symbol
symbol  : class UnlockDatabaseDesc
location: class org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2431,35]
 reference to DDLWork is ambiguous, both method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork and method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2443,35]
 reference to DDLWork is ambiguous, both method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork and method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2493,35]
 reference to DDLWork is ambiguous, both method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork and method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2513,35]
 reference to DDLWork is ambiguous, both method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork and method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2529,35]
 reference to DDLWork is ambiguous, both method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork and method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2541,35]
 reference to DDLWork is ambiguous, both method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork and method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork match
[ERROR] 
/data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:[2589,35]
 reference to DDLWork is ambiguous, both method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,LockDatabaseDesc)
 in org.apache.hadoop.hive.ql.plan.DDLWork and method 
DDLWork(java.util.HashSet<org.apache.hadoop.hive.ql.hooks.ReadEntity>,java.util.HashSet<org.apache.hadoop.hive.ql.hooks.WriteEntity>,UnlockDatabaseDesc)
 in 

[jira] [Commented] (HIVE-5595) Implement vectorized SMB JOIN

2013-12-12 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847003#comment-13847003
 ] 

Eric Hanson commented on HIVE-5595:
---

Hi Remus,

Overall this looks good! Please see my comments on ReviewBoard.

Eric

 Implement vectorized SMB JOIN
 -

 Key: HIVE-5595
 URL: https://issues.apache.org/jira/browse/HIVE-5595
 Project: Hive
  Issue Type: Sub-task
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Priority: Critical
 Attachments: HIVE-5595.1.patch, HIVE-5595.2.patch

   Original Estimate: 168h
  Remaining Estimate: 168h





--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-6027) non-vectorized log10 has rounding issue

2013-12-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847071#comment-13847071
 ] 

Ashutosh Chauhan commented on HIVE-6027:


+1

 non-vectorized log10 has rounding issue
 ---

 Key: HIVE-6027
 URL: https://issues.apache.org/jira/browse/HIVE-6027
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HIVE-6027.patch


 In HIVE-6010, I found that vectorized and non-vectorized log10 may produce 
 different results in the last digit of the mantissa (e.g. 7 vs 8). It turns 
 out that vectorized one uses Math.log10, but non-vectorized uses log/log(10). 
 Both should use Math.log10.
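 The rounding difference described above can be reproduced in plain Java, outside
 Hive. This standalone sketch (illustrative only, not Hive's UDF code) compares
 the two formulations; `Math.log10` is specified to return exactly n for
 arguments of the form 10^n, while the log/log(10) quotient takes two rounding
 steps and may land one ulp away:

```java
// Compares the two ways of computing base-10 logarithms that HIVE-6027
// discusses: Math.log10(x) versus Math.log(x) / Math.log(10).
public class Log10Rounding {
    public static void main(String[] args) {
        double x = 1000.0;
        double viaQuotient = Math.log(x) / Math.log(10); // two rounding steps
        double viaLog10 = Math.log10(x);                 // single primitive
        // The Math.log10 spec guarantees exactly 3.0 here; the quotient
        // may differ in the last bit of the mantissa.
        System.out.println("log(x)/log(10) = " + viaQuotient);
        System.out.println("Math.log10(x)  = " + viaLog10);
    }
}
```

 This is why the patch switches the non-vectorized path to `Math.log10`, making
 both code paths use the same single-rounding primitive.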



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5924) Save operation logs in per operation directories in HiveServer2

2013-12-12 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847078#comment-13847078
 ] 

Vaibhav Gumashta commented on HIVE-5924:


Thanks [~jaideepdhok]. Couple of questions:
1. Would enabling the per session/operation log config mean that there will be 
no consolidated log?
2. Regarding 6.), there is an open JIRA - 
[HIVE-5268|https://issues.apache.org/jira/browse/HIVE-5268] which has some 
overlap. There is also a different approach taken here 
[HIVE-5799|https://issues.apache.org/jira/browse/HIVE-5799], which is being 
discussed. I'd be curious to hear what your method of detecting abandoned 
sessions is.

Look forward to the patch. Thanks!

 Save operation logs in per operation directories in HiveServer2
 ---

 Key: HIVE-5924
 URL: https://issues.apache.org/jira/browse/HIVE-5924
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Jaideep Dhok
Assignee: Jaideep Dhok





--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5924) Save operation logs in per operation directories in HiveServer2

2013-12-12 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-5924:
---

Affects Version/s: 0.13.0

 Save operation logs in per operation directories in HiveServer2
 ---

 Key: HIVE-5924
 URL: https://issues.apache.org/jira/browse/HIVE-5924
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Jaideep Dhok
Assignee: Jaideep Dhok





--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Reopened] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reopened HIVE-2093:
--


 create/drop database should populate inputs/outputs and check concurrency and 
 user permission
 -

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, Locking, Metastore, Security
Reporter: Namit Jain
Assignee: Navis
 Fix For: 0.13.0

 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
 HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
 HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
 HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch


 concurrency and authorization are needed for create/drop table. Also to make 
 concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
 DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847104#comment-13847104
 ] 

Gunther Hagleitner commented on HIVE-2093:
--

[~thejas] This is breaking the build. I think you might have forgotten to add 
some files (UnlockDatabaseDesc/LockDatabaseDesc)?

 create/drop database should populate inputs/outputs and check concurrency and 
 user permission
 -

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, Locking, Metastore, Security
Reporter: Namit Jain
Assignee: Navis
 Fix For: 0.13.0

 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
 HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
 HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
 HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch


 concurrency and authorization are needed for create/drop table. Also to make 
 concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
 DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847110#comment-13847110
 ] 

Gunther Hagleitner commented on HIVE-2093:
--

I think it's just the two files. I will commit those (from patch .9)

 create/drop database should populate inputs/outputs and check concurrency and 
 user permission
 -

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, Locking, Metastore, Security
Reporter: Namit Jain
Assignee: Navis
 Fix For: 0.13.0

 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
 HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
 HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
 HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch


 concurrency and authorization are needed for create/drop table. Also to make 
 concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
 DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847113#comment-13847113
 ] 

Gunther Hagleitner commented on HIVE-2093:
--

Committed UnlockDatabaseDesc and LockDatabaseDesc. Build is working again for 
me.

 create/drop database should populate inputs/outputs and check concurrency and 
 user permission
 -

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, Locking, Metastore, Security
Reporter: Namit Jain
Assignee: Navis
 Fix For: 0.13.0

 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
 HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
 HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
 HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch


 concurrency and authorization are needed for create/drop table. Also to make 
 concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
 DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-2093.
--

Resolution: Fixed

 create/drop database should populate inputs/outputs and check concurrency and 
 user permission
 -

 Key: HIVE-2093
 URL: https://issues.apache.org/jira/browse/HIVE-2093
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, Locking, Metastore, Security
Reporter: Namit Jain
Assignee: Navis
 Fix For: 0.13.0

 Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, 
 HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, 
 HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, 
 HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch


 concurrency and authorization are needed for create/drop table. Also to make 
 concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
 DATABASE



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-6000) Hive build broken on hadoop2

2013-12-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6000:
-

Fix Version/s: 0.13.0

 Hive build broken on hadoop2
 

 Key: HIVE-6000
 URL: https://issues.apache.org/jira/browse/HIVE-6000
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Vikram Dixit K
Priority: Blocker
 Fix For: 0.13.0

 Attachments: HIVE-6000.1.patch


 When I build on hadoop2 since yesterday, I get
 {noformat}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
 (default-testCompile) on project hive-it-unit: Compilation failure: 
 Compilation failure:
 [ERROR] 
 /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41]
  package org.apache.hadoop.hbase.zookeeper does not exist
 [ERROR] 
 /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11]
  cannot find symbol
 [ERROR] symbol  : class MiniZooKeeperCluster
 [ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
 [ERROR] 
 /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
  cannot find symbol
 [ERROR] symbol  : class MiniZooKeeperCluster
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-6000) Hive build broken on hadoop2

2013-12-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6000:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Vikram!

 Hive build broken on hadoop2
 

 Key: HIVE-6000
 URL: https://issues.apache.org/jira/browse/HIVE-6000
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Vikram Dixit K
Priority: Blocker
 Attachments: HIVE-6000.1.patch


 When I build on hadoop2 since yesterday, I get
 {noformat}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
 (default-testCompile) on project hive-it-unit: Compilation failure: 
 Compilation failure:
 [ERROR] 
 /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[28,41]
  package org.apache.hadoop.hbase.zookeeper does not exist
 [ERROR] 
 /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[40,11]
  cannot find symbol
 [ERROR] symbol  : class MiniZooKeeperCluster
 [ERROR] location: class org.apache.hadoop.hive.thrift.TestZooKeeperTokenStore
 [ERROR] 
 /Users/sergey/git/hive/itests/hive-unit/src/test/java/org/apache/hadoop/hive/thrift/TestZooKeeperTokenStore.java:[53,26]
  cannot find symbol
 [ERROR] symbol  : class MiniZooKeeperCluster
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Re: Review Request 16207: HIVE-1466: Add NULL DEFINED AS to ROW FORMAT specification

2013-12-12 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16207/
---

(Updated Dec. 13, 2013, 3:35 a.m.)


Review request for hive and Xuefu Zhang.


Changes
---

Fixed formatting (tabs)
Fixed 'show create table' to support null format
Added testcase for CTAS


Bugs: HIVE-1466
https://issues.apache.org/jira/browse/HIVE-1466


Repository: hive-git


Description
---

Support configurable null format for tables and writing out to directory.
Using a non-default null format is a bit cumbersome when creating a table, and
pretty much impossible when exporting the data to the local filesystem using
insert overwrite directory.
The patch enhances the SQL syntax to support 'NULL DEFINED AS' construct for 
create table as well as insert overwrite directory.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 41df473 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java fdc0d1a 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 366b714 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g b146df6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 8cf5ad6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ace1df9 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 93b4181 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java b6097b1 
  ql/src/test/queries/clientpositive/nullformat.q PRE-CREATION 
  ql/src/test/queries/clientpositive/nullformatCTAS.q PRE-CREATION 
  ql/src/test/queries/clientpositive/nullformatdir.q PRE-CREATION 
  ql/src/test/results/clientpositive/nullformat.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/nullformatCTAS.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/nullformatdir.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/16207/diff/


Testing
---

Added new tests.


Thanks,

Prasad Mujumdar



[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification

2013-12-12 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-1466:
--

Attachment: HIVE-1466.2.patch

Addressed review comments, added more test cases

 Add NULL DEFINED AS to ROW FORMAT specification
 ---

 Key: HIVE-1466
 URL: https://issues.apache.org/jira/browse/HIVE-1466
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer
Assignee: Prasad Mujumdar
 Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch


 NULL values are passed to transformers as a literal backslash and a literal 
 N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. 
 This is inconsistent.
 The ROW FORMAT specification of tables should be able to specify the manner 
 in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or 
 '\003' or whatever should apply to all instances of table export and saving.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-12 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5996:
--

Status: Open  (was: Patch Available)

 Query for sum of a long column of a table with only two rows produces wrong 
 result
 --

 Key: HIVE-5996
 URL: https://issues.apache.org/jira/browse/HIVE-5996
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-5996.patch


 {code}
 hive> desc test2;
 OK
 l bigint  None
 hive> select * from test2; 
 OK
 666
 555
 hive> select sum(l) from test2;
 OK
 -6224521851487329395
 {code}
 It's believed that a wrap-around error occurred. It's surprising that it 
 happens only with two rows. Same query in MySQL returns:
 {code}
 mysql> select sum(l) from test;
 +--+
 | sum(l)   |
 +--+
 | 1221 |
 +--+
 1 row in set (0.00 sec)
 {code}
 Hive should accommodate a large number of rows. Overflowing with only two 
 rows is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

2013-12-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847150#comment-13847150
 ] 

Xuefu Zhang commented on HIVE-5996:
---

{quote}
It would be great if you can document the theory, otherwise I still would need 
to look at code to understand the theory
{quote}

I will put it somewhere on the wiki.

{quote}
You can have an overflow even with decimal type, if they are large enough, with 
just two rows.
{quote}

It's impossible to overflow output decimal type with just two rows because the 
precision of the output decimal type is 10 + the precision of the input type. 
In case of long input, the output decimal type is (29,0).
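To illustrate the wrap-around and why a wider decimal accumulator avoids it,
here is a standalone Java sketch. It is illustrative only, not Hive's actual
sum-UDAF implementation: two longs near Long.MAX_VALUE wrap when added as
primitives, while a BigDecimal accumulator (which, like the decimal(29,0)
output type above, has more than enough digits for two long inputs) stays exact:

```java
import java.math.BigDecimal;

// Demonstrates 64-bit wrap-around on sum versus an exact decimal accumulator.
public class SumOverflowSketch {
    public static void main(String[] args) {
        long a = Long.MAX_VALUE, b = Long.MAX_VALUE;
        long wrapped = a + b; // wraps around: prints -2
        BigDecimal exact = BigDecimal.valueOf(a).add(BigDecimal.valueOf(b));
        System.out.println("long sum:       " + wrapped);
        System.out.println("BigDecimal sum: " + exact); // 18446744073709551614
    }
}
```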


 Query for sum of a long column of a table with only two rows produces wrong 
 result
 --

 Key: HIVE-5996
 URL: https://issues.apache.org/jira/browse/HIVE-5996
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-5996.patch


 {code}
 hive> desc test2;
 OK
 l bigint  None
 hive> select * from test2; 
 OK
 666
 555
 hive> select sum(l) from test2;
 OK
 -6224521851487329395
 {code}
 It's believed that a wrap-around error occurred. It's surprising that it 
 happens only with two rows. Same query in MySQL returns:
 {code}
 mysql> select sum(l) from test;
 +--+
 | sum(l)   |
 +--+
 | 1221 |
 +--+
 1 row in set (0.00 sec)
 {code}
 Hive should accommodate a large number of rows. Overflowing with only two 
 rows is unacceptable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5924) Save operation logs in per operation directories in HiveServer2

2013-12-12 Thread Jaideep Dhok (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847153#comment-13847153
 ] 

Jaideep Dhok commented on HIVE-5924:


bq.  we can close the session
We will not actually close the session, just delete the log files.

 Save operation logs in per operation directories in HiveServer2
 ---

 Key: HIVE-5924
 URL: https://issues.apache.org/jira/browse/HIVE-5924
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Jaideep Dhok
Assignee: Jaideep Dhok





--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5924) Save operation logs in per operation directories in HiveServer2

2013-12-12 Thread Jaideep Dhok (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847151#comment-13847151
 ] 

Jaideep Dhok commented on HIVE-5924:


[~vgumashta] Thanks for looking at the issue. 

bq. 1. Would enabling the per session/operation log config mean that there will 
be no consolidated log?
HiveServer2 logs like session open, session close etc will continue to be 
consolidated. Only the query logs like job client logs, driver or task logs 
will be redirected. Turning off the log redirection would again consolidate 
everything into a single log file as is done currently.

bq. I'd be curious to hear what your method of detecting abandoned sessions is.
For detecting abandoned sessions w.r.t. log purging, I can check the last 
modified time of an operation log file. If that is older than a configured 
value, we can close the session.
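That last-modified-time check could be sketched as follows. The class and
method names here (LogPurgeSketch, isAbandoned) are illustrative, not actual
HiveServer2 API; the sketch only shows the idea of comparing a log file's
mtime against a configured idle threshold:

```java
import java.io.File;
import java.util.concurrent.TimeUnit;

// Sketch of the purge heuristic described above: an operation log that has
// not been written to for longer than the threshold is treated as abandoned.
public class LogPurgeSketch {
    static boolean isAbandoned(File operationLog, long thresholdMillis) {
        long idleMillis = System.currentTimeMillis() - operationLog.lastModified();
        return idleMillis > thresholdMillis;
    }

    public static void main(String[] args) throws Exception {
        File log = File.createTempFile("operation", ".log");
        log.deleteOnExit();
        // A freshly written log is not abandoned under a 1-hour threshold.
        System.out.println(isAbandoned(log, TimeUnit.HOURS.toMillis(1))); // false
    }
}
```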

 Save operation logs in per operation directories in HiveServer2
 ---

 Key: HIVE-5924
 URL: https://issues.apache.org/jira/browse/HIVE-5924
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Jaideep Dhok
Assignee: Jaideep Dhok





--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Hive-trunk-hadoop2 - Build # 594 - Still Failing

2013-12-12 Thread Apache Jenkins Server
Changes for Build #560
[xuefu] HIVE-5356: Move arithmetic UDFs to generic UDF implementations 
(reviewed by Brock)

[hashutosh] HIVE-5846 : Analyze command fails with vectorization on (Remus 
Rusanu via Ashutosh Chauhan)

[hashutosh] HIVE-2055 : Hive should add HBase classpath dependencies when 
available (Nick Dimiduk via Ashutosh Chauhan)

[hashutosh] HIVE-4632 : Use hadoop counter as a stat publisher (Navis via 
Ashutosh Chauhan)


Changes for Build #561
[hashutosh] HIVE-5845 : CTAS failed on vectorized code path (Remus Rusanu via 
Ashutosh Chauhan)

[thejas] HIVE-5635 : WebHCatJTShim23 ignores security/user context (Eugene 
Koifman via Thejas Nair)

[hashutosh] HIVE-5663 : Refactor ORC RecordReader to operate on direct & 
wrapped ByteBuffers (Gopal V via Owen Omalley)

[xuefu] HIVE-5565: Limit Hive decimal type maximum precision and scale to 38 
(reviewed by Brock)

[brock] HIVE-5842 - Fix issues with new paths to jar in hcatalog (Brock Noland 
reviewed by Prasad Mujumdar)


Changes for Build #562
[hashutosh] HIVE-5692 : Make VectorGroupByOperator parameters configurable 
(Remus Rusanu via Ashutosh Chauhan)


Changes for Build #563
[thejas] HIVE-5618 : Hive local task fails to run when run from oozie in a 
secure cluster (Prasad Mujumdar via Thejas Nair)


Changes for Build #564

Changes for Build #565
[thejas] HIVE-3815 : hive table rename fails if filesystem cache is disabled 
(Thejas Nair reviewed by Navis)


Changes for Build #566

Changes for Build #567
[hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having 
clause (Harish Butani via Ashutosh Chauhan)


Changes for Build #568
[xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced 
parentheses (reviewed by Ashutosh)


Changes for Build #569

Changes for Build #570
[rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the 
absence of any column statistics (Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE-4002 (Navis 
via Ashutosh Chauhan)


Changes for Build #571
[navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.)

[navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu 
Zhang via Navis)

[navis] HIVE-4518 : Missing file (HiveFatalException)

[navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and 
Jason Dere via Navis)


Changes for Build #572
[brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad 
Mujumdar, Navis via Brock Noland)


Changes for Build #573
[navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K 
and Szehon Ho via Navis)

[thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed 
by Ashutosh Chauhan)

[brock] HIVE-5704 - A couple of generic UDFs are not in the right 
folder/package (Xuefu Zhang via Brock Noland)

[brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu 
Zhang via Brock Noland)

[hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is 
broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan)

[hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables 
(Prasanth J via Ashutosh Chauhan)

[hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback 
(Ashutosh Chauhan via Thejas Nair)

[brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland)


Changes for Build #574
[brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit 
K via Brock Noland)


Changes for Build #575
[xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to 
nonexistent column (Carl via Xuefu)

[xuefu] HIVE-5684: Serde support for char (Jason via Xuefu)


Changes for Build #576

Changes for Build #577

Changes for Build #578

Changes for Build #579
[brock] HIVE-5441 - Async query execution doesn't return resultset status 
(Prasad Mujumdar via Thejas M Nair)

[brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock 
Noland reviewed by Prasad Mujumdar)


Changes for Build #580
[ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string 
arguments (Teddy Choi via Eric Hanson)


Changes for Build #581
[rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth 
Jayachandran via Harish Butani)


Changes for Build #582
[brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks 
packaging (Xuefu Zhang via Brock Noland)


Changes for Build #583
[xuefu] HIVE-5866: Hive divide operator generates wrong results in certain 
cases (reviewed by Prasad)

[ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued 
expression (Eric Hanson)


Changes for Build #584
[thejas] HIVE-5550 : Import fails for tables created with default text, 
sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas 
Nair)

[ehans] HIVE-5895: vectorization handles division by zero differently from 
normal execution (Sergey Shelukhin via Eric Hanson)

Hive-trunk-h0.21 - Build # 2496 - Still Failing

2013-12-12 Thread Apache Jenkins Server
Changes for Build #2461
[xuefu] HIVE-5565: Limit Hive decimal type maximum precision and scale to 38 
(reviewed by Brock)

[brock] HIVE-5842 - Fix issues with new paths to jar in hcatalog (Brock Noland 
reviewed by Prasad Mujumdar)

[xuefu] HIVE-5356: Move arithmetic UDFs to generic UDF implementations 
(reviewed by Brock)


Changes for Build #2462
[hashutosh] HIVE-5692 : Make VectorGroupByOperator parameters configurable 
(Remus Rusanu via Ashutosh Chauhan)

[hashutosh] HIVE-5845 : CTAS failed on vectorized code path (Remus Rusanu via 
Ashutosh Chauhan)

[thejas] HIVE-5635 : WebHCatJTShim23 ignores security/user context (Eugene 
Koifman via Thejas Nair)

[hashutosh] HIVE-5663 : Refactor ORC RecordReader to operate on direct & 
wrapped ByteBuffers (Gopal V via Owen Omalley)


Changes for Build #2463

Changes for Build #2464
[thejas] HIVE-5618 : Hive local task fails to run when run from oozie in a 
secure cluster (Prasad Mujumdar via Thejas Nair)


Changes for Build #2465

Changes for Build #2466
[thejas] HIVE-3815 : hive table rename fails if filesystem cache is disabled 
(Thejas Nair reviewed by Navis)


Changes for Build #2467

Changes for Build #2468
[hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having 
clause (Harish Butani via Ashutosh Chauhan)


Changes for Build #2469
[xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced 
parenthesises (reviewed by Ashutosh)


Changes for Build #2470

Changes for Build #2471
[rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the 
absence of any column statistics (Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE-4002 (Navis 
via Ashutosh Chauhan)


Changes for Build #2472
[navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.)

[navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu 
Zhang via Navis)

[navis] HIVE-4518 : Missing file (HiveFatalException)

[navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and 
Jason Dere via Navis)


Changes for Build #2473
[brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad 
Mujumdar, Navis via Brock Noland)


Changes for Build #2474
[navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K 
and Szehon Ho via Navis)

[thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed 
by Ashutosh Chauhan)

[brock] HIVE-5704 - A couple of generic UDFs are not in the right 
folder/package (Xuefu Zhang via Brock Noland)

[brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu 
Zhang via Brock Noland)

[hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is 
broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan)

[hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables 
(Prasanth J via Ashutosh Chauhan)

[hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback 
(Ashutosh Chauhan via Thejas Nair)

[brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland)


Changes for Build #2475
[brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit 
K via Brock Noland)


Changes for Build #2476
[xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to 
nonexistent column (Carl via Xuefu)

[xuefu] HIVE-5684: Serde support for char (Jason via Xuefu)


Changes for Build #2477

Changes for Build #2478

Changes for Build #2479

Changes for Build #2480
[brock] HIVE-5441 - Async query execution doesn't return resultset status 
(Prasad Mujumdar via Thejas M Nair)

[brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock 
Noland reviewed by Prasad Mujumdar)


Changes for Build #2481

Changes for Build #2482
[ehans] HIVE-5581: Implement vectorized year/month/day... etc. for string 
arguments (Teddy Choi via Eric Hanson)


Changes for Build #2483
[rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth 
Jayachandran via Harish Butani)


Changes for Build #2484
[brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks 
packaging (Xuefu Zhang via Brock Noland)


Changes for Build #2485
[xuefu] HIVE-5866: Hive divide operator generates wrong results in certain 
cases (reviewed by Prasad)

[ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued 
expression (Eric Hanson)


Changes for Build #2486
[ehans] HIVE-5895: vectorization handles division by zero differently from 
normal execution (Sergey Shelukhin via Eric Hanson)

[hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via 
Ashutosh Chauhan)

[xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via 
Xuefu)

[brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles 
(Szehon Ho via Brock Noland)

[brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock 
Noland reviewed by Navis)


[jira] [Commented] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification

2013-12-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847166#comment-13847166
 ] 

Hive QA commented on HIVE-1466:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12618535/HIVE-1466.2.patch

{color:green}SUCCESS:{color} +1 4788 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/627/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/627/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12618535

 Add NULL DEFINED AS to ROW FORMAT specification
 ---

 Key: HIVE-1466
 URL: https://issues.apache.org/jira/browse/HIVE-1466
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer
Assignee: Prasad Mujumdar
 Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch


 NULL values are passed to transformers as a literal backslash and a literal 
 N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. 
 This is inconsistent.
 The ROW FORMAT specification of tables should be able to specify the manner 
 in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or 
 '\003' or whatever should apply to all instances of table export and saving.
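 To make the proposal concrete, a minimal sketch of what such a table 
 definition might look like (illustrative only; the exact grammar was still 
 under review in the attached patches, and the table name and columns here 
 are hypothetical):

{noformat}
-- Hypothetical DDL showing the proposed NULL DEFINED AS clause
CREATE TABLE example_t (id INT, name STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  NULL DEFINED AS '\N'
STORED AS TEXTFILE;
{noformat}

 With such a clause, the same null representation would apply both to data 
 handed to TRANSFORM scripts and to INSERT OVERWRITE LOCAL DIRECTORY output, 
 resolving the inconsistency described above.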



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


Review Request 16239: HIVE-6022 Load statements with incorrect order of partitions put input files to unreadable places

2013-12-12 Thread Teruyoshi Zenmyo

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16239/
---

Review request for hive.


Bugs: HIVE-6022
https://issues.apache.org/jira/browse/HIVE-6022


Repository: hive-git


Description
---

HIVE-6022 Load statements with incorrect order of partitions put input files to 
unreadable places


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 4b7fc73 
  ql/src/test/queries/clientpositive/loadpart2.q PRE-CREATION 
  ql/src/test/results/clientpositive/loadpart2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/16239/diff/


Testing
---


Thanks,

Teruyoshi Zenmyo


