[jira] [Created] (HIVE-5061) Row sampling throws NPE when used in sub-query

2013-08-12 Thread Navis (JIRA)
Navis created HIVE-5061:
---

 Summary: Row sampling throws NPE when used in sub-query
 Key: HIVE-5061
 URL: https://issues.apache.org/jira/browse/HIVE-5061
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor


select * from (select * from src TABLESAMPLE (1 ROWS)) x;

{noformat}
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.SplitSample.getTargetSize(SplitSample.java:103)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.sampleSplits(CombineHiveInputFormat.java:487)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:405)
at 
org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1025)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1017)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:928)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:881)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:881)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:855)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1424)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1204)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:878)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5061) Row sampling throws NPE when used in sub-query

2013-08-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5061:


Status: Patch Available  (was: Open)

 Row sampling throws NPE when used in sub-query
 --

 Key: HIVE-5061
 URL: https://issues.apache.org/jira/browse/HIVE-5061
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor

 select * from (select * from src TABLESAMPLE (1 ROWS)) x;
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.parse.SplitSample.getTargetSize(SplitSample.java:103)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.sampleSplits(CombineHiveInputFormat.java:487)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:405)
   at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1025)
   at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1017)
   at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:928)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:881)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:881)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:855)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1424)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1204)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:878)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
 {noformat}



[jira] [Updated] (HIVE-5061) Row sampling throws NPE when used in sub-query

2013-08-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5061:
--

Attachment: HIVE-5061.D12165.1.patch

navis requested code review of HIVE-5061 [jira] Row sampling throws NPE when 
used in sub-query.

Reviewers: JIRA

HIVE-5061 Row sampling throws NPE when used in sub-query

select * from (select * from src TABLESAMPLE (1 ROWS)) x;

java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.parse.SplitSample.getTargetSize(SplitSample.java:103)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.sampleSplits(CombineHiveInputFormat.java:487)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:405)
at 
org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1025)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1017)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:928)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:881)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:881)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:855)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1424)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1204)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:878)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12165

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/queries/clientpositive/split_sample.q
  ql/src/test/results/clientpositive/split_sample.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/29061/

To: JIRA, navis


 Row sampling throws NPE when used in sub-query
 --

 Key: HIVE-5061
 URL: https://issues.apache.org/jira/browse/HIVE-5061
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5061.D12165.1.patch


 select * from (select * from src TABLESAMPLE (1 ROWS)) x;
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.parse.SplitSample.getTargetSize(SplitSample.java:103)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.sampleSplits(CombineHiveInputFormat.java:487)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:405)
   at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1025)
   at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1017)
   at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:928)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:881)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:881)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:855)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
   at 
 

[jira] [Created] (HIVE-5062) Insert + orderby + limit does not need additional RS for limiting rows

2013-08-12 Thread Navis (JIRA)
Navis created HIVE-5062:
---

 Summary: Insert + orderby + limit does not need additional RS for 
limiting rows
 Key: HIVE-5062
 URL: https://issues.apache.org/jira/browse/HIVE-5062
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial


The query,

{noformat}
insert overwrite table dummy select * from src order by key limit 10;
{noformat}

runs two MR jobs, but a single MR job is enough.






[jira] [Updated] (HIVE-5062) Insert + orderby + limit does not need additional RS for limiting rows

2013-08-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5062:


Status: Patch Available  (was: Open)

 Insert + orderby + limit does not need additional RS for limiting rows
 --

 Key: HIVE-5062
 URL: https://issues.apache.org/jira/browse/HIVE-5062
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-5062.D12171.1.patch


 The query,
 {noformat}
 insert overwrite table dummy select * from src order by key limit 10;
 {noformat}
 runs two MR jobs, but a single MR job is enough.



[jira] [Updated] (HIVE-5062) Insert + orderby + limit does not need additional RS for limiting rows

2013-08-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5062:
--

Attachment: HIVE-5062.D12171.1.patch

navis requested code review of HIVE-5062 [jira] Insert + orderby + limit does 
not need additional RS for limiting rows.

Reviewers: JIRA

HIVE-5062 Insert + orderby + limit does not need additional RS for limiting rows

The query,

insert overwrite table dummy select * from src order by key limit 10;

runs two MR jobs, but a single MR job is enough.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12171

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/results/clientpositive/insert1_overwrite_partitions.q.out
  ql/src/test/results/clientpositive/insert2_overwrite_partitions.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/29067/

To: JIRA, navis


 Insert + orderby + limit does not need additional RS for limiting rows
 --

 Key: HIVE-5062
 URL: https://issues.apache.org/jira/browse/HIVE-5062
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-5062.D12171.1.patch


 The query,
 {noformat}
 insert overwrite table dummy select * from src order by key limit 10;
 {noformat}
 runs two MR jobs, but a single MR job is enough.



[jira] [Commented] (HIVE-4513) disable hivehistory logs by default

2013-08-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736661#comment-13736661
 ] 

Thejas M Nair commented on HIVE-4513:
-

Regarding the pre-commit test result, 
testNegativeCliDriver_mapreduce_stack_trace_hadoop20 is a flaky test tracked in 
HIVE-4851.
Review Board has the latest patch.


 disable hivehistory logs by default
 ---

 Key: HIVE-4513
 URL: https://issues.apache.org/jira/browse/HIVE-4513
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Logging
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4513.1.patch, HIVE-4513.2.patch, HIVE-4513.3.patch, 
 HIVE-4513.4.patch, HIVE-4513.5.patch, HIVE-4513.6.patch


 HiveHistory log files (hive_job_log_hive_*.txt files) store information about 
 hive query such as query string, plan , counters and MR job progress 
 information.
 There is no mechanism to delete these files and as a result they get 
 accumulated over time, using up lot of disk space. 
 I don't think this is used by most people, so I think it would better to turn 
 this off by default. Jobtracker logs already capture most of this 
 information, though it is not as structured as history logs.



[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736667#comment-13736667
 ] 

Thejas M Nair commented on HIVE-5009:
-

Based on the comments in HIVE-3739, it might be related to your jdk version. If 
you are using jdk7, you might want to check if jdk6 helps. Please let us know 
on jira if you find a way out.

 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractBucketJoinProc.java

   Original Estimate: 48h
  Remaining Estimate: 48h

 I have found some minor optimization issues in the codebase, which I would 
 like to rectify and contribute. Specifically, the optimizations that could 
 be applied to Hive's code base are as follows:
 1. Use StringBuffer when appending strings - In 184 instances, the 
 concatenation operator (+=) was used when appending strings. This is 
 inherently inefficient - instead, Java's StringBuffer or StringBuilder class 
 should be used. 12 instances of this optimization can be applied to the 
 GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
 uses the + operator inside a loop, as do the column projection utilities 
 class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
 Tests showed that using StringBuilder when appending strings is 57% 
 faster than using the + operator (using the StringBuffer took 122 
 milliseconds whilst the + operator took 284 milliseconds). The reason 
 the StringBuffer class is preferred over the + operator is that
 String third = first + second;
 gets compiled to:
 StringBuilder builder = new StringBuilder( first );
 builder.append( second );
 third = builder.toString();
 Therefore, building complex strings that, for example, involve loops 
 requires many instantiations (and, as discussed below, creating new objects 
 inside loops is inefficient).
 2. Use arrays instead of List - The asList method of Java's java.util.Arrays 
 class is more efficient at creating lists from arrays than using loops 
 to manually iterate over the elements (using asList is computationally very 
 cheap, O(1), as it merely creates a wrapper object around the array; looping 
 through the list however has a complexity of O(n), since a new list is created 
 and every element in the array is added to this new list). As confirmed by 
 the experiment detailed in Appendix D, the Java compiler does not 
 automatically optimize and replace tight-loop copying with asList: the 
 loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
 instant. 
 Four instances of this optimization can be applied to Hive's codebase (two of 
 these should be applied to the Map-Join container - MapJoinRowContainer) - 
 lines 92 to 98:
  for (obj = other.first(); obj != null; obj = other.next()) {
    ArrayList<Object> ele = new ArrayList<Object>(obj.length);
    for (int i = 0; i < obj.length; i++) {
      ele.add(obj[i]);
    }
    list.add((Row) ele);
  }
 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
 could be avoided by simply using the provided static conversion methods. As 
 noted in the PMD documentation, using these avoids the cost of creating 
 objects that also need to be garbage-collected later.
 For example, line 587 of the SemanticAnalyzer class could be replaced by the 
 more efficient parseDouble method call:
 // Inefficient:
 Double percent = Double.valueOf(value).doubleValue();
 // To be replaced by:
 double percent = Double.parseDouble(value);
 Our test case in Appendix D confirms this: converting 10,000 strings into 
 integers using Integer.valueOf(gen.nextSessionId()) (i.e. creating an 
 unnecessary wrapper object) took 119 milliseconds on average; using 
 parseInt() took only 38. Therefore creating even just one unnecessary 
 wrapper object can make your code up to 68% slower.
 4. Converting literals to strings using + "" - Converting literals to 
 strings using + "" is quite inefficient (see Appendix D) and should be done 
 by calling the toString() method instead: converting 1,000,000 integers to 
 strings using + "" took, on average, 1340 milliseconds whilst using the 
 toString() method only required 1183 milliseconds (hence adding empty strings 
 takes nearly 12% more time). 
 89 instances of using + "" when converting literals were found in Hive's 
 codebase - one of these is found in JoinUtil.
 5. Avoid manual copying of arrays - Instead of copying arrays as is done in 
 GroupByOperator on line 1040 (see below), the more efficient System.arraycopy 
 can 
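The StringBuilder argument in point 1 can be sketched with a minimal, self-contained comparison. This is illustrative only (not code from any attached patch), and actual timings vary by JVM and workload; the point is the allocation pattern, which is what the quoted measurements reflect.

```java
// Point 1, sketched: repeated += allocates a fresh StringBuilder (and a
// fresh String) on every loop iteration, while one reused StringBuilder
// appends in place and converts to a String once at the end.
public class ConcatDemo {
    static String withPlusEquals(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            // Each iteration compiles to roughly:
            // s = new StringBuilder(s).append(i).toString();
            s += i;
        }
        return s;
    }

    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);  // single builder reused across iterations
        }
        return sb.toString();  // one String allocation at the end
    }

    public static void main(String[] args) {
        // Both produce the same result; only the allocations differ.
        System.out.println(withPlusEquals(5));  // 01234
        System.out.println(withBuilder(5));     // 01234
    }
}
```

Note that for a single expression like `first + second` the compiler already emits a StringBuilder, so the win only materializes when concatenation repeats across loop iterations.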

[jira] [Commented] (HIVE-5059) Meaningless warning message from TypeCheckProcFactory

2013-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736672#comment-13736672
 ] 

Hive QA commented on HIVE-5059:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597406/HIVE-5059.D12159.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2789 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/398/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/398/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 Meaningless warning message from TypeCheckProcFactory
 -

 Key: HIVE-5059
 URL: https://issues.apache.org/jira/browse/HIVE-5059
 Project: Hive
  Issue Type: Task
  Components: Logging
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-5059.D12159.1.patch


 Regression from HIVE-3849, hive logs meaningless messages as warning like 
 below,
 {noformat}
 WARN parse.TypeCheckProcFactory (TypeCheckProcFactory.java:convert(180)) - 
 Invalid type entry TOK_TABLE_OR_COL=null
 {noformat}



[jira] [Created] (HIVE-5063) Fix some non-deterministic or not-updated tests

2013-08-12 Thread Navis (JIRA)
Navis created HIVE-5063:
---

 Summary: Fix some non-deterministic or not-updated tests
 Key: HIVE-5063
 URL: https://issues.apache.org/jira/browse/HIVE-5063
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Minor


update result
auto_join14.q,input12.q,join14.q,union_remove_19.q

fix non-deterministic tests
partition_date.q,partition_date2.q,ppd_vc.q,nonblock_op_deduplicate.q




[jira] [Updated] (HIVE-5063) Fix some non-deterministic or not-updated tests

2013-08-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5063:


Status: Patch Available  (was: Open)

 Fix some non-deterministic or not-updated tests
 ---

 Key: HIVE-5063
 URL: https://issues.apache.org/jira/browse/HIVE-5063
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5063.D12177.1.patch


 update result
 auto_join14.q,input12.q,join14.q,union_remove_19.q
 fix non-deterministic tests
 partition_date.q,partition_date2.q,ppd_vc.q,nonblock_op_deduplicate.q



[jira] [Updated] (HIVE-5063) Fix some non-deterministic or not-updated tests

2013-08-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5063:
--

Attachment: HIVE-5063.D12177.1.patch

navis requested code review of HIVE-5063 [jira] Fix some non-deterministic or 
not-updated tests.

Reviewers: JIRA

DPAL-2107

update result
auto_join14.q,input12.q,join14.q,union_remove_19.q

fix non-deterministic tests
partition_date.q,partition_date2.q,ppd_vc.q,nonblock_op_deduplicate.q

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12177

AFFECTED FILES
  ql/src/test/queries/clientpositive/nonblock_op_deduplicate.q
  ql/src/test/queries/clientpositive/partition_date.q
  ql/src/test/queries/clientpositive/partition_date2.q
  ql/src/test/queries/clientpositive/ppd_vc.q
  ql/src/test/results/clientpositive/auto_join14.q.out
  ql/src/test/results/clientpositive/input12.q.out
  ql/src/test/results/clientpositive/join14.q.out
  ql/src/test/results/clientpositive/nonblock_op_deduplicate.q.out
  ql/src/test/results/clientpositive/partition_date.q.out
  ql/src/test/results/clientpositive/partition_date2.q.out
  ql/src/test/results/clientpositive/ppd_vc.q.out
  ql/src/test/results/clientpositive/union_remove_19.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/29073/

To: JIRA, navis


 Fix some non-deterministic or not-updated tests
 ---

 Key: HIVE-5063
 URL: https://issues.apache.org/jira/browse/HIVE-5063
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5063.D12177.1.patch


 update result
 auto_join14.q,input12.q,join14.q,union_remove_19.q
 fix non-deterministic tests
 partition_date.q,partition_date2.q,ppd_vc.q,nonblock_op_deduplicate.q



[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-12 Thread Benjamin Jakobus (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736707#comment-13736707
 ] 

Benjamin Jakobus commented on HIVE-5009:


Thanks. Yes, that did the trick!


 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractBucketJoinProc.java

   Original Estimate: 48h
  Remaining Estimate: 48h

 I have found some minor optimization issues in the codebase, which I would 
 like to rectify and contribute. Specifically, the optimizations that could 
 be applied to Hive's code base are as follows:
 1. Use StringBuffer when appending strings - In 184 instances, the 
 concatenation operator (+=) was used when appending strings. This is 
 inherently inefficient - instead, Java's StringBuffer or StringBuilder class 
 should be used. 12 instances of this optimization can be applied to the 
 GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
 uses the + operator inside a loop, as do the column projection utilities 
 class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
 Tests showed that using StringBuilder when appending strings is 57% 
 faster than using the + operator (using the StringBuffer took 122 
 milliseconds whilst the + operator took 284 milliseconds). The reason 
 the StringBuffer class is preferred over the + operator is that
 String third = first + second;
 gets compiled to:
 StringBuilder builder = new StringBuilder( first );
 builder.append( second );
 third = builder.toString();
 Therefore, building complex strings that, for example, involve loops 
 requires many instantiations (and, as discussed below, creating new objects 
 inside loops is inefficient).
 2. Use arrays instead of List - The asList method of Java's java.util.Arrays 
 class is more efficient at creating lists from arrays than using loops 
 to manually iterate over the elements (using asList is computationally very 
 cheap, O(1), as it merely creates a wrapper object around the array; looping 
 through the list however has a complexity of O(n), since a new list is created 
 and every element in the array is added to this new list). As confirmed by 
 the experiment detailed in Appendix D, the Java compiler does not 
 automatically optimize and replace tight-loop copying with asList: the 
 loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
 instant. 
 Four instances of this optimization can be applied to Hive's codebase (two of 
 these should be applied to the Map-Join container - MapJoinRowContainer) - 
 lines 92 to 98:
  for (obj = other.first(); obj != null; obj = other.next()) {
    ArrayList<Object> ele = new ArrayList<Object>(obj.length);
    for (int i = 0; i < obj.length; i++) {
      ele.add(obj[i]);
    }
    list.add((Row) ele);
  }
 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
 could be avoided by simply using the provided static conversion methods. As 
 noted in the PMD documentation, using these avoids the cost of creating 
 objects that also need to be garbage-collected later.
 For example, line 587 of the SemanticAnalyzer class could be replaced by the 
 more efficient parseDouble method call:
 // Inefficient:
 Double percent = Double.valueOf(value).doubleValue();
 // To be replaced by:
 double percent = Double.parseDouble(value);
 Our test case in Appendix D confirms this: converting 10,000 strings into 
 integers using Integer.valueOf(gen.nextSessionId()) (i.e. creating an 
 unnecessary wrapper object) took 119 milliseconds on average; using 
 parseInt() took only 38. Therefore creating even just one unnecessary 
 wrapper object can make your code up to 68% slower.
 4. Converting literals to strings using + "" - Converting literals to 
 strings using + "" is quite inefficient (see Appendix D) and should be done 
 by calling the toString() method instead: converting 1,000,000 integers to 
 strings using + "" took, on average, 1340 milliseconds whilst using the 
 toString() method only required 1183 milliseconds (hence adding empty strings 
 takes nearly 12% more time). 
 89 instances of using + "" when converting literals were found in Hive's 
 codebase - one of these is found in JoinUtil.
 5. Avoid manual copying of arrays - Instead of copying arrays as is done in 
 GroupByOperator on line 1040 (see below), the more efficient System.arraycopy 
 can be used (arraycopy is a native method, meaning that the entire memory 
 block is copied using memcpy or memmove).
 // Line 1040 of the GroupByOperator
 for 
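Point 5's arraycopy suggestion can be sketched as follows. Since the GroupByOperator excerpt above is cut off, this is a generic illustration under that assumption, not the actual Hive code:

```java
import java.util.Arrays;

// Point 5, sketched: element-by-element copying versus System.arraycopy,
// which delegates to a native bulk copy of the underlying memory block.
public class CopyDemo {
    static Object[] manualCopy(Object[] src) {
        Object[] dst = new Object[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];  // one element copied per iteration
        }
        return dst;
    }

    static Object[] bulkCopy(Object[] src) {
        Object[] dst = new Object[src.length];
        // Native bulk copy: (source, srcPos, destination, destPos, length)
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        Object[] src = {"a", "b", "c"};
        // Both approaches yield an equal shallow copy.
        System.out.println(Arrays.equals(manualCopy(src), bulkCopy(src)));  // true
    }
}
```

`Object[].clone()` and `Arrays.copyOf` are equally valid here; all three avoid the per-element loop, which is the point being made.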

[jira] [Commented] (HIVE-494) Select columns by index instead of name

2013-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736716#comment-13736716
 ] 

Hive QA commented on HIVE-494:
--



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597403/HIVE-494.D12153.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2790 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/399/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/399/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 Select columns by index instead of name
 ---

 Key: HIVE-494
 URL: https://issues.apache.org/jira/browse/HIVE-494
 Project: Hive
  Issue Type: Wish
  Components: Clients, Query Processor
Reporter: Adam Kramer
Assignee: Navis
Priority: Minor
  Labels: SQL
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-494.D1641.1.patch, 
 HIVE-494.D12153.1.patch


 SELECT mytable[0], mytable[2] FROM some_table_name mytable;
 ...should return the first and third columns, respectively, from mytable 
 regardless of their column names.
 The need for names specifically is kind of silly when they just get 
 translated into numbers anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4945) Make RLIKE/REGEXP run end-to-end by updating VectorizationContext

2013-08-12 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-4945:
-

Attachment: HIVE-4945.1.patch.txt

Review request on https://reviews.apache.org/r/13494/

 Make RLIKE/REGEXP run end-to-end by updating VectorizationContext
 -

 Key: HIVE-4945
 URL: https://issues.apache.org/jira/browse/HIVE-4945
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Eric Hanson
 Attachments: HIVE-4945.1.patch.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4945) Make RLIKE/REGEXP run end-to-end by updating VectorizationContext

2013-08-12 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736761#comment-13736761
 ] 

Teddy Choi commented on HIVE-4945:
--

While writing this code, I found that even if the query does not run with 
vectorized expressions, it produces the same .q.out file. This makes it hard to 
check whether vectorized expressions were actually used.

There is a way to check it. If vectorized expressions were used, 
ExecDriver#job#getMapperClass() returns VectorExecMapper#class after calling 
ExecDriver#execute(). Otherwise, getMapperClass() returns ExecMapper#class.

 Make RLIKE/REGEXP run end-to-end by updating VectorizationContext
 -

 Key: HIVE-4945
 URL: https://issues.apache.org/jira/browse/HIVE-4945
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Eric Hanson
 Attachments: HIVE-4945.1.patch.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5039) Support autoReconnect at JDBC

2013-08-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5039:
--

Attachment: HIVE-5039.D12183.1.patch

azrael requested code review of HIVE-5039 [jira] Support autoReconnect at 
JDBC.

Reviewers: JIRA

HIVE-5039 : Support autoReconnect at JDBC

If hiveServer2 is shut down, the connection is broken. Let the connection 
reconnect automatically after hiveServer2 is restarted.

jdbc:hive2://localhost:1/default?autoReconnect=true

TEST PLAN
  unit test and manual test

REVISION DETAIL
  https://reviews.facebook.net/D12183

AFFECTED FILES
  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java
  jdbc/src/java/org/apache/hive/jdbc/HivePreparedStatement.java
  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java
  jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2Connection.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/29079/

To: JIRA, azrael


 Support autoReconnect at JDBC 
 --

 Key: HIVE-5039
 URL: https://issues.apache.org/jira/browse/HIVE-5039
 Project: Hive
  Issue Type: New Feature
  Components: JDBC
Affects Versions: 0.11.0
Reporter: Azrael Park
Assignee: Azrael Park
Priority: Trivial
 Attachments: HIVE-5039.D12183.1.patch, HIVE-5039.patch


 If hiveServer2 is shut down, the connection is broken. Let the connection 
 reconnect automatically after hiveServer2 is restarted.
 {noformat}
 jdbc:hive2://localhost:1/default?autoReconnect=true
 {noformat}
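The reconnect behaviour could be approximated client-side with a generic retry helper. This is a hypothetical sketch - the `Action` interface and `withReconnect` method are invented for illustration, while the actual patch modifies HiveConnection internally:

```java
public class RetryDemo {
    // Stand-in for a JDBC call that may fail while HiveServer2 restarts.
    interface Action<T> {
        T run();
    }

    // Re-run the action after each failure, up to maxAttempts times.
    // A real driver would re-open the underlying transport before retrying.
    static <T> T withReconnect(Action<T> action, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (RuntimeException e) {
                last = e; // connection broken: reconnect, then retry
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        final int[] failures = {2}; // fail twice, then succeed
        String result = withReconnect(new Action<String>() {
            public String run() {
                if (failures[0]-- > 0) {
                    throw new RuntimeException("connection refused");
                }
                return "ok";
            }
        }, 5);
        System.out.println(result); // prints "ok"
    }
}
```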

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736773#comment-13736773
 ] 

Hive QA commented on HIVE-4123:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597402/HIVE-4123.patch.txt

{color:green}SUCCESS:{color} +1 2848 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/400/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/400/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, 
 HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs
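The delta-encoding idea can be illustrated with a toy run-length encoder. This is a simplified sketch only - ORC's actual RLEv2 format uses bit-packed sub-encodings, not this triple layout: a run of integers with a constant difference is stored as a (base, delta, length) triple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeltaRleDemo {
    // Encode as (base, delta, runLength) triples; not ORC's real wire format.
    static List<long[]> encode(long[] v) {
        List<long[]> runs = new ArrayList<long[]>();
        int i = 0;
        while (i < v.length) {
            long base = v[i];
            long delta = 0;
            int len = 1;
            if (i + 1 < v.length) {
                delta = v[i + 1] - v[i];
                len = 2;
                while (i + len < v.length && v[i + len] - v[i + len - 1] == delta) {
                    len++;
                }
            }
            runs.add(new long[]{base, delta, len});
            i += len;
        }
        return runs;
    }

    // Expand each (base, delta, length) run back into the original values.
    static long[] decode(List<long[]> runs) {
        int n = 0;
        for (long[] r : runs) {
            n += (int) r[2];
        }
        long[] out = new long[n];
        int pos = 0;
        for (long[] r : runs) {
            for (int k = 0; k < r[2]; k++) {
                out[pos++] = r[0] + k * r[1];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4, 10, 10, 10};
        List<long[]> runs = encode(values);
        System.out.println(runs.size());                         // 2
        System.out.println(Arrays.equals(values, decode(runs))); // true
    }
}
```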

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-12 Thread Benjamin Jakobus (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736792#comment-13736792
 ] 

Benjamin Jakobus commented on HIVE-5009:


Mhh, another silly question: my changes don't seem to take effect after 
compiling.
1) Edit a file (e.g. add console.printInfo("DEBUG: exec time: " + ((end - 
offset) / 1000));)
2) ant -Dhadoop.version=1.2.1 clean package
3) Run the test script - but no output is written to the log or console.

Any advice?

 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractBucketJoinProc.java

   Original Estimate: 48h
  Remaining Estimate: 48h

 I have found some minor optimization issues in the codebase, which I would 
 like to rectify and contribute. Specifically, these are:
 The optimizations that could be applied to Hive's code base are as follows:
 1. Use StringBuffer when appending strings - In 184 instances, the 
 concatenation operator (+=) was used when appending strings. This is 
 inherently inefficient - instead, Java's StringBuffer or StringBuilder class 
 should be used. 12 instances of this optimization can be applied to the 
 GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
 uses the + operator inside a loop, as do the column projection utilities 
 class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
 Tests showed that using StringBuilder when appending strings is 57% 
 faster than using the + operator (using the StringBuffer took 122 
 milliseconds whilst the + operator took 284 milliseconds). The reason 
 the StringBuffer class is preferred over the + operator is that 
 String third = first + second; 
 gets compiled to: 
 StringBuilder builder = new StringBuilder(first); 
 builder.append(second); 
 third = builder.toString(); 
 Therefore, building complex strings that, for example, involve loops 
 requires many instantiations (and, as discussed below, creating new objects 
 inside loops is inefficient).
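The difference can be seen in a minimal sketch (the class and method names are illustrative, not Hive code):

```java
public class AppendDemo {
    // Inefficient: each += compiles to a fresh StringBuilder plus a new String.
    static String concatWithPlus(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p;
        }
        return result;
    }

    // Preferred: a single builder reused across the whole loop.
    static String concatWithBuilder(String[] parts) {
        StringBuilder builder = new StringBuilder();
        for (String p : parts) {
            builder.append(p);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        // Both produce the same string; only the allocation count differs.
        System.out.println(concatWithPlus(parts).equals(concatWithBuilder(parts))); // true
    }
}
```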
 2. Use arrays instead of List - Java's java.util.Arrays class asList method 
 is more efficient at creating lists from arrays than using loops 
 to manually iterate over the elements (using asList is computationally very 
 cheap, O(1), as it merely creates a wrapper object around the array; looping 
 through the list however has a complexity of O(n) since a new list is created 
 and every element in the array is added to this new list). As confirmed by 
 the experiment detailed in Appendix D, the Java compiler does not 
 automatically optimize and replace tight-loop copying with asList: the 
 loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
 instant. 
 Four instances of this optimization can be applied to Hive's codebase (two of 
 these should be applied to the Map-Join container - MapJoinRowContainer) - 
 lines 92 to 98:
  for (obj = other.first(); obj != null; obj = other.next()) {
    ArrayList<Object> ele = new ArrayList<Object>(obj.length);
    for (int i = 0; i < obj.length; i++) {
      ele.add(obj[i]);
    }
    list.add((Row) ele);
  }
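A minimal sketch of the replacement (illustrative, using a plain Object[] rather than MapJoinRowContainer's row type). Note that Arrays.asList returns a fixed-size view backed by the array, so it is only a drop-in replacement when the list is not resized afterwards:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AsListDemo {
    // Manual O(n) copy: allocates a new list and adds each element.
    static List<Object> copyWithLoop(Object[] obj) {
        List<Object> ele = new ArrayList<Object>(obj.length);
        for (int i = 0; i < obj.length; i++) {
            ele.add(obj[i]);
        }
        return ele;
    }

    // O(1) wrapper: merely wraps the array in a fixed-size view.
    static List<Object> copyWithAsList(Object[] obj) {
        return Arrays.asList(obj);
    }

    public static void main(String[] args) {
        Object[] row = {"k1", "v1"};
        // Same elements in the same order, so the lists compare equal.
        System.out.println(copyWithLoop(row).equals(copyWithAsList(row))); // true
    }
}
```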
 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
 could be avoided by simply using the provided static conversion methods. As 
 noted in the PMD documentation, using these avoids the cost of creating 
 objects that also need to be garbage-collected later.
 For example, line 587 of the SemanticAnalyzer class could be replaced by the 
 more efficient parseDouble method call:
 // Inefficient:
 Double percent = Double.valueOf(value).doubleValue();
 // To be replaced by (note the primitive target type, so nothing is re-boxed):
 double percent = Double.parseDouble(value);
 Our test case in Appendix D confirms this: converting 10,000 strings into 
 integers via an unnecessary wrapper object 
 (Integer.valueOf(gen.nextSessionId()).intValue()) took 119 milliseconds on 
 average; calling Integer.parseInt(gen.nextSessionId()) directly took only 38. 
 Creating even just one unnecessary wrapper object per conversion can 
 therefore make your code up to 68% slower.
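The pattern reduces to a minimal sketch (method and variable names are illustrative):

```java
public class ParseDemo {
    // Allocates a Double wrapper that is immediately unboxed and discarded.
    static double viaWrapper(String value) {
        return Double.valueOf(value).doubleValue();
    }

    // Parses straight to the primitive; no intermediate object is created.
    static double viaParse(String value) {
        return Double.parseDouble(value);
    }

    public static void main(String[] args) {
        // Identical results; the second avoids a short-lived allocation.
        System.out.println(viaWrapper("0.75") == viaParse("0.75")); // true
    }
}
```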
 4. Converting literals to strings using + "" - Converting literals to strings 
 using + "" is quite inefficient (see Appendix D) and should be done by 
 calling the toString() method instead: converting 1,000,000 integers to 
 strings using + "" took, on average, 1340 milliseconds whilst using the 
 toString() method only required 1183 milliseconds (hence appending an empty 
 string takes nearly 12% more time). 
 89 instances of using + "" when converting literals were found in Hive's 
 codebase - one of these is found in JoinUtil.
 5. Avoid manual copying of arrays - Instead of 

[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-12 Thread Benjamin Jakobus (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736809#comment-13736809
 ] 

Benjamin Jakobus commented on HIVE-5009:


Never mind - resolved. Problem was me being an idiot.

 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractBucketJoinProc.java

   Original Estimate: 48h
  Remaining Estimate: 48h


[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-12 Thread Benjamin Jakobus (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736811#comment-13736811
 ] 

Benjamin Jakobus commented on HIVE-5009:


However, is there a faster way to compile - or do I need to rely on ivy, maven, 
etc. every time? 
ant -Dhadoop.version=1.2.1 clean package takes about 3 minutes every time.

 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractBucketJoinProc.java

   Original Estimate: 48h
  Remaining Estimate: 48h


[jira] [Commented] (HIVE-5061) Row sampling throws NPE when used in sub-query

2013-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736841#comment-13736841
 ] 

Hive QA commented on HIVE-5061:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597417/HIVE-5061.D12165.1.patch

{color:green}SUCCESS:{color} +1 2789 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/401/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/401/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Row sampling throws NPE when used in sub-query
 --

 Key: HIVE-5061
 URL: https://issues.apache.org/jira/browse/HIVE-5061
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5061.D12165.1.patch


 select * from (select * from src TABLESAMPLE (1 ROWS)) x;
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.parse.SplitSample.getTargetSize(SplitSample.java:103)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.sampleSplits(CombineHiveInputFormat.java:487)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:405)
   at 
 org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1025)
   at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1017)
   at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:928)
   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:881)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:881)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:855)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1424)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1204)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:878)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4979) If any compiling error exists, test-shims should stop

2013-08-12 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4979:
---

Status: Open  (was: Patch Available)

 If any compiling error exists, test-shims should stop
 -

 Key: HIVE-4979
 URL: https://issues.apache.org/jira/browse/HIVE-4979
 Project: Hive
  Issue Type: Sub-task
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4979.4980.failedTest.txt, HIVE-4979.D11931.1.patch, 
 HIVE-4979.D11931.2.patch, HIVE-4979.D11931.3.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2

2013-08-12 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736885#comment-13736885
 ] 

Henry Robinson commented on HIVE-4569:
--

Although {{executeStatement}} is implemented synchronously in Hive, was it 
meant to be synchronous from the outset? The comment in the Thrift definition 
suggests otherwise:

{code}
// ExecuteStatement()
//
// Execute a statement.
// The returned OperationHandle can be used to check on the
// status of the statement, and to fetch results once the
// statement has finished executing.
{code}



 GetQueryPlan api in Hive Server2
 

 Key: HIVE-4569
 URL: https://issues.apache.org/jira/browse/HIVE-4569
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Amareshwari Sriramadasu
Assignee: Jaideep Dhok
 Attachments: git-4569.patch, HIVE-4569.D10887.1.patch, 
 HIVE-4569.D11469.1.patch


 It would be nice to have GetQueryPlan as a Thrift API. I do not see the 
 GetQueryPlan API available in HiveServer2, though the wiki 
 https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API 
 lists it; not sure why it was not added.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5062) Insert + orderby + limit does not need additional RS for limiting rows

2013-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736913#comment-13736913
 ] 

Hive QA commented on HIVE-5062:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597421/HIVE-5062.D12171.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2789 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/402/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/402/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 Insert + orderby + limit does not need additional RS for limiting rows
 --

 Key: HIVE-5062
 URL: https://issues.apache.org/jira/browse/HIVE-5062
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-5062.D12171.1.patch


 The query,
 {noformat}
 insert overwrite table dummy select * from src order by key limit 10;
 {noformat}
 runs two MR jobs, but a single MR job is enough.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5063) Fix some non-deterministic or not-updated tests

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736922#comment-13736922
 ] 

Brock Noland commented on HIVE-5063:


+1 LGTM 

We'll see what the automated tests say and then I'll run the affected tests on 
hadoop2.

 Fix some non-deterministic or not-updated tests
 ---

 Key: HIVE-5063
 URL: https://issues.apache.org/jira/browse/HIVE-5063
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5063.D12177.1.patch


 update result
 auto_join14.q,input12.q,join14.q,union_remove_19.q
 fix non-deterministic tests
 partition_date.q,partition_date2.q,ppd_vc.q,nonblock_op_deduplicate.q

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4123:


  Resolution: Fixed
Release Note: I just committed this. Thanks, Prasanth!
  Status: Resolved  (was: Patch Available)

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, 
 HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736931#comment-13736931
 ] 

Brock Noland commented on HIVE-4123:


[~owen.omalley] looks like your comment was accidentally put in the Release 
Notes section.

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, 
 HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-5063) Fix some non-deterministic or not-updated tests

2013-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737007#comment-13737007
 ] 

Hive QA commented on HIVE-5063:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597426/HIVE-5063.D12177.1.patch

{color:green}SUCCESS:{color} +1 2789 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/403/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/403/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Fix some non-deterministic or not-updated tests
 ---

 Key: HIVE-5063
 URL: https://issues.apache.org/jira/browse/HIVE-5063
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5063.D12177.1.patch


 update result
 auto_join14.q,input12.q,join14.q,union_remove_19.q
 fix non-deterministic tests
 partition_date.q,partition_date2.q,ppd_vc.q,nonblock_op_deduplicate.q



[jira] [Commented] (HIVE-5046) Hcatalog's bin/hcat script doesn't respect HIVE_HOME

2013-08-12 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737008#comment-13737008
 ] 

Mark Grover commented on HIVE-5046:
---

Thanks Brock!

 Hcatalog's bin/hcat script doesn't respect HIVE_HOME
 

 Key: HIVE-5046
 URL: https://issues.apache.org/jira/browse/HIVE-5046
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Mark Grover
Assignee: Mark Grover
 Fix For: 0.12.0

 Attachments: HIVE-5046.1.patch


 https://github.com/apache/hive/blob/trunk/hcatalog/bin/hcat#L81
 The quoted snippet (see below) intends to set HIVE_HOME if it's not set (i.e. 
 HIVE_HOME is currently null).
 {code}
 if [ -n ${HIVE_HOME} ]; then
 {code}
 However, {{-n}} checks if the variable is _not_ null. So, the above code ends 
 up setting HIVE_HOME to the default value if it is actually set already, 
 overriding the set value. This condition needs to be negated.
 Moreover, {{-n}} checks requires the string being tested to be enclosed in 
 quotes.
 Reference:
 http://tldp.org/LDP/abs/html/comparison-ops.html
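The corrected guard can be sketched as follows; the fallback path below is an assumption for illustration only, and the function name is hypothetical:

```shell
# Sketch of the fixed check: -z is true when the variable is unset or empty,
# and the quotes keep the test well-formed in both cases.
set_default_hive_home() {
  if [ -z "${HIVE_HOME}" ]; then
    HIVE_HOME="/usr/lib/hive"   # assumed default, for illustration only
  fi
  echo "${HIVE_HOME}"
}
```

With this form a pre-set HIVE_HOME is respected, and only a missing or empty one falls back to the default.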



[jira] [Commented] (HIVE-5056) MapJoinProcessor ignores order of values in removing RS

2013-08-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737038#comment-13737038
 ] 

Xuefu Zhang commented on HIVE-5056:
---

Could anyone give a concise description with enough detail for other people to 
understand the bug? Abbreviations sometimes cause confusion too. RS? Sorry if 
this is obvious.

 MapJoinProcessor ignores order of values in removing RS
 ---

 Key: HIVE-5056
 URL: https://issues.apache.org/jira/browse/HIVE-5056
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-5056.D12147.1.patch, HIVE-5056.D12147.2.patch


 http://www.mail-archive.com/user@hive.apache.org/msg09073.html



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737060#comment-13737060
 ] 

Prasanth J commented on HIVE-4123:
--

Thanks [~owen.omalley] for committing the patch!

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, 
 HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



Re: [jira] [Commented] (HIVE-5056) MapJoinProcessor ignores order of values in removing RS

2013-08-12 Thread pandeeswaran bhoopathy
From the source code, it looks like RS indicates Reduce Sink.

Sent from my iPad

 On 12-Aug-2013, at 10:33 pm, Xuefu Zhang (JIRA) j...@apache.org wrote:
 
 
[ 
 https://issues.apache.org/jira/browse/HIVE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737038#comment-13737038
  ] 
 
 Xuefu Zhang commented on HIVE-5056:
 ---
 
 Could anyone give a concise description with enough detail for other people to 
 understand the bug? Abbreviations sometimes cause confusion too. RS? Sorry if 
 this is obvious.
 
 MapJoinProcessor ignores order of values in removing RS
 ---
 
Key: HIVE-5056
URL: https://issues.apache.org/jira/browse/HIVE-5056
Project: Hive
 Issue Type: Bug
 Components: Query Processor
   Reporter: Navis
   Assignee: Navis
Attachments: HIVE-5056.D12147.1.patch, HIVE-5056.D12147.2.patch
 
 
 http://www.mail-archive.com/user@hive.apache.org/msg09073.html
 


[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2

2013-08-12 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737106#comment-13737106
 ] 

Mark Grover commented on HIVE-4388:
---

Brock, thanks for looking into this.

I was reviewing the patch and saw that you have several references to 
{{getFamilyMap()}}.

This method's return type was changed in newer versions of HBase. Even though 
HBASE-9142 introduces the original method back in 0.95.2, it's deprecated.

Do you think it makes more sense to use {{getFamilyCellMap()}} here instead?


 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, 
 HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt


 Currently we're building by default against 0.92. When you run against hadoop 
 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963.
 HIVE-3861 upgrades the version of hbase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396.



[jira] [Commented] (HIVE-5047) Hive client filters partitions incorrectly via pushdown in certain cases involving or

2013-08-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737109#comment-13737109
 ] 

Sergey Shelukhin commented on HIVE-5047:


[~ashutoshc] Ping?

 Hive client filters partitions incorrectly via pushdown in certain cases 
 involving or
 ---

 Key: HIVE-5047
 URL: https://issues.apache.org/jira/browse/HIVE-5047
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5047.D12141.1.patch






[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well (yet)

2013-08-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737108#comment-13737108
 ] 

Sergey Shelukhin commented on HIVE-5029:


[~ashutoshc] Wdyt? I am running the test now to see what's wrong; this could be 
one of the examples of working SQL masking non-working JDO, as this query was 
added fairly recently.

 direct SQL perf optimization cannot be tested well (yet)
 

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default before the testing problem is resolved.
 There are several possible solutions:
 1) Separate build for this setting. Seems like overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default.



[jira] [Commented] (HIVE-3926) PPD on virtual column of partitioned table is not working

2013-08-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737112#comment-13737112
 ] 

Sergey Shelukhin commented on HIVE-3926:


It does filter non-partition columns in some cases, in HIVE-5047 there's a 
related problem.


 PPD on virtual column of partitioned table is not working
 -

 Key: HIVE-3926
 URL: https://issues.apache.org/jira/browse/HIVE-3926
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-3926.6.patch, HIVE-3926.D8121.1.patch, 
 HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch, 
 HIVE-3926.D8121.5.patch


 {code}
 select * from src where BLOCK__OFFSET__INSIDE__FILE<100;
 {code}
 is working, but
 {code}
 select * from srcpart where BLOCK__OFFSET__INSIDE__FILE<100;
 {code}
 throws SemanticException. Disabling PPD makes it work.



[jira] [Commented] (HIVE-3926) PPD on virtual column of partitioned table is not working

2013-08-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737113#comment-13737113
 ] 

Sergey Shelukhin commented on HIVE-3926:


Let me double check... (I am making changes in HIVE-4985 and HIVE-4914, they 
are not quite ready yet though)

 PPD on virtual column of partitioned table is not working
 -

 Key: HIVE-3926
 URL: https://issues.apache.org/jira/browse/HIVE-3926
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-3926.6.patch, HIVE-3926.D8121.1.patch, 
 HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch, 
 HIVE-3926.D8121.5.patch


 {code}
 select * from src where BLOCK__OFFSET__INSIDE__FILE<100;
 {code}
 is working, but
 {code}
 select * from srcpart where BLOCK__OFFSET__INSIDE__FILE<100;
 {code}
 throws SemanticException. Disabling PPD makes it work.



[jira] [Commented] (HIVE-5039) Support autoReconnect at JDBC

2013-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737137#comment-13737137
 ] 

Hive QA commented on HIVE-5039:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597453/HIVE-5039.D12183.1.patch

{color:green}SUCCESS:{color} +1 2849 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/404/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/404/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Support autoReconnect at JDBC 
 --

 Key: HIVE-5039
 URL: https://issues.apache.org/jira/browse/HIVE-5039
 Project: Hive
  Issue Type: New Feature
  Components: JDBC
Affects Versions: 0.11.0
Reporter: Azrael Park
Assignee: Azrael Park
Priority: Trivial
 Attachments: HIVE-5039.D12183.1.patch, HIVE-5039.patch


 If HiveServer2 is shut down, the connection is broken. Let the connection 
 reconnect automatically after HiveServer2 is restarted.
 {noformat}
 jdbc:hive2://localhost:1/default?autoReconnect=true
 {noformat}



[jira] [Updated] (HIVE-5003) Localize hive exec jar for tez

2013-08-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5003:
-

Attachment: HIVE-5003.4.patch.txt

Updated to better re-use code.

 Localize hive exec jar for tez
 --

 Key: HIVE-5003
 URL: https://issues.apache.org/jira/browse/HIVE-5003
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
 HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt


 Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and 
 added to vertices and the dag itself as needed. For hive we need to localize 
 the hive-exec.jar.
 NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737198#comment-13737198
 ] 

Edward Capriolo commented on HIVE-5003:
---

I think we are repeating a semi-disturbing trend of writing a lot of code we 
have little direct coverage for. For example take a method like:

{code}
 private static Path getDefaultDestDir(Configuration conf) throws 
LoginException, IOException {
{code}

or 
{code}
 private static String getExecJarPathLocal () {
{code}

I think we should have direct junit style tests around these methods. The code 
is clean (for its development state) and well documented. But I think we have 
the chance to do it better.

Right now, for our current code and this code, we are totally reliant on our 
end-to-end system to validate every minor change. If we have smaller unit tests 
on things like this, we can have more coverage and enhance our ability to make 
changes to the project without as many worries about side effects that will not 
manifest until the final end-to-end tests. 

I think we should draw a line in the sand here and attempt to write unit tests 
and design code in a testable way, not just write it and worry about unit tests 
later. What do you think?



 Localize hive exec jar for tez
 --

 Key: HIVE-5003
 URL: https://issues.apache.org/jira/browse/HIVE-5003
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
 HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt


 Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and 
 added to vertices and the dag itself as needed. For hive we need to localize 
 the hive-exec.jar.
 NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Updated] (HIVE-3189) cast ( string type as bigint) returning null values

2013-08-12 Thread Xiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiu updated HIVE-3189:
--

Attachment: Hive-3189.patch.txt

 cast ( string type as bigint) returning null values
 -

 Key: HIVE-3189
 URL: https://issues.apache.org/jira/browse/HIVE-3189
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: N Campbell
 Attachments: Hive-3189.patch.txt


 select rnum, c1, cast(c1 as bigint) from cert.tsdchar tsdchar where rnum in 
 (0,1,2)
 create table if not exists CERT.TSDCHAR ( RNUM int , C1 string)
 row format sequencefile
 rnum  c1  _c2
 0 -1  null
 1 0   null
 2 10  null



[jira] [Updated] (HIVE-3189) cast ( string type as bigint) returning null values

2013-08-12 Thread Xiu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiu updated HIVE-3189:
--

Status: Patch Available  (was: Open)

This is fixed in Hive 0.9.0+ as well as on the current Hive trunk; the defect 
could not be reproduced.

A patch with new test cases is attached to verify the fix.

 cast ( string type as bigint) returning null values
 -

 Key: HIVE-3189
 URL: https://issues.apache.org/jira/browse/HIVE-3189
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: N Campbell
 Attachments: Hive-3189.patch.txt


 select rnum, c1, cast(c1 as bigint) from cert.tsdchar tsdchar where rnum in 
 (0,1,2)
 create table if not exists CERT.TSDCHAR ( RNUM int , C1 string)
 row format sequencefile
 rnum  c1  _c2
 0 -1  null
 1 0   null
 2 10  null



[jira] [Updated] (HIVE-5009) Fix minor optimization issues

2013-08-12 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5009:
---

Attachment: (was: AbstractBucketJoinProc.java)

 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

   Original Estimate: 48h
  Remaining Estimate: 48h

 I have found some minor optimization issues in the codebase, which I would 
 like to rectify and contribute. Specifically, these are:
 The optimizations that could be applied to Hive's code base are as follows:
 1. Use StringBuffer when appending strings - In 184 instances, the 
 concatenation operator (+=) was used when appending strings. This is 
 inherently inefficient - instead Java's StringBuffer or StringBuilder class 
 should be used. 12 instances of this optimization can be applied to the 
 GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
 uses the + operator inside a loop, so does the column projection utilities 
 class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
 Tests showed that using the StringBuilder when appending strings is 57% 
 faster than using the + operator (using the StringBuffer took 122 
 milliseconds whilst the + operator took 284 milliseconds). The reason 
 StringBuffer is preferred over the + operator is that
 String third = first + second;
 gets compiled to:
 StringBuilder builder = new StringBuilder( first );
 builder.append( second );
 third = builder.toString();
 Therefore, building complex strings that, for example, involve loops 
 requires many instantiations (and, as discussed below, creating new objects 
 inside loops is inefficient).
 2. Use arrays instead of List - Java's java.util.Arrays.asList method 
 is more efficient at creating lists from arrays than using loops 
 to manually iterate over the elements (using asList is computationally very 
 cheap, O(1), as it merely creates a wrapper object around the array; looping 
 through the list however has a complexity of O(n) since a new list is created 
 and every element in the array is added to this new list). As confirmed by 
 the experiment detailed in Appendix D, the Java compiler does not 
 automatically optimize and replace tight-loop copying with asList: the 
 loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
 instant. 
 Four instances of this optimization can be applied to Hive's codebase (two of 
 these should be applied to the Map-Join container - MapJoinRowContainer) - 
 lines 92 to 98:
  for (obj = other.first(); obj != null; obj = other.next()) {
   ArrayList<Object> ele = new ArrayList<Object>(obj.length);
   for (int i = 0; i < obj.length; i++) {
 ele.add(obj[i]);
   }
   list.add((Row) ele);
 }
 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
 could be avoided by simply using the provided static conversion methods. As 
 noted in the PMD documentation, using these avoids the cost of creating 
 objects that also need to be garbage-collected later.
 For example, line 587 of the SemanticAnalyzer class could be replaced by the 
 more efficient parseDouble method call:
 // Inefficient:
 Double percent = Double.valueOf(value).doubleValue();
 // To be replaced by:
 double percent = Double.parseDouble(value);
 Our test case in Appendix D confirms this: converting 10,000 strings into 
 integers by way of an unnecessary wrapper object took 119 milliseconds on 
 average; using parseInt() directly took only 38 milliseconds. Therefore 
 creating even just one unnecessary wrapper object can make your code up to 
 68% slower.
 4. Converting literals to strings using + "" - Converting literals to strings 
 using + "" is quite inefficient (see Appendix D) and should be done by 
 calling the toString() method instead: converting 1,000,000 integers to 
 strings using + "" took, on average, 1340 milliseconds whilst using the 
 toString() method only required 1183 milliseconds (hence adding empty strings 
 takes nearly 12% more time). 
 89 instances of using + "" when converting literals were found in Hive's 
 codebase - one of these is found in JoinUtil.
 5. Avoid manual copying of arrays - Instead of copying arrays as is done in 
 GroupByOperator on line 1040 (see below), the more efficient System.arraycopy 
 can be used (arraycopy is a native method, meaning that the entire memory 
 block is copied using memcpy or memmove).
 // Line 1040 of the GroupByOperator
 for (int i = 0; i < keys.length; i++) {
   forwardCache[i] = keys[i];
 }   
 Using 
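Items 1, 3, and 5 above can be condensed into a small sketch; the class and method names are illustrative, not from the Hive codebase:

```java
import java.util.Arrays;

public class MicroOpts {
    // Item 1: accumulate with a single StringBuilder instead of += in a loop,
    // which would allocate a fresh builder and String on every iteration.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    // Item 3: parse straight to a primitive; no wrapper object is allocated.
    static double parsePercent(String value) {
        return Double.parseDouble(value);
    }

    // Item 5: System.arraycopy copies the whole block natively instead of
    // element by element.
    static Object[] copyKeys(Object[] keys) {
        Object[] forwardCache = new Object[keys.length];
        System.arraycopy(keys, 0, forwardCache, 0, keys.length);
        return forwardCache;
    }

    public static void main(String[] args) {
        System.out.println(joinWithBuilder(new String[]{"a", "b", "c"}));
    }
}
```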

[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez

2013-08-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737221#comment-13737221
 ] 

Gunther Hagleitner commented on HIVE-5003:
--

[~appodictic] I agree with you and I don't mind the tez integration being a 
test bed for this. I'll open a tez blocker jira for this - the reason being 
that we need to put some serious thought into this first. 

For instance:
- private methods should not have unit tests, but public ones should, I think 
(private methods should be covered implicitly; I don't want to make everything 
package private for testing, and I don't want reflection to call those)
- Mocking: we have to investigate how we do this cleanly
- Dependency injection: we might have to redesign parts of the code to allow 
proper testing easily




 Localize hive exec jar for tez
 --

 Key: HIVE-5003
 URL: https://issues.apache.org/jira/browse/HIVE-5003
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
 HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt


 Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and 
 added to vertices and the dag itself as needed. For hive we need to localize 
 the hive-exec.jar.
 NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Updated] (HIVE-1511) Hive plan serialization is slow

2013-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1511:
---

Attachment: HIVE-1511-wip3.patch

More progress. Some testcases still failing.

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ning Zhang
 Attachments: HIVE-1511.patch, HIVE-1511-wip2.patch, 
 HIVE-1511-wip3.patch, HIVE-1511-wip.patch


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-12 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Description: Object instantiation inside loops is very expensive. Where 
possible, object references should be created outside the loop so that they can 
be reused.  (was: java/org/apache/hadoop/hive/ql/Context.java
java/org/apache/hadoop/hive/ql/Driver.java
java/org/apache/hadoop/hive/ql/QueryPlan.java
java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
java/org/apache/hadoop/hive/ql/exec/MapOperator.java
java/org/apache/hadoop/hive/ql/exec/MoveTask.java
java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
java/org/apache/hadoop/hive/ql/exec/StatsTask.java
java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
java/org/apache/hadoop/hive/ql/exec/Utilities.java
java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
java/org/apache/hadoop/hive/ql/history/HiveHistory.java
java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
java/org/apache/hadoop/hive/ql/io/RCFile.java
java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateMapper.java
java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java
java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
java/org/apache/hadoop/hive/ql/metadata/Hive.java
java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java
java/org/apache/hadoop/hive/ql/metadata/formatting/JsonMetaDataFormatter.java
java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java
java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java

[jira] [Updated] (HIVE-5054) Remove unused property submitviachild

2013-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5054:
---

Attachment: HIVE-5054.patch

Previous patch has some unwanted changes. Uploading the correct patch.

 Remove unused property submitviachild
 -

 Key: HIVE-5054
 URL: https://issues.apache.org/jira/browse/HIVE-5054
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-5054.patch, HIVE-5054.patch


 This property only exists in HiveConf and is always set to false. Let's get 
 rid of this dead code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5054) Remove unused property submitviachild

2013-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5054:
---

Status: Open  (was: Patch Available)

Uploaded incorrect patch.

 Remove unused property submitviachild
 -

 Key: HIVE-5054
 URL: https://issues.apache.org/jira/browse/HIVE-5054
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-5054.patch, HIVE-5054.patch


 This property only exists in HiveConf and is always set to false. Let's get 
 rid of this dead code.



[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737225#comment-13737225
 ] 

Edward Capriolo commented on HIVE-5009:
---

Our build process is slow. Technically we do not need a clean 'every' time - 
mostly you only need it when changing the Hadoop version or updating one of the 
libs. However, the build is still 'slow' regardless of running clean first. 
It's just something we have to deal with for a bit until we refactor everything.

 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

   Original Estimate: 48h
  Remaining Estimate: 48h

 I have found some minor optimization issues in the codebase, which I would 
 like to rectify and contribute. Specifically, these are:
 The optimizations that could be applied to Hive's code base are as follows:
 1. Use StringBuffer when appending strings - In 184 instances, the 
 concatenation operator (+=) was used when appending strings. This is 
 inherently inefficient - instead, Java's StringBuffer or StringBuilder class 
 should be used. 12 instances of this optimization can be applied to the 
 GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
 uses the + operator inside a loop, as do the column projection utilities 
 class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
 Tests showed that using StringBuilder when appending strings is 57% faster 
 than using the + operator (using the StringBuffer took 122 milliseconds 
 whilst the + operator took 284 milliseconds). The reason the StringBuffer 
 class is preferred over the + operator is that
 String third = first + second;
 gets compiled to:
 StringBuilder builder = new StringBuilder( first );
 builder.append( second );
 third = builder.toString();
 Therefore, building complex strings that, for example, involve loops requires 
 many instantiations (and, as discussed below, creating new objects inside 
 loops is inefficient).
 2. Use arrays instead of List - Java's java.util.Arrays class's asList method 
 is more efficient at creating lists from arrays than using loops to manually 
 iterate over the elements (using asList is computationally very 
 cheap, O(1), as it merely creates a wrapper object around the array; looping 
 through the list however has a complexity of O(n) since a new list is created 
 and every element in the array is added to this new list). As confirmed by 
 the experiment detailed in Appendix D, the Java compiler does not 
 automatically optimize and replace tight-loop copying with asList: the 
 loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
 instant. 
 Four instances of this optimization can be applied to Hive's codebase (two of 
 these should be applied to the Map-Join container - MapJoinRowContainer) - 
 lines 92 to 98:
  for (obj = other.first(); obj != null; obj = other.next()) {
    ArrayList<Object> ele = new ArrayList<Object>(obj.length);
    for (int i = 0; i < obj.length; i++) {
      ele.add(obj[i]);
    }
    list.add((Row) ele);
  }
 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
 could be avoided by simply using the provided static conversion methods. As 
 noted in the PMD documentation, using these avoids the cost of creating 
 objects that also need to be garbage-collected later.
 For example, line 587 of the SemanticAnalyzer class, could be replaced by the 
 more efficient parseDouble method call:
 // Inefficient:
 Double percent = Double.valueOf(value).doubleValue();
 // To be replaced by:
 Double percent = Double.parseDouble(value);
 Our test case in Appendix D confirms this: converting 10,000 strings into 
 integers using Integer.parseInt(gen.nextSessionId()) (i.e. creating an 
 unnecessary wrapper object) took 119 milliseconds on average; using parseInt() 
 took only 38. Therefore creating even just one unnecessary wrapper object can 
 make your code up to 68% slower.
 4. Converting literals to strings using + "" - Converting literals to strings 
 using + "" is quite inefficient (see Appendix D) and should be done by 
 calling the toString() method instead: converting 1,000,000 integers to 
 strings using + "" took, on average, 1340 milliseconds whilst using the 
 toString() method only required 1183 milliseconds (hence adding empty strings 
 takes nearly 12% more time). 
 89 instances of using + "" when converting literals were found in Hive's 
 codebase - one of these is found in JoinUtil.
 5. Avoid manual copying of arrays - Instead of copying arrays as is done in 
 

[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-12 Thread Benjamin Jakobus (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737229#comment-13737229
 ] 

Benjamin Jakobus commented on HIVE-5009:


OK, thanks.

 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

   Original Estimate: 48h
  Remaining Estimate: 48h

 I have found some minor optimization issues in the codebase, which I would 
 like to rectify and contribute. Specifically, these are:
 The optimizations that could be applied to Hive's code base are as follows:
 1. Use StringBuffer when appending strings - In 184 instances, the 
 concatenation operator (+=) was used when appending strings. This is 
 inherently inefficient - instead, Java's StringBuffer or StringBuilder class 
 should be used. 12 instances of this optimization can be applied to the 
 GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
 uses the + operator inside a loop, as do the column projection utilities 
 class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
 Tests showed that using StringBuilder when appending strings is 57% faster 
 than using the + operator (using the StringBuffer took 122 milliseconds 
 whilst the + operator took 284 milliseconds). The reason the StringBuffer 
 class is preferred over the + operator is that
 String third = first + second;
 gets compiled to:
 StringBuilder builder = new StringBuilder( first );
 builder.append( second );
 third = builder.toString();
 Therefore, building complex strings that, for example, involve loops requires 
 many instantiations (and, as discussed below, creating new objects inside 
 loops is inefficient).
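The effect described in point 1 can be illustrated with a small, self-contained sketch (not Hive code; the class and method names are made up for illustration):

```java
// Illustrative only: the same string-building task done both ways.
// Inside a loop, += allocates a fresh StringBuilder and String on every
// iteration, whereas a single StringBuilder reuses one growing buffer.
public class AppendExample {

    // inefficient: one intermediate String (and hidden builder) per iteration
    static String concatWithPlus(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p + ",";
        }
        return result;
    }

    // preferred: a single buffer, appended to in place
    static String concatWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p).append(',');
        }
        return sb.toString();
    }
}
```

Note that for a single expression such as `a + b`, the compiler's rewrite to a StringBuilder is already optimal; the win is specifically in loops, where `+=` repeats that rewrite on every iteration.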
 2. Use arrays instead of List - Java's java.util.Arrays class's asList method 
 is more efficient at creating lists from arrays than using loops to manually 
 iterate over the elements (using asList is computationally very 
 cheap, O(1), as it merely creates a wrapper object around the array; looping 
 through the list however has a complexity of O(n) since a new list is created 
 and every element in the array is added to this new list). As confirmed by 
 the experiment detailed in Appendix D, the Java compiler does not 
 automatically optimize and replace tight-loop copying with asList: the 
 loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
 instant. 
 Four instances of this optimization can be applied to Hive's codebase (two of 
 these should be applied to the Map-Join container - MapJoinRowContainer) - 
 lines 92 to 98:
  for (obj = other.first(); obj != null; obj = other.next()) {
    ArrayList<Object> ele = new ArrayList<Object>(obj.length);
    for (int i = 0; i < obj.length; i++) {
      ele.add(obj[i]);
    }
    list.add((Row) ele);
  }
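A minimal sketch of the suggestion above (illustrative only, not the actual MapJoinRowContainer code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: wrapping an array instead of loop-copying it.
public class AsListExample {

    // loop copy: O(n) - allocates a new list and adds every element
    static List<Object> copyByLoop(Object[] obj) {
        List<Object> ele = new ArrayList<Object>(obj.length);
        for (int i = 0; i < obj.length; i++) {
            ele.add(obj[i]);
        }
        return ele;
    }

    // O(1) - returns a wrapper view backed by the array, no element copying
    static List<Object> wrap(Object[] obj) {
        return Arrays.asList(obj);
    }
}
```

One caveat: `Arrays.asList` returns a fixed-size view backed by the array, so it only replaces the loop when the caller never adds or removes elements; otherwise wrap it as `new ArrayList<Object>(Arrays.asList(obj))`, which is still a single bulk copy.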
 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
 could be avoided by simply using the provided static conversion methods. As 
 noted in the PMD documentation, using these avoids the cost of creating 
 objects that also need to be garbage-collected later.
 For example, line 587 of the SemanticAnalyzer class, could be replaced by the 
 more efficient parseDouble method call:
 // Inefficient:
 Double percent = Double.valueOf(value).doubleValue();
 // To be replaced by:
 Double percent = Double.parseDouble(value);
 Our test case in Appendix D confirms this: converting 10,000 strings into 
 integers using Integer.parseInt(gen.nextSessionId()) (i.e. creating an 
 unnecessary wrapper object) took 119 milliseconds on average; using parseInt() 
 took only 38. Therefore creating even just one unnecessary wrapper object can 
 make your code up to 68% slower.
 4. Converting literals to strings using + "" - Converting literals to strings 
 using + "" is quite inefficient (see Appendix D) and should be done by 
 calling the toString() method instead: converting 1,000,000 integers to 
 strings using + "" took, on average, 1340 milliseconds whilst using the 
 toString() method only required 1183 milliseconds (hence adding empty strings 
 takes nearly 12% more time). 
 89 instances of using + "" when converting literals were found in Hive's 
 codebase - one of these is found in JoinUtil.
 5. Avoid manual copying of arrays - Instead of copying arrays as is done in 
 GroupByOperator on line 1040 (see below), the more efficient System.arraycopy 
 can be used (arraycopy is a native method, meaning that the entire memory 
 block is copied using memcpy or memmove).
 // Line 1040 of the GroupByOperator
 for (int i = 0; i < keys.length; i++) {
   forwardCache[i] = keys[i];
 }
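The replacement suggested in point 5 looks like this (a self-contained sketch; `copyKeys` is a made-up wrapper around the same copy shown in the quoted snippet):

```java
// Illustrative only: replacing a manual element-by-element copy loop
// with the native bulk copy System.arraycopy.
public class ArrayCopyExample {

    // equivalent of: for (int i = 0; i < keys.length; i++) forwardCache[i] = keys[i];
    static Object[] copyKeys(Object[] keys) {
        Object[] forwardCache = new Object[keys.length];
        // src, srcPos, dest, destPos, length - copies the whole range in one call
        System.arraycopy(keys, 0, forwardCache, 0, keys.length);
        return forwardCache;
    }
}
```

This is a shallow copy in both versions: the same element references end up in `forwardCache`, so behavior is unchanged, only the copy mechanism differs.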

[jira] [Created] (HIVE-5064) TestParse fails on JDK7

2013-08-12 Thread Brock Noland (JIRA)
Brock Noland created HIVE-5064:
--

 Summary: TestParse fails on JDK7
 Key: HIVE-5064
 URL: https://issues.apache.org/jira/browse/HIVE-5064
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Brock Noland
Assignee: Brock Noland


TestParse fails on JDK 7 because of the order of XML attributes



[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez

2013-08-12 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737235#comment-13737235
 ] 

Gunther Hagleitner commented on HIVE-5003:
--

[~vikram.dixit] Some review comments (RB would be great):

- CANNOT_FIND_EXEC_JAR isn't used; please remove it
- hive.jar.directory should be /user/hive/...
- DagUtils class level javadoc comment is already there (seems you're adding a 
second one)
- e.printStackTrace is not good as a debugging method. Either log or pass it on 
to the caller
- // java magic isn't a great comment; it would be good to say what the magic 
is achieving
- getResourceVersion comment doesn't match the logic. it will return the 
basename not the version string
- There's a system.out.println in the code
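The printStackTrace point in the review above can be sketched as follows (illustrative only; the class and method names are made up, and java.util.logging stands in for whatever logging framework the project actually uses):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: instead of calling e.printStackTrace(), either log the
// exception with context or rethrow it so the caller can handle it.
public class ErrorHandlingExample {

    private static final Logger LOG =
            Logger.getLogger(ErrorHandlingExample.class.getName());

    static String readResource(String path) {
        try {
            return doRead(path);
        } catch (RuntimeException e) {
            // log with context (keeps the stack trace) instead of printStackTrace()
            LOG.log(Level.WARNING, "Failed to read resource: " + path, e);
            throw e;  // ...or pass it on to the caller
        }
    }

    // stand-in for real I/O, so the sketch is self-contained
    private static String doRead(String path) {
        return "contents of " + path;
    }
}
```

The advantage over `printStackTrace()` is that the failure goes to the configured log destination with a message explaining *what* failed, rather than silently to stderr.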


 Localize hive exec jar for tez
 --

 Key: HIVE-5003
 URL: https://issues.apache.org/jira/browse/HIVE-5003
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
 HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt


 Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and 
 added to vertices and the dag itself as needed. For hive we need to localize 
 the hive-exec.jar.
 NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Updated] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask

2013-08-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5065:
-

Priority: Blocker  (was: Major)

 Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
 

 Key: HIVE-5065
 URL: https://issues.apache.org/jira/browse/HIVE-5065
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Priority: Blocker
 Fix For: tez-branch






[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737245#comment-13737245
 ] 

Edward Capriolo commented on HIVE-5003:
---

I understand what you are saying. I am OK with the package-private idea and 
dependency injection; I generally prefer that to a heavy solution like mocking. 

I would not call this a blocker, but I think we need to design more with 
testing in mind. Let's talk it over elsewhere. 

 Localize hive exec jar for tez
 --

 Key: HIVE-5003
 URL: https://issues.apache.org/jira/browse/HIVE-5003
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
 HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt


 Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and 
 added to vertices and the dag itself as needed. For hive we need to localize 
 the hive-exec.jar.
 NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Created] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask

2013-08-12 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-5065:


 Summary: Create proper (i.e.: non .q file based) junit tests for 
DagUtils and TezTask
 Key: HIVE-5065
 URL: https://issues.apache.org/jira/browse/HIVE-5065
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
 Fix For: tez-branch






[jira] [Commented] (HIVE-5064) TestParse fails on JDK7

2013-08-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737253#comment-13737253
 ] 

Xuefu Zhang commented on HIVE-5064:
---

It seems multiple tickets are raised to address the same issue. HIVE-1551 and 
HIVE-4885 were talking about different serialization mechanisms.

 TestParse fails on JDK7
 ---

 Key: HIVE-5064
 URL: https://issues.apache.org/jira/browse/HIVE-5064
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Brock Noland
Assignee: Brock Noland

 TestParse fails on JDK 7 because of the order of XML attributes



[jira] [Commented] (HIVE-5064) TestParse fails on JDK7

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737237#comment-13737237
 ] 

Brock Noland commented on HIVE-5064:


I propose that before doing the diff we turn the XML into [Canonical 
XML|http://en.wikipedia.org/wiki/Canonical_XML]. For example:


{noformat}
[brock@bigboy ~]$ cat test.xml 
<root>
<attr z="value" k = "value" a=

"value" />
</root>
[brock@bigboy ~]$ 
[brock@bigboy ~]$ xmllint test.xml 
<?xml version="1.0"?>
<root>
<attr z="value" k="value" a="value"/>
</root>
[brock@bigboy ~]$ xmllint --c14n test.xml ; echo
<root>
<attr a="value" k="value" z="value"></attr>
</root>
{noformat}
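The attribute-ordering part of that idea can also be sketched in plain JDK code (illustrative only; real Canonical XML covers much more - namespaces, text nodes, escaping - and this toy normalizer renders element structure and sorted attributes only):

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustrative only: a "poor man's canonical form" that renders each element
// with its attributes sorted by name, so two XML plans that differ only in
// attribute order compare equal.
public class AttrOrderNormalizer {

    static String normalize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        StringBuilder sb = new StringBuilder();
        render(doc.getDocumentElement(), sb);
        return sb.toString();
    }

    static void render(Element e, StringBuilder sb) {
        sb.append('<').append(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        List<String> names = new ArrayList<String>();
        for (int i = 0; i < attrs.getLength(); i++) {
            names.add(attrs.item(i).getNodeName());
        }
        Collections.sort(names);  // deterministic attribute order
        for (String n : names) {
            sb.append(' ').append(n).append("=\"")
              .append(e.getAttribute(n)).append('"');
        }
        sb.append('>');
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c instanceof Element) {
                render((Element) c, sb);  // recurse into child elements
            }
        }
        sb.append("</").append(e.getTagName()).append('>');
    }
}
```

Running both the expected and actual plan files through such a normalizer (or through a real C14N tool like `xmllint --c14n` above) before diffing would make attribute order irrelevant to the comparison.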

 TestParse fails on JDK7
 ---

 Key: HIVE-5064
 URL: https://issues.apache.org/jira/browse/HIVE-5064
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Brock Noland
Assignee: Brock Noland

 TestParse fails on JDK 7 because of the order of XML attributes



[jira] [Commented] (HIVE-5064) TestParse fails on JDK7

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737258#comment-13737258
 ] 

Brock Noland commented on HIVE-5064:


Thanks [~xuefuz], I had forgotten about HIVE-4885. I'll mark this a dup.

 TestParse fails on JDK7
 ---

 Key: HIVE-5064
 URL: https://issues.apache.org/jira/browse/HIVE-5064
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Brock Noland
Assignee: Brock Noland

 TestParse fails on JDK 7 because of the order of XML attributes



[jira] [Commented] (HIVE-4885) Alternative object serialization for execution plan in hive testing

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737257#comment-13737257
 ] 

Brock Noland commented on HIVE-4885:


Hey guys, since this patch fixes the tests on JDK7 how about we commit and open 
a follow-on JIRA about a different way of doing the serialization?

 Alternative object serialization for execution plan in hive testing 
 

 Key: HIVE-4885
 URL: https://issues.apache.org/jira/browse/HIVE-4885
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.10.0, 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.12.0

 Attachments: HIVE-4885.patch


 Currently there are a lot of test cases that involve comparing execution 
 plans, such as those in the TestParse suite. XmlEncoder is used to serialize 
 the plan generated by Hive and store it in a file for diff comparison. 
 However, XmlEncoder is tied to the Java compiler, whose implementation may 
 change from version to version. Thus, upgrading the compiler can generate a 
 lot of spurious test failures. The following is an example of a diff 
 generated when running Hive with JDK7:
 {code}
 Begin query: case_sensitivity.q
 diff -a 
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.out
  
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/parse/case_sensitivity.q.out
 diff -a -b 
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.xml
  
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
 3c3
 <   <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask0">
 ---
 >   <object id="MapRedTask0" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
 12c12
 < <object class="java.util.ArrayList" id="ArrayList0">
 ---
 > <object id="ArrayList0" class="java.util.ArrayList">
 14c14
 <   <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask0">
 ---
 >   <object id="MoveTask0" class="org.apache.hadoop.hive.ql.exec.MoveTask">
 18c18
 <   <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask1">
 ---
 >   <object id="MoveTask1" class="org.apache.hadoop.hive.ql.exec.MoveTask">
 22c22
 <   <object class="org.apache.hadoop.hive.ql.exec.StatsTask" id="StatsTask0">
 ---
 >   <object id="StatsTask0" class="org.apache.hadoop.hive.ql.exec.StatsTask">
 60c60
 <   <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask1">
 ---
 >   <object id="MapRedTask1" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
 {code}
 As can be seen, the only difference is the order of the attributes in the 
 serialized XML doc, yet it causes 50+ test failures in Hive.
 We need a better plan comparison, or a different object serialization, to 
 improve the situation.



[jira] [Updated] (HIVE-4899) Hive returns non-meanful error message for ill-formed fs.default.name

2013-08-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4899:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I committed this to trunk. Thanks Xuefu for the patch and Ashutosh for the 
review!

 Hive returns non-meanful error message for ill-formed fs.default.name
 -

 Key: HIVE-4899
 URL: https://issues.apache.org/jira/browse/HIVE-4899
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.10.0, 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4899.patch


 For query in test case fs_default_name1.q:
 {code}
 set fs.default.name='http://www.example.com;
 show tables;
 {code}
 The following error message is returned:
 {code}
 FAILED: IllegalArgumentException null
 {code}
 The message is not very meaningful, and has null in it.
 It would be better if we can provide detailed error message.



[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737266#comment-13737266
 ] 

Brock Noland commented on HIVE-3630:


Xuefu, I don't follow your last comment. It seems this test is now passing on 
JDK7 and this JIRA can be resolved, is that what you are saying?

 udf_substr.q fails when using JDK7
 --

 Key: HIVE-3630
 URL: https://issues.apache.org/jira/browse/HIVE-3630
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.9.1, 0.10.0, 0.11.0
Reporter: Chris Drome
Assignee: Chris Drome
 Attachments: HIVE-3630-0.10.patch, HIVE-3630-0.9.patch, 
 HIVE-3630-trunk.patch


 Internal error: Cannot find ConstantObjectInspector for BINARY
 This exception has two causes.
 JDK7 iterators do not return values in the same order as JDK6, which selects 
 a different implementation of this UDF when the first argument is null. With 
 JDK7 this happens to be the binary version.
 The binary version is not implemented properly which ultimately causes the 
 exception when the method is called.



[jira] [Resolved] (HIVE-5064) TestParse fails on JDK7

2013-08-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved HIVE-5064.


Resolution: Duplicate

 TestParse fails on JDK7
 ---

 Key: HIVE-5064
 URL: https://issues.apache.org/jira/browse/HIVE-5064
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Brock Noland
Assignee: Brock Noland

 TestParse fails on JDK 7 because of the order of XML attributes



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737282#comment-13737282
 ] 

Hudson commented on HIVE-4123:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #124 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/124/])
HIVE-4123 Improved ORC integer RLE version 2. (Prasanth Jayachandran via 
omalley) (omalley: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513155)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* 
/hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerReader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.orig
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitPack.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestIntegerCompressionReader.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
* /hive/trunk/ql/src/test/resources/orc-file-dump.out


 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, 
 HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Commented] (HIVE-4579) Create a SARG interface for RecordReaders

2013-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737283#comment-13737283
 ] 

Hudson commented on HIVE-4579:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #124 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/124/])
HIVE-4579: Create a SARG interface for RecordReaders (Owen O'Malley via Gunther 
Hagleitner) (gunther: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513029)
* /hive/trunk/ivy/libraries.properties
* /hive/trunk/ql/ivy.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/PredicateLeaf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/sarg
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestSearchArgumentImpl.java


 Create a SARG interface for RecordReaders
 -

 Key: HIVE-4579
 URL: https://issues.apache.org/jira/browse/HIVE-4579
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.12.0

 Attachments: h-4579.patch, HIVE-4579.4.patch, 
 HIVE-4579.D11409.1.patch, HIVE-4579.D11409.2.patch, HIVE-4579.D11409.3.patch, 
 pushdown.pdf


 I think we should create a SARG (http://en.wikipedia.org/wiki/Sargable) 
 interface for RecordReaders. For a first pass, I'll create an API that uses 
 the value stored in hive.io.filter.expr.serialized.
 The desire is to define a simpler interface than the direct AST expression 
 that is provided by hive.io.filter.expr.serialized, so that the code to 
 evaluate expressions can be generalized instead of being put inside a 
 particular RecordReader.
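
 For illustration, a SARG-style predicate might be evaluated against chunk
 statistics roughly as follows. This is a toy sketch only loosely inspired by
 the PredicateLeaf/SearchArgument files in the committed patch; the method
 signatures here are invented for illustration and are not the real API:

```java
// Toy sketch of a SARG-style leaf predicate -- names and signatures are
// illustrative, not the SearchArgument API actually committed in HIVE-4579.
// A record reader evaluates each leaf against min/max statistics for a
// chunk of rows and skips chunks that cannot possibly match.
public class SargSketch {
    enum TruthValue { YES, NO, MAYBE }

    interface PredicateLeaf {
        String columnName();
        // Evaluate against min/max statistics of a chunk of rows.
        TruthValue evaluate(long min, long max);
    }

    // Example leaf predicate: column < literal.
    static PredicateLeaf lessThan(String column, long literal) {
        return new PredicateLeaf() {
            public String columnName() { return column; }
            public TruthValue evaluate(long min, long max) {
                if (max < literal) return TruthValue.YES;   // every row matches
                if (min >= literal) return TruthValue.NO;   // no row matches
                return TruthValue.MAYBE;                    // must read the rows
            }
        };
    }

    public static void main(String[] args) {
        PredicateLeaf leaf = lessThan("key", 100);
        System.out.println(leaf.evaluate(0, 50));    // YES  -> chunk kept whole
        System.out.println(leaf.evaluate(200, 300)); // NO   -> chunk skipped
        System.out.println(leaf.evaluate(50, 150));  // MAYBE -> evaluate per row
    }
}
```

 The three-valued result is the key design point: only a definite NO lets the
 reader skip data, so statistics-based evaluation stays safe.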

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type

2013-08-12 Thread Mark Wagner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737287#comment-13737287
 ] 

Mark Wagner commented on HIVE-2482:
---

bq. I think we should not do this ^ lets make another UDF, or overload the 
parameters of this one.

Is there any way to deprecate a UDF that will move people away from the current 
'unhex'? The only difference from the updated version is that the current one 
wraps its output in Text, so that it could be used by Hive before binary 
support existed. Now that there is binary support, it doesn't make sense for 
unhex to wrap its output.

 Convenience UDFs for binary data type
 -

 Key: HIVE-2482
 URL: https://issues.apache.org/jira/browse/HIVE-2482
 Project: Hive
  Issue Type: New Feature
Reporter: Ashutosh Chauhan
Assignee: Mark Wagner
 Fix For: 0.12.0

 Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch, HIVE-2482.3.patch, 
 HIVE-2482.4.patch


 HIVE-2380 introduced binary data type in Hive. It will be good to have 
 following udfs to make it more useful:
 * UDF's to convert to/from hex string
 * UDF's to convert to/from string using a specific encoding
 * UDF's to convert to/from base64 string
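
 The three conversions listed above map onto plain JDK calls, which UDFs of
 this kind would essentially wrap (the real Hive UDFs operate on Hive's
 writable types rather than raw byte arrays). A minimal sketch:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of the raw conversions such UDFs would wrap. The real UDFs take
// and return Hive writable types (BytesWritable/Text) instead of byte[].
public class BinaryConvSketch {
    // Binary -> hex string (no JDK helper before HexFormat in Java 17).
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "Hive".getBytes(StandardCharsets.UTF_8);
        // to/from string using a specific encoding
        System.out.println(new String(data, StandardCharsets.UTF_8)); // Hive
        // to hex string
        System.out.println(toHex(data));                              // 48697665
        // to base64 string
        System.out.println(Base64.getEncoder().encodeToString(data)); // SGl2ZQ==
    }
}
```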

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well (yet)

2013-08-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737294#comment-13737294
 ] 

Sergey Shelukhin commented on HIVE-5029:


The test passes on my test machine on recent trunk.

 direct SQL perf optimization cannot be tested well (yet)
 

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default until the testing problem is resolved.
 There are several possible solutions:
 1) Separate build for this setting. Seems like overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default.
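
 Option 3' could be sketched as a verifying wrapper that runs both paths and
 fails loudly on any mismatch. All names below are hypothetical, since
 ObjectStore's real method signatures differ; this only shows the shape of
 the test hook:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of option 3': a test-only subclass/wrapper that runs
// both the direct-SQL and the JDO/ORM path and compares the results, so a
// silent fallback can no longer hide a broken path. Names are illustrative.
public class VerifyingStoreSketch {
    interface PartitionSource {
        List<String> getPartitionsViaSql(String table);
        List<String> getPartitionsViaJdo(String table);
    }

    static List<String> getPartitionsVerified(PartitionSource store, String table) {
        List<String> sql = store.getPartitionsViaSql(table);
        List<String> jdo = store.getPartitionsViaJdo(table);
        if (!sql.equals(jdo)) {
            // In tests this surfaces immediately instead of passing via fallback.
            throw new IllegalStateException(
                "direct SQL and JDO disagree for " + table + ": " + sql + " vs " + jdo);
        }
        return sql;
    }

    public static void main(String[] args) {
        PartitionSource store = new PartitionSource() {
            public List<String> getPartitionsViaSql(String t) { return Arrays.asList("p=1", "p=2"); }
            public List<String> getPartitionsViaJdo(String t) { return Arrays.asList("p=1", "p=2"); }
        };
        System.out.println(getPartitionsVerified(store, "src")); // [p=1, p=2]
    }
}
```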

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type

2013-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737295#comment-13737295
 ] 

Ashutosh Chauhan commented on HIVE-2482:


I agree with [~mwagner]'s analysis.

 Convenience UDFs for binary data type
 -

 Key: HIVE-2482
 URL: https://issues.apache.org/jira/browse/HIVE-2482
 Project: Hive
  Issue Type: New Feature
Reporter: Ashutosh Chauhan
Assignee: Mark Wagner
 Fix For: 0.12.0

 Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch, HIVE-2482.3.patch, 
 HIVE-2482.4.patch


 HIVE-2380 introduced binary data type in Hive. It will be good to have 
 following udfs to make it more useful:
 * UDF's to convert to/from hex string
 * UDF's to convert to/from string using a specific encoding
 * UDF's to convert to/from base64 string

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4885) Alternative object serialization for execution plan in hive testing

2013-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737297#comment-13737297
 ] 

Ashutosh Chauhan commented on HIVE-4885:


I am fine with moving forward on this one. Don't know if [~appodictic] has some 
concerns or other suggestions for this issue.

 Alternative object serialization for execution plan in hive testing 
 

 Key: HIVE-4885
 URL: https://issues.apache.org/jira/browse/HIVE-4885
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.10.0, 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.12.0

 Attachments: HIVE-4885.patch


 Currently there are a lot of test cases that involve comparing execution 
 plans, such as those in the TestParse suite. XMLEncoder is used to serialize 
 the plan generated by Hive and store it in a file for diff comparison. 
 However, XMLEncoder is tied to the Java compiler, whose implementation may 
 change from version to version. Thus, upgrading the compiler can generate a 
 lot of spurious test failures. The following is an example of the diff 
 generated when running Hive with JDK7:
 {code}
 Begin query: case_sensitivity.q
 diff -a 
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.out
  
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/parse/case_sensitivity.q.out
 diff -a -b 
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.xml
  
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
 3c3
 <   <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask0">
 ---
 >   <object id="MapRedTask0" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
 12c12
 <   <object class="java.util.ArrayList" id="ArrayList0">
 ---
 >   <object id="ArrayList0" class="java.util.ArrayList">
 14c14
 <   <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask0">
 ---
 >   <object id="MoveTask0" class="org.apache.hadoop.hive.ql.exec.MoveTask">
 18c18
 <   <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask1">
 ---
 >   <object id="MoveTask1" class="org.apache.hadoop.hive.ql.exec.MoveTask">
 22c22
 <   <object class="org.apache.hadoop.hive.ql.exec.StatsTask" id="StatsTask0">
 ---
 >   <object id="StatsTask0" class="org.apache.hadoop.hive.ql.exec.StatsTask">
 60c60
 <   <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask1">
 ---
 >   <object id="MapRedTask1" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
 {code}
 As can be seen, the only difference is the order of the attributes in the 
 serialized XML doc, yet it causes 50+ test failures in Hive.
 We need a better plan comparison, or better object serialization, to improve 
 the situation.
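
 One possible mitigation for exactly this symptom would be to canonicalize
 attribute order inside each tag before comparing, so JDK-dependent attribute
 ordering no longer produces spurious diffs. The sketch below only handles
 simple single-tag lines like those in the diff above; a real fix would
 compare parsed documents or switch serializers entirely:

```java
import java.util.Arrays;

// Sketch: sort a tag's attribute tokens into a canonical order so that
// "<object id=... class=...>" and "<object class=... id=...>" compare equal.
// Handles only simple one-tag lines; not a general XML comparison.
public class CanonicalizeTag {
    static String canonicalize(String tag) {
        // Strip the surrounding angle brackets, then sort attribute tokens.
        String body = tag.trim().replaceAll("^<|/?>$", "");
        String[] parts = body.split("\\s+");
        String[] attrs = Arrays.copyOfRange(parts, 1, parts.length);
        Arrays.sort(attrs);
        return parts[0] + " " + String.join(" ", attrs);
    }

    public static void main(String[] args) {
        String a = "<object class=\"java.util.ArrayList\" id=\"ArrayList0\">";
        String b = "<object id=\"ArrayList0\" class=\"java.util.ArrayList\">";
        System.out.println(canonicalize(a).equals(canonicalize(b))); // true
    }
}
```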

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4885) Alternative object serialization for execution plan in hive testing

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737307#comment-13737307
 ] 

Edward Capriolo commented on HIVE-4885:
---

+1 move forward.

 Alternative object serialization for execution plan in hive testing 
 

 Key: HIVE-4885
 URL: https://issues.apache.org/jira/browse/HIVE-4885
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.10.0, 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.12.0

 Attachments: HIVE-4885.patch


 Currently there are a lot of test cases that involve comparing execution 
 plans, such as those in the TestParse suite. XMLEncoder is used to serialize 
 the plan generated by Hive and store it in a file for diff comparison. 
 However, XMLEncoder is tied to the Java compiler, whose implementation may 
 change from version to version. Thus, upgrading the compiler can generate a 
 lot of spurious test failures. The following is an example of the diff 
 generated when running Hive with JDK7:
 {code}
 Begin query: case_sensitivity.q
 diff -a 
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.out
  
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/parse/case_sensitivity.q.out
 diff -a -b 
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.xml
  
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
 3c3
 <   <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask0">
 ---
 >   <object id="MapRedTask0" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
 12c12
 <   <object class="java.util.ArrayList" id="ArrayList0">
 ---
 >   <object id="ArrayList0" class="java.util.ArrayList">
 14c14
 <   <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask0">
 ---
 >   <object id="MoveTask0" class="org.apache.hadoop.hive.ql.exec.MoveTask">
 18c18
 <   <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask1">
 ---
 >   <object id="MoveTask1" class="org.apache.hadoop.hive.ql.exec.MoveTask">
 22c22
 <   <object class="org.apache.hadoop.hive.ql.exec.StatsTask" id="StatsTask0">
 ---
 >   <object id="StatsTask0" class="org.apache.hadoop.hive.ql.exec.StatsTask">
 60c60
 <   <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask1">
 ---
 >   <object id="MapRedTask1" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
 {code}
 As can be seen, the only difference is the order of the attributes in the 
 serialized XML doc, yet it causes 50+ test failures in Hive.
 We need a better plan comparison, or better object serialization, to improve 
 the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4885) Alternative object serialization for execution plan in hive testing

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737311#comment-13737311
 ] 

Brock Noland commented on HIVE-4885:


Sounds good. I am +1 on the patch as well.

 Alternative object serialization for execution plan in hive testing 
 

 Key: HIVE-4885
 URL: https://issues.apache.org/jira/browse/HIVE-4885
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.10.0, 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.12.0

 Attachments: HIVE-4885.patch


 Currently there are a lot of test cases that involve comparing execution 
 plans, such as those in the TestParse suite. XMLEncoder is used to serialize 
 the plan generated by Hive and store it in a file for diff comparison. 
 However, XMLEncoder is tied to the Java compiler, whose implementation may 
 change from version to version. Thus, upgrading the compiler can generate a 
 lot of spurious test failures. The following is an example of the diff 
 generated when running Hive with JDK7:
 {code}
 Begin query: case_sensitivity.q
 diff -a 
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.out
  
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/parse/case_sensitivity.q.out
 diff -a -b 
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.xml
  
 /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
 3c3
 <   <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask0">
 ---
 >   <object id="MapRedTask0" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
 12c12
 <   <object class="java.util.ArrayList" id="ArrayList0">
 ---
 >   <object id="ArrayList0" class="java.util.ArrayList">
 14c14
 <   <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask0">
 ---
 >   <object id="MoveTask0" class="org.apache.hadoop.hive.ql.exec.MoveTask">
 18c18
 <   <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask1">
 ---
 >   <object id="MoveTask1" class="org.apache.hadoop.hive.ql.exec.MoveTask">
 22c22
 <   <object class="org.apache.hadoop.hive.ql.exec.StatsTask" id="StatsTask0">
 ---
 >   <object id="StatsTask0" class="org.apache.hadoop.hive.ql.exec.StatsTask">
 60c60
 <   <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask1">
 ---
 >   <object id="MapRedTask1" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
 {code}
 As can be seen, the only difference is the order of the attributes in the 
 serialized XML doc, yet it causes 50+ test failures in Hive.
 We need a better plan comparison, or better object serialization, to improve 
 the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HIVE-2482) Convenience UDFs for binary data type

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737319#comment-13737319
 ] 

Edward Capriolo edited comment on HIVE-2482 at 8/12/13 8:45 PM:


I am ok with it as well, but remember everything you change breaks someone's 
workflow. 

  was (Author: appodictic):
I am ok with it as well, but temember everything you change breaks someones 
workflow. 
  
 Convenience UDFs for binary data type
 -

 Key: HIVE-2482
 URL: https://issues.apache.org/jira/browse/HIVE-2482
 Project: Hive
  Issue Type: New Feature
Reporter: Ashutosh Chauhan
Assignee: Mark Wagner
 Fix For: 0.12.0

 Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch, HIVE-2482.3.patch, 
 HIVE-2482.4.patch


 HIVE-2380 introduced binary data type in Hive. It will be good to have 
 following udfs to make it more useful:
 * UDF's to convert to/from hex string
 * UDF's to convert to/from string using a specific encoding
 * UDF's to convert to/from base64 string

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type

2013-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737319#comment-13737319
 ] 

Edward Capriolo commented on HIVE-2482:
---

I am ok with it as well, but remember everything you change breaks someone's 
workflow. 

 Convenience UDFs for binary data type
 -

 Key: HIVE-2482
 URL: https://issues.apache.org/jira/browse/HIVE-2482
 Project: Hive
  Issue Type: New Feature
Reporter: Ashutosh Chauhan
Assignee: Mark Wagner
 Fix For: 0.12.0

 Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch, HIVE-2482.3.patch, 
 HIVE-2482.4.patch


 HIVE-2380 introduced binary data type in Hive. It will be good to have 
 following udfs to make it more useful:
 * UDF's to convert to/from hex string
 * UDF's to convert to/from string using a specific encoding
 * UDF's to convert to/from base64 string

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1511) Hive plan serialization is slow

2013-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737324#comment-13737324
 ] 

Ashutosh Chauhan commented on HIVE-1511:


Had a brief chat with [~kamrul], who expressed interest in working on this. 
Assigning it to him.

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ning Zhang
 Attachments: HIVE-1511.patch, HIVE-1511-wip2.patch, 
 HIVE-1511-wip3.patch, HIVE-1511-wip.patch


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.
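
 A predicate like the one above can be generated programmatically, which makes
 the slow-serialization repro easier to drive than pasting a hundred clauses
 by hand. A minimal sketch:

```java
// Builds the pathological OR-chain query described above. Each extra clause
// deepens the expression tree that the plan serializer must walk, which is
// presumably where the time goes.
public class OrChainRepro {
    static String buildQuery(int clauses) {
        StringBuilder sb = new StringBuilder("SELECT * FROM src WHERE key=0");
        for (int i = 1; i < clauses; i++) {
            sb.append(" OR key=0");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ~120 clauses matches the scale of the reported test case.
        String q = buildQuery(120);
        System.out.println(q.length() + " characters");
    }
}
```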

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1511) Hive plan serialization is slow

2013-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1511:
---

Assignee: Mohammad Kamrul Islam

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-1511.patch, HIVE-1511-wip2.patch, 
 HIVE-1511-wip3.patch, HIVE-1511-wip.patch


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5066) [WebHCat] Other code fixes for Windows

2013-08-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-5066:
-

Summary: [WebHCat] Other code fixes for Windows  (was: Other code fixes for 
Windows)

 [WebHCat] Other code fixes for Windows
 --

 Key: HIVE-5066
 URL: https://issues.apache.org/jira/browse/HIVE-5066
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0


 This is equivalent to HCATALOG-526, but updated to sync with latest trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5066) [WebHCat] Other code fixes for Windows

2013-08-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-5066:
-

Attachment: HIVE-5034-1.patch

 [WebHCat] Other code fixes for Windows
 --

 Key: HIVE-5066
 URL: https://issues.apache.org/jira/browse/HIVE-5066
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-5034-1.patch


 This is equivalent to HCATALOG-526, but updated to sync with latest trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5067) Add bzip compressor for ORC

2013-08-12 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-5067:
---

 Summary: Add bzip compressor for ORC
 Key: HIVE-5067
 URL: https://issues.apache.org/jira/browse/HIVE-5067
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley


It would be good to add a bzip compressor for ORC. Bzip does very well for 
long-term/cold storage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5068) Some queries fail due to xml encoder error

2013-08-12 Thread Brock Noland (JIRA)
Brock Noland created HIVE-5068:
--

 Summary: Some queries fail due to xml encoder error
 Key: HIVE-5068
 URL: https://issues.apache.org/jira/browse/HIVE-5068
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland


Looks like something snuck in that breaks the JDK 7 build:

{noformat}
Caused by: java.lang.Exception: XMLEncoder: discarding statement 
ArrayList.add(ASTNode);
... 106 more
Caused by: java.lang.RuntimeException: Cannot serialize object
at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:238)
at 
java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeObject1(Encoder.java:258)
at java.beans.Encoder.cloneStatement(Encoder.java:271)
at java.beans.Encoder.writeStatement(Encoder.java:301)
at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
... 105 more
Caused by: java.lang.RuntimeException: Cannot serialize object
at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
at java.beans.Encoder.getValue(Encoder.java:108)
at java.beans.Encoder.get(Encoder.java:252)
at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:112)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at 
java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at 
java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:232)
... 118 more
Caused by: java.lang.InstantiationException: org.antlr.runtime.CommonToken
at java.lang.Class.newInstance(Class.java:359)
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at java.beans.Statement.invokeInternal(Statement.java:292)
at java.beans.Statement.access$000(Statement.java:58)
at java.beans.Statement$2.run(Statement.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at java.beans.Statement.invoke(Statement.java:182)
at java.beans.Expression.getValue(Expression.java:153)
at java.beans.Encoder.getValue(Encoder.java:105)
... 130 more
{noformat}

and

{noformat}
java.lang.RuntimeException: Cannot serialize object
at 
org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598)
at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:330)
at 
org.apache.hadoop.hive.ql.exec.Utilities.serializeObject(Utilities.java:611)
at org.apache.hadoop.hive.ql.plan.MapredWork.toXML(MapredWork.java:88)
at 
org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:505)
at 
org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
at 

[jira] [Created] (HIVE-5066) Other code fixes for Windows

2013-08-12 Thread Daniel Dai (JIRA)
Daniel Dai created HIVE-5066:


 Summary: Other code fixes for Windows
 Key: HIVE-5066
 URL: https://issues.apache.org/jira/browse/HIVE-5066
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0


This is equivalent to HCATALOG-526, but updated to sync with latest trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5003) Localize hive exec jar for tez

2013-08-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5003:
-

Attachment: HIVE-5003.5.patch.txt

Addressed Gunther's comments.

 Localize hive exec jar for tez
 --

 Key: HIVE-5003
 URL: https://issues.apache.org/jira/browse/HIVE-5003
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
 HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HIVE-5003.5.patch.txt, 
 HiveLocalizationDesign.txt


 Tez doesn't expose a distributed cache. JARs are localized via YARN APIs and 
 added to vertices and the DAG itself as needed. For Hive we need to localize 
 the hive-exec.jar.
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request 13507: HIVE-5003: Localize hive exec jar for tez

2013-08-12 Thread Vikram Dixit Kumaraswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13507/
---

Review request for hive.


Bugs: HIVE-5003 and HIVE-5004
https://issues.apache.org/jira/browse/HIVE-5003
https://issues.apache.org/jira/browse/HIVE-5004


Repository: hive-git


Description
---

Tez localization of exec and additional jars.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 79c38c1 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 12e9334 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java faa99f7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ac536e2 

Diff: https://reviews.apache.org/r/13507/diff/


Testing
---


Thanks,

Vikram Dixit Kumaraswamy



[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez

2013-08-12 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737384#comment-13737384
 ] 

Vikram Dixit K commented on HIVE-5003:
--

RB entry:

https://reviews.apache.org/r/13507/

 Localize hive exec jar for tez
 --

 Key: HIVE-5003
 URL: https://issues.apache.org/jira/browse/HIVE-5003
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, 
 HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HIVE-5003.5.patch.txt, 
 HiveLocalizationDesign.txt


 Tez doesn't expose a distributed cache. JARs are localized via YARN APIs and 
 added to vertices and the DAG itself as needed. For Hive we need to localize 
 the hive-exec.jar.
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4899) Hive returns non-meanful error message for ill-formed fs.default.name

2013-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737399#comment-13737399
 ] 

Hudson commented on HIVE-4899:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #55 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/55/])
HIVE-4899 - Hive returns non-meanful error message for ill-formed 
fs.default.name (Xuefu Zhang, Reviewed By: Ashutosh Chauhan) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513229)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/ql/src/test/results/clientnegative/fs_default_name1.q.out
* /hive/trunk/ql/src/test/results/clientnegative/fs_default_name2.q.out


 Hive returns non-meanful error message for ill-formed fs.default.name
 -

 Key: HIVE-4899
 URL: https://issues.apache.org/jira/browse/HIVE-4899
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.10.0, 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4899.patch


 For query in test case fs_default_name1.q:
 {code}
 set fs.default.name='http://www.example.com';
 show tables;
 {code}
 The following error message is returned:
 {code}
 FAILED: IllegalArgumentException null
 {code}
 The message is not very meaningful, and has null in it.
 It would be better if we could provide a detailed error message.
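Illustrative only: a small sketch of the kind of validation that would turn the bare `IllegalArgumentException null` above into a descriptive message. The class and method names here are made up; the actual fix lives in Driver.java.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: parse fs.default.name up front and report *why* it is
// ill-formed, instead of letting a bare IllegalArgumentException escape.
public class FsDefaultNameCheck {
    // Returns null when the URI looks usable, otherwise a descriptive message.
    public static String validate(String fsDefaultName) {
        try {
            URI uri = new URI(fsDefaultName);
            if (uri.getScheme() == null) {
                return "Missing scheme in fs.default.name: " + fsDefaultName;
            }
            return null;
        } catch (URISyntaxException e) {
            return "Ill-formed fs.default.name '" + fsDefaultName + "': " + e.getReason();
        }
    }
}
```
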



[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737398#comment-13737398
 ] 

Hudson commented on HIVE-4123:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #55 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/55/])
HIVE-4123 Improved ORC integer RLE version 2. (Prasanth Jayachandran via 
omalley) (omalley: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513155)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* 
/hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerReader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.orig
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitPack.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestIntegerCompressionReader.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
* /hive/trunk/ql/src/test/resources/orc-file-dump.out


 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, 
 HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs
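The three ideas above can be sketched with a toy delta run-length encoder: any run where consecutive values differ by a constant collapses to a (base, delta, length) triple. This is only an illustration of the concept; ORC's actual integer RLE v2 (RunLengthIntegerWriterV2) combines several sub-encodings with tight bit packing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy delta RLE: encode runs with a constant difference between neighbors as
// {base, delta, runLength}. Not the real ORC format, just the underlying idea.
public class ToyDeltaRle {
    public static List<long[]> encode(long[] values) {
        List<long[]> runs = new ArrayList<>();
        int i = 0;
        while (i < values.length) {
            long base = values[i];
            int len = 1;
            long delta = 0;
            if (i + 1 < values.length) {
                delta = values[i + 1] - values[i];
                // extend the run while the difference stays constant
                while (i + len < values.length
                        && values[i + len] - values[i + len - 1] == delta) {
                    len++;
                }
            }
            runs.add(new long[] {base, delta, len});
            i += len;
        }
        return runs;
    }
}
```

A monotonically increasing column like {1, 2, 3, 4} becomes the single triple {1, 1, 4}, which is where delta encoding and longer runs pay off.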



[jira] [Updated] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler

2013-08-12 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-2599:
---

Attachment: HIVE-2599.2.patch.txt

 Support Composit/Compound Keys with HBaseStorageHandler
 ---

 Key: HIVE-2599
 URL: https://issues.apache.org/jira/browse/HIVE-2599
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.8.0
Reporter: Hans Uhlig
Assignee: Swarnim Kulkarni
 Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, 
 HIVE-2599.2.patch.txt


 It would be really nice for hive to be able to understand composite keys from 
 an underlying HBase schema. Currently we have to store key fields twice to be 
 able to both key and make data available. I noticed John Sichi mentioned in 
 HIVE-1228 that this would be a separate issue but I can't find any follow-up. 
 How feasible is this in the HBaseStorageHandler?
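One common way to get composite keys without the double storage described above is to pack the key fields into the HBase row key itself, so a storage handler can unpack them back into columns. The fixed-width-long-plus-string layout below is just one made-up convention for illustration, not anything HBaseStorageHandler actually does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative composite row key: a fixed-width long followed by a UTF-8
// string, packed into one byte[] so both fields live in the key.
public class CompositeRowKey {
    public static byte[] pack(long id, String region) {
        byte[] suffix = region.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + suffix.length).putLong(id).put(suffix).array();
    }

    public static long unpackId(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getLong();
    }

    public static String unpackRegion(byte[] rowKey) {
        return new String(rowKey, 8, rowKey.length - 8, StandardCharsets.UTF_8);
    }
}
```
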



[jira] [Commented] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler

2013-08-12 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737401#comment-13737401
 ] 

Swarnim Kulkarni commented on HIVE-2599:


This should be ready for review. If someone has a chance to take a look, that 
will be great!

 Support Composit/Compound Keys with HBaseStorageHandler
 ---

 Key: HIVE-2599
 URL: https://issues.apache.org/jira/browse/HIVE-2599
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.8.0
Reporter: Hans Uhlig
Assignee: Swarnim Kulkarni
 Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, 
 HIVE-2599.2.patch.txt


 It would be really nice for hive to be able to understand composite keys from 
 an underlying HBase schema. Currently we have to store key fields twice to be 
 able to both key and make data available. I noticed John Sichi mentioned in 
 HIVE-1228 that this would be a separate issue but I can't find any follow-up. 
 How feasible is this in the HBaseStorageHandler?



[jira] [Updated] (HIVE-5058) Fix NPE issue with DAG submission in TEZ

2013-08-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5058:
-

Attachment: HIVE-5058.2.patch

Fixed one more issue resulting in NPE (localization of reduce plans was 
incorrect)

 Fix NPE issue with DAG submission in TEZ
 

 Key: HIVE-5058
 URL: https://issues.apache.org/jira/browse/HIVE-5058
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-5058.1.patch, HIVE-5058.2.patch


 Submitting dag caused NPE on execution.
 Multiple issues:
 - Some configs weren't set right
 - Key desc/Table desc weren't set properly
 - parallelism was left at -1
 NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7

2013-08-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737427#comment-13737427
 ] 

Xuefu Zhang commented on HIVE-3630:
---

[~brocknoland] No. I meant HIVE-3630 is needed to allow JDK7 to pass. HIVE-3840 
addresses a different issue. The patch here probably needs a rebase because of 
changes introduced by HIVE-3840.

 udf_substr.q fails when using JDK7
 --

 Key: HIVE-3630
 URL: https://issues.apache.org/jira/browse/HIVE-3630
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.9.1, 0.10.0, 0.11.0
Reporter: Chris Drome
Assignee: Chris Drome
 Attachments: HIVE-3630-0.10.patch, HIVE-3630-0.9.patch, 
 HIVE-3630-trunk.patch


 Internal error: Cannot find ConstantObjectInspector for BINARY
 This exception has two causes.
 JDK7 iterators do not return values in the same order as JDK6, which selects 
 a different implementation of this UDF when the first argument is null. With 
 JDK7 this happens to be the binary version.
 The binary version is not implemented properly which ultimately causes the 
 exception when the method is called.
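The order dependence described above can be made concrete: if a resolver takes the "first matching" overload out of an unordered collection, different JDKs may hand back different elements first. Sorting the candidates before choosing removes that dependence. The names below are illustrative, not Hive's actual UDF resolver API.

```java
import java.util.Arrays;

// Hypothetical resolver sketch: never depend on HashMap/HashSet iteration
// order when several overloads match; sort first so every JDK picks the same one.
public class DeterministicResolve {
    public static String pickOverload(String[] candidateSignatures) {
        String[] sorted = candidateSignatures.clone();
        Arrays.sort(sorted); // stable, JDK-independent ordering
        return sorted[0];
    }
}
```
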



[jira] [Commented] (HIVE-5023) Hive get wrong result when partition has the same path but different schema or authority

2013-08-12 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737426#comment-13737426
 ] 

Sushanth Sowmyan commented on HIVE-5023:


+1 on intent from looking at what the patch fixes. Haven't explicitly tested it 
myself.

 Hive get wrong result when partition has the same path but different schema 
 or authority
 

 Key: HIVE-5023
 URL: https://issues.apache.org/jira/browse/HIVE-5023
 Project: Hive
  Issue Type: Bug
Reporter: Shuaishuai Nie
Assignee: Shuaishuai Nie
 Attachments: HIVE-5023.1.patch, HIVE-5023.2.patch


 Hive does not differentiate scheme and authority in file URIs, which causes 
 wrong results when partitions have the same path but a different scheme or 
 authority. Here is a simple repro:
 partition file path:
 asv://contain...@secondary1.blob.core.windows.net/2013-08-05/00/text1.txt
 with content 2013-08-05 00:00:00
 asv://contain...@secondary1.blob.core.windows.net/2013-08-05/00/text2.txt
 with content 2013-08-05 00:00:20
 {noformat}
 CREATE EXTERNAL TABLE IF NOT EXISTS T1 (t STRING) PARTITIONED BY (ProcessDate 
 STRING, Hour STRING, ClusterName STRING) ROW FORMAT DELIMITED FIELDS 
 TERMINATED by '\t' STORED AS TEXTFILE;
 ALTER TABLE T1 DROP IF EXISTS PARTITION(processDate='2013-08-05', Hour='00', 
 clusterName ='CLusterA');
 ALTER TABLE T1 ADD IF NOT EXISTS PARTITION(processDate='2013-08-05', 
 Hour='00', clusterName ='ClusterA') LOCATION 
 'asv://contain...@secondary1.blob.core.windows.net/2013-08-05/00';
 ALTER TABLE T1 DROP IF EXISTS PARTITION(processDate='2013-08-05', Hour='00', 
 clusterName ='ClusterB');
 ALTER TABLE T1 ADD IF NOT EXISTS PARTITION(processDate='2013-08-05', 
 Hour='00', clusterName ='ClusterB') LOCATION 
 'asv://contain...@secondary1.blob.core.windows.net/2013-08-05/00';
 {noformat}
 the expect output of the hive query
 {noformat}
 SELECT ClusterName, t FROM T1 WHERE ProcessDate='2013-08-05' AND Hour='00';
 {noformat}
 should be
 {noformat}
 ClusterA	2013-08-05 00:00:00
 ClusterB	2013-08-05 00:00:20
 {noformat}
 However it is
 {noformat}
 ClusterA	2013-08-05 00:00:00
 ClusterA	2013-08-05 00:00:20
 {noformat}
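A minimal sketch of the comparison bug described above: if only the path component of the partition location is compared, the two asv:// locations collide, whereas comparing the full URI (scheme, authority, and path) keeps them distinct. The account and host names below are made up.

```java
import java.net.URI;

// Comparing partition locations: path-only comparison conflates distinct
// stores; full-URI comparison includes scheme and authority.
public class PartitionPathCompare {
    public static boolean samePathOnly(String a, String b) {
        return URI.create(a).getPath().equals(URI.create(b).getPath());
    }

    public static boolean sameLocation(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }
}
```
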



[jira] [Assigned] (HIVE-4778) hive.server2.authentication CUSTOM not working

2013-08-12 Thread Azrael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Azrael Park reassigned HIVE-4778:
-

Assignee: Azrael Park

 hive.server2.authentication CUSTOM not working
 --

 Key: HIVE-4778
 URL: https://issues.apache.org/jira/browse/HIVE-4778
 Project: Hive
  Issue Type: Bug
  Components: Authentication
Affects Versions: 0.11.0
 Environment: CentOS release 6.2 x86_64
 java version 1.6.0_31
 Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
Reporter: Zdenek Ott
Assignee: Azrael Park

 I have created my own class PamAuthenticationProvider that implements the 
 PasswdAuthenticationProvider interface. I have put the jar into the hive lib 
 directory and have configured hive-site.xml in the following way:
 <property>
   <name>hive.server2.authentication</name>
   <value>CUSTOM</value>
 </property>
 <property>
   <name>hive.server2.custom.authentication.class</name>
   <value>com.avast.ff.hive.PamAuthenticationProvider</value>
 </property>
 I use SQuirreL and JDBC drivers to connect to hive. During authentication Hive 
 throws the following exception:
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hive.service.auth.PasswdAuthenticationProvider.<init>()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
 at 
 org.apache.hive.service.auth.CustomAuthenticationProviderImpl.<init>(CustomAuthenticationProviderImpl.java:20)
 at 
 org.apache.hive.service.auth.AuthenticationProviderFactory.getAuthenticationProvider(AuthenticationProviderFactory.java:57)
 at 
 org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:61)
 at 
 org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:127)
 at 
 org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:509)
 at 
 org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:264)
 at 
 org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
 at 
 org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hive.service.auth.PasswdAuthenticationProvider.<init>()
 at java.lang.Class.getConstructor0(Class.java:2706)
 at java.lang.Class.getDeclaredConstructor(Class.java:1985)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)
 ... 12 more
 I have made a small patch for 
 org.apache.hive.service.auth.CustomAuthenticationProviderImpl that has 
 solved my problem, but I'm not sure if it's the best solution. Here is the 
 patch:
 --- CustomAuthenticationProviderImpl.java   2013-06-20 14:55:22.473995184 
 +0200
 +++ CustomAuthenticationProviderImpl.java.new   2013-06-20 14:57:36.549012966 
 +0200
 @@ -33,7 +33,7 @@
  HiveConf conf = new HiveConf();
  this.customHandlerClass = (Class<? extends PasswdAuthenticationProvider>)
  conf.getClass(
 -
 HiveConf.ConfVars.HIVE_SERVER2_CUSTOM_AUTHENTICATION_CLASS.name(),
 +
 HiveConf.ConfVars.HIVE_SERVER2_CUSTOM_AUTHENTICATION_CLASS.varname,
  PasswdAuthenticationProvider.class);
  this.customProvider =
  ReflectionUtils.newInstance(this.customHandlerClass, conf);
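Reading the trace above, the NoSuchMethodException is for a constructor (`<init>`): with the misread config key, conf.getClass falls back to its default, the PasswdAuthenticationProvider interface itself, which ReflectionUtils.newInstance cannot instantiate. For reference, a minimal custom provider looks roughly like the sketch below. A local stand-in interface with the same shape is declared so the sketch compiles on its own; the real one is org.apache.hive.service.auth.PasswdAuthenticationProvider, and the hard-coded credentials are purely illustrative.

```java
import javax.security.sasl.AuthenticationException;

// Sketch of a custom password authenticator. The nested interface mirrors the
// shape of Hive's PasswdAuthenticationProvider so this compiles stand-alone.
public class SketchPamAuthenticator {
    interface PasswdAuthenticationProvider {
        void Authenticate(String user, String password) throws AuthenticationException;
    }

    static class SimpleProvider implements PasswdAuthenticationProvider {
        @Override
        public void Authenticate(String user, String password) throws AuthenticationException {
            // Replace with a real PAM/LDAP lookup; these credentials are made up.
            if (!("hive".equals(user) && "secret".equals(password))) {
                throw new AuthenticationException("Invalid credentials for " + user);
            }
        }
    }

    public static boolean check(String user, String password) {
        try {
            new SimpleProvider().Authenticate(user, password);
            return true;
        } catch (AuthenticationException e) {
            return false;
        }
    }
}
```

Note that the real provider class also needs a no-argument constructor so ReflectionUtils.newInstance can create it.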



[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737448#comment-13737448
 ] 

Brock Noland commented on HIVE-3630:


[~xuefuz] udf_substr.q does not fail on JDK7 for me. I think we can close this.

 udf_substr.q fails when using JDK7
 --

 Key: HIVE-3630
 URL: https://issues.apache.org/jira/browse/HIVE-3630
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.9.1, 0.10.0, 0.11.0
Reporter: Chris Drome
Assignee: Chris Drome
 Attachments: HIVE-3630-0.10.patch, HIVE-3630-0.9.patch, 
 HIVE-3630-trunk.patch


 Internal error: Cannot find ConstantObjectInspector for BINARY
 This exception has two causes.
 JDK7 iterators do not return values in the same order as JDK6, which selects 
 a different implementation of this UDF when the first argument is null. With 
 JDK7 this happens to be the binary version.
 The binary version is not implemented properly which ultimately causes the 
 exception when the method is called.



[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7

2013-08-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737450#comment-13737450
 ] 

Xuefu Zhang commented on HIVE-3630:
---

[~brocknoland] Okay. Feel free to close it if it's no longer reproducible. It 
was there a couple of months back.

 udf_substr.q fails when using JDK7
 --

 Key: HIVE-3630
 URL: https://issues.apache.org/jira/browse/HIVE-3630
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.9.1, 0.10.0, 0.11.0
Reporter: Chris Drome
Assignee: Chris Drome
 Attachments: HIVE-3630-0.10.patch, HIVE-3630-0.9.patch, 
 HIVE-3630-trunk.patch


 Internal error: Cannot find ConstantObjectInspector for BINARY
 This exception has two causes.
 JDK7 iterators do not return values in the same order as JDK6, which selects 
 a different implementation of this UDF when the first argument is null. With 
 JDK7 this happens to be the binary version.
 The binary version is not implemented properly which ultimately causes the 
 exception when the method is called.



Re: Review Request 13507: HIVE-5003: Localize hive exec jar for tez

2013-08-12 Thread Gunther Hagleitner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13507/#review25040
---

Ship it!


Ship It!

- Gunther Hagleitner


On Aug. 12, 2013, 9:38 p.m., Vikram Dixit Kumaraswamy wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/13507/
 ---
 
 (Updated Aug. 12, 2013, 9:38 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-5003 and HIVE-5004
 https://issues.apache.org/jira/browse/HIVE-5003
 https://issues.apache.org/jira/browse/HIVE-5004
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez localization of exec and additional jars.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 79c38c1 
   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 12e9334 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java faa99f7 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ac536e2 
 
 Diff: https://reviews.apache.org/r/13507/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Vikram Dixit Kumaraswamy
 




Re: Review Request 13507: HIVE-5003: Localize hive exec jar for tez

2013-08-12 Thread Gunther Hagleitner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13507/#review25041
---

Ship it!


Ship It!

- Gunther Hagleitner


On Aug. 12, 2013, 9:38 p.m., Vikram Dixit Kumaraswamy wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/13507/
 ---
 
 (Updated Aug. 12, 2013, 9:38 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-5003 and HIVE-5004
 https://issues.apache.org/jira/browse/HIVE-5003
 https://issues.apache.org/jira/browse/HIVE-5004
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Tez localization of exec and additional jars.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 79c38c1 
   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 12e9334 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java faa99f7 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ac536e2 
 
 Diff: https://reviews.apache.org/r/13507/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Vikram Dixit Kumaraswamy
 




[jira] [Resolved] (HIVE-3630) udf_substr.q fails when using JDK7

2013-08-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved HIVE-3630.


Resolution: Cannot Reproduce

I am unable to reproduce this despite repeated efforts. It seems something else 
fixed it, so I am marking this resolved. Please re-open if required.

 udf_substr.q fails when using JDK7
 --

 Key: HIVE-3630
 URL: https://issues.apache.org/jira/browse/HIVE-3630
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.9.1, 0.10.0, 0.11.0
Reporter: Chris Drome
Assignee: Chris Drome
 Attachments: HIVE-3630-0.10.patch, HIVE-3630-0.9.patch, 
 HIVE-3630-trunk.patch


 Internal error: Cannot find ConstantObjectInspector for BINARY
 This exception has two causes.
 JDK7 iterators do not return values in the same order as JDK6, which selects 
 a different implementation of this UDF when the first argument is null. With 
 JDK7 this happens to be the binary version.
 The binary version is not implemented properly which ultimately causes the 
 exception when the method is called.



[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7

2013-08-12 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737467#comment-13737467
 ] 

Chris Drome commented on HIVE-3630:
---

Sorry for jumping into the discussion late. Feel free to close this if it is no 
longer reproducible ([~ashutoshc] thought that would be the case after 
HIVE-3840).

 udf_substr.q fails when using JDK7
 --

 Key: HIVE-3630
 URL: https://issues.apache.org/jira/browse/HIVE-3630
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.9.1, 0.10.0, 0.11.0
Reporter: Chris Drome
Assignee: Chris Drome
 Attachments: HIVE-3630-0.10.patch, HIVE-3630-0.9.patch, 
 HIVE-3630-trunk.patch


 Internal error: Cannot find ConstantObjectInspector for BINARY
 This exception has two causes.
 JDK7 iterators do not return values in the same order as JDK6, which selects 
 a different implementation of this UDF when the first argument is null. With 
 JDK7 this happens to be the binary version.
 The binary version is not implemented properly which ultimately causes the 
 exception when the method is called.



[jira] [Commented] (HIVE-3688) Various tests failing in TestNegativeCliDriver, TestParseNegative, TestParse when using JDK7

2013-08-12 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737469#comment-13737469
 ] 

Chris Drome commented on HIVE-3688:
---

[~brocknoland] that would be great. I'll remove the TestParse parts of this 
patch and resubmit for the TestNegativeCliDriver cases only. Thanks.

 Various tests failing in TestNegativeCliDriver, TestParseNegative, TestParse 
 when using JDK7
 

 Key: HIVE-3688
 URL: https://issues.apache.org/jira/browse/HIVE-3688
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.9.1, 0.10.0
Reporter: Chris Drome
Assignee: Chris Drome
 Attachments: HIVE-3688-0.9.patch, HIVE-3688-trunk.patch


 The following tests are failing when using JDK7.
 TestNegativeCliDriver:
 case_sensitivity.q
 cast1.q
 groupby1.q
 groupby2.q
 groupby3.q
 groupby4.q
 groupby5.q
 groupby6.q
 input1.q
 input2.q
 input20.q
 input3.q
 input4.q
 input5.q
 input6.q
 input7.q
 input8.q
 input9.q
 input_part1.q
 input_testsequencefile.q
 input_testxpath.q
 input_testxpath2.q
 join1.q
 join2.q
 join3.q
 join4.q
 join5.q
 join6.q
 join7.q
 join8.q
 sample1.q
 sample2.q
 sample3.q
 sample4.q
 sample5.q
 sample6.q
 sample7.q
 subq.q
 udf1.q
 udf4.q
 udf6.q
 udf_case.q
 udf_when.q
 union.q
 TestParseNegative:
 invalid_function_param2.q
 TestNegativeCliDriver:
 fs_default_name1.q.out_0.23_1.7
 fs_default_name2.q.out_0.23_1.7
 invalid_cast_from_binary_1.q.out_0.23_1.7
 invalid_cast_from_binary_2.q.out_0.23_1.7
 invalid_cast_from_binary_3.q.out_0.23_1.7
 invalid_cast_from_binary_4.q.out_0.23_1.7
 invalid_cast_from_binary_5.q.out_0.23_1.7
 invalid_cast_from_binary_6.q.out_0.23_1.7
 wrong_column_type.q.out_0.23_1.7



[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737471#comment-13737471
 ] 

Ashutosh Chauhan commented on HIVE-4838:


Good work Brock. Left some comments on phabricator. Another question: it 
seems like there are a few file moves? To preserve history, how shall we 
proceed with applying this patch on trunk?

 Refactor MapJoin HashMap code to improve testability and readability
 

 Key: HIVE-4838
 URL: https://issues.apache.org/jira/browse/HIVE-4838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
 HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch


 MapJoin is an essential component for high performance joins in Hive and the 
 current code has done great service for many years. However, the code is 
 showing its age and currently suffers from the following issues:
 * Uses static state via the MapJoinMetaData class to pass serialization 
 metadata to the Key, Row classes.
 * The API of a logical Table Container is not defined and therefore it's 
 unclear what APIs HashMapWrapper 
 needs to publicize. Additionally, HashMapWrapper has many unused public methods.
 * HashMapWrapper contains logic to serialize, test memory bounds, and 
 implement the table container. Ideally these logical units could be separated.
 * HashTableSinkObjectCtx has unused fields and unused methods
 * CommonJoinOperator and children use ArrayList on the left hand side when only 
 List is required
 * There are unused classes (MRU, DCLLItem) and classes which duplicate 
 functionality (MapJoinSingleKey and MapJoinDoubleKeys)



[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-08-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737482#comment-13737482
 ] 

Brock Noland commented on HIVE-4838:


Sounds good, I will address them. In regards to the moves, I don't believe 
there are any true mv's. MapJoinObjectKey -> MapJoinKey is kind of a move, but 
I'd say it's more of a complete re-implementation.

 Refactor MapJoin HashMap code to improve testability and readability
 

 Key: HIVE-4838
 URL: https://issues.apache.org/jira/browse/HIVE-4838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
 HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch


 MapJoin is an essential component for high performance joins in Hive and the 
 current code has done great service for many years. However, the code is 
 showing its age and currently suffers from the following issues:
 * Uses static state via the MapJoinMetaData class to pass serialization 
 metadata to the Key, Row classes.
 * The API of a logical Table Container is not defined and therefore it's 
 unclear what APIs HashMapWrapper 
 needs to publicize. Additionally, HashMapWrapper has many unused public methods.
 * HashMapWrapper contains logic to serialize, test memory bounds, and 
 implement the table container. Ideally these logical units could be separated.
 * HashTableSinkObjectCtx has unused fields and unused methods
 * CommonJoinOperator and children use ArrayList on the left hand side when only 
 List is required
 * There are unused classes (MRU, DCLLItem) and classes which duplicate 
 functionality (MapJoinSingleKey and MapJoinDoubleKeys)


