[jira] [Commented] (HIVE-4246) Implement predicate pushdown for ORC

2013-08-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731681#comment-13731681
 ] 

Gopal V commented on HIVE-4246:
---

The IN() implementation does a linear search on the predicate leaves right now.

Since we are only checking range & not actual membership, it would be better to 
store it as a sorted list and perform a binary search.

In most cases this will enable a fast path for the list's min/max. 

But in the corner case where the binary search inserts min & max at the same 
location & matches no element, we can skip the block.
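
For illustration, here is a minimal sketch of that idea: keep the IN() literals 
as a sorted list, fast-path on the list's min/max, and skip the block when the 
binary-search insertion points of the stripe's min & max coincide. Class and 
method names are illustrative only, not the actual ORC/SearchArgument API.

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InPredicateRangeCheck {

  /**
   * True if a row group whose column values span [min, max] can be
   * skipped, i.e. no IN() literal can possibly fall inside the range.
   * sortedLiterals must be sorted in ascending order.
   */
  static <T extends Comparable<T>> boolean canSkip(
      List<T> sortedLiterals, T min, T max) {
    // Fast path: [min, max] lies entirely outside the literal list.
    if (max.compareTo(sortedLiterals.get(0)) < 0
        || min.compareTo(sortedLiterals.get(sortedLiterals.size() - 1)) > 0) {
      return true;
    }
    int lo = Collections.binarySearch(sortedLiterals, min);
    int hi = Collections.binarySearch(sortedLiterals, max);
    if (lo >= 0 || hi >= 0) {
      return false; // min or max is itself one of the literals
    }
    // Both are insertion points; if they coincide, no literal falls
    // inside [min, max] and the block matches nothing.
    return (-lo - 1) == (-hi - 1);
  }

  public static void main(String[] args) {
    List<Integer> literals = Arrays.asList(5, 17, 42);
    System.out.println(canSkip(literals, 18, 41)); // true: nothing in [18, 41]
    System.out.println(canSkip(literals, 10, 20)); // false: 17 is in range
  }
}
{code}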

 Implement predicate pushdown for ORC
 

 Key: HIVE-4246
 URL: https://issues.apache.org/jira/browse/HIVE-4246
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-4246.D11415.1.patch


 By using the push down predicates from the table scan operator, ORC can skip 
 over 10,000 rows at a time that won't satisfy the predicate. This will help a 
 lot, especially if the file is sorted by the column that is used in the 
 predicate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs

2013-08-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4531:
-

Attachment: HIVE-4531-6.patch

HIVE-4531-6.patch resyncs with trunk. The e2e test will be in a follow-up Jira.

 [WebHCat] Collecting task logs to hdfs
 --

 Key: HIVE-4531
 URL: https://issues.apache.org/jira/browse/HIVE-4531
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, 
 HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, 
 samplestatusdirwithlist.tar.gz


 It would be nice if we collected task logs after the job finishes. This is 
 similar to what Amazon EMR does.




[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs

2013-08-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4531:
-

Attachment: (was: HIVE-4531-6.patch)

 [WebHCat] Collecting task logs to hdfs
 --

 Key: HIVE-4531
 URL: https://issues.apache.org/jira/browse/HIVE-4531
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, 
 HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, 
 samplestatusdirwithlist.tar.gz


 It would be nice if we collected task logs after the job finishes. This is 
 similar to what Amazon EMR does.



[jira] [Updated] (HIVE-5017) DBTokenStore gives compiler warnings

2013-08-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5017:
-

Attachment: HIVE-5017.1.patch

 DBTokenStore gives compiler warnings
 

 Key: HIVE-5017
 URL: https://issues.apache.org/jira/browse/HIVE-5017
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.12.0

 Attachments: HIVE-5017.1.patch


 In two cases the Method.invoke call is made with (Object[]) null; passing an 
 empty Object array instead will shut up the compiler.



[jira] [Updated] (HIVE-5017) DBTokenStore gives compiler warnings

2013-08-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5017:
-

Status: Patch Available  (was: Open)

 DBTokenStore gives compiler warnings
 

 Key: HIVE-5017
 URL: https://issues.apache.org/jira/browse/HIVE-5017
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.12.0

 Attachments: HIVE-5017.1.patch


 In two cases the Method.invoke call is made with (Object[]) null; passing an 
 empty Object array instead will shut up the compiler.



[jira] [Created] (HIVE-5017) DBTokenStore gives compiler warnings

2013-08-07 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-5017:


 Summary: DBTokenStore gives compiler warnings
 Key: HIVE-5017
 URL: https://issues.apache.org/jira/browse/HIVE-5017
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.12.0
 Attachments: HIVE-5017.1.patch

In two cases the Method.invoke call is made with (Object[]) null; passing an 
empty Object array instead will shut up the compiler.
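
For illustration, a minimal standalone example of the pattern being described 
(this is not the DBTokenStore code itself): invoking a no-argument method 
reflectively with a bare null is ambiguous to the compiler, while an explicit 
empty Object array is not.

{code}
import java.lang.reflect.Method;

public class InvokeNoArgs {
  public static void main(String[] args) throws Exception {
    Method m = String.class.getMethod("toUpperCase");
    String s = "hive";

    // A bare null argument draws a "non-varargs call of varargs method"
    // warning, since the compiler cannot tell Object[] null from Object null:
    // String u = (String) m.invoke(s, null);

    // An explicit empty array is unambiguous and warning-free.
    String u = (String) m.invoke(s, new Object[0]);
    System.out.println(u); // HIVE
  }
}
{code}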



[jira] [Created] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)
Benjamin Jakobus created HIVE-5018:
--

 Summary: Avoiding object instantiation in loops (issue 6)
 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor


java/org/apache/hadoop/hive/ql/Context.java
java/org/apache/hadoop/hive/ql/Driver.java
java/org/apache/hadoop/hive/ql/QueryPlan.java
java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
java/org/apache/hadoop/hive/ql/exec/MapOperator.java
java/org/apache/hadoop/hive/ql/exec/MoveTask.java
java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
java/org/apache/hadoop/hive/ql/exec/StatsTask.java
java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
java/org/apache/hadoop/hive/ql/exec/Utilities.java
java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
java/org/apache/hadoop/hive/ql/history/HiveHistory.java
java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
java/org/apache/hadoop/hive/ql/io/RCFile.java
java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateMapper.java
java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java
java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
java/org/apache/hadoop/hive/ql/metadata/Hive.java
java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java
java/org/apache/hadoop/hive/ql/metadata/formatting/JsonMetaDataFormatter.java
java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java

[jira] [Commented] (HIVE-5009) Fix minor optimization issues

2013-08-07 Thread Benjamin Jakobus (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731783#comment-13731783
 ] 

Benjamin Jakobus commented on HIVE-5009:


ok, thanks.

 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

   Original Estimate: 48h
  Remaining Estimate: 48h

 I have found some minor optimization issues in the codebase, which I would 
 like to rectify and contribute. Specifically, these are:
 The optimizations that could be applied to Hive's code base are as follows:
 1. Use StringBuffer when appending strings - In 184 instances, the 
 concatenation operator (+=) was used when appending strings. This is 
 inherently inefficient - instead Java's StringBuffer or StringBuilder class 
 should be used. 12 instances of this optimization can be applied to the 
 GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
 uses the + operator inside a loop, so does the column projection utilities 
 class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
 Tests showed that using the StringBuilder when appending strings is 57% 
 faster than using the + operator (using the StringBuffer took 122 
 milliseconds whilst the + operator took 284 milliseconds). The reason as to 
 why using the StringBuffer class is preferred over using the + operator, is 
 because
 String third = first + second;
 gets compiled to:
 StringBuilder builder = new StringBuilder( first );
 builder.append( second );
 third = builder.toString();
 Therefore, when building complex strings, that, for example involve loops, 
 require many instantiations (and as discussed below, creating new objects 
 inside loops is inefficient).
 2. Use arrays instead of List - Java's java.util.Arrays class's asList method 
 is more efficient at creating lists from arrays than using loops 
 to manually iterate over the elements (using asList is computationally very 
 cheap, O(1), as it merely creates a wrapper object around the array; looping 
 through the list however has a complexity of O(n) since a new list is created 
 and every element in the array is added to this new list). As confirmed by 
 the experiment detailed in Appendix D, the Java compiler does not 
 automatically optimize and replace tight-loop copying with asList: the 
 loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
 instant. 
 Four instances of this optimization can be applied to Hive's codebase (two of 
 these should be applied to the Map-Join container - MapJoinRowContainer) - 
 lines 92 to 98:
  for (obj = other.first(); obj != null; obj = other.next()) {
    ArrayList<Object> ele = new ArrayList<Object>(obj.length);
    for (int i = 0; i < obj.length; i++) {
      ele.add(obj[i]);
    }
    list.add((Row) ele);
  }
 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
 could be avoided by simply using the provided static conversion methods. As 
 noted in the PMD documentation, using these avoids the cost of creating 
 objects that also need to be garbage-collected later.
 For example, line 587 of the SemanticAnalyzer class, could be replaced by the 
 more efficient parseDouble method call:
 // Inefficient:
 Double percent = Double.valueOf(value).doubleValue();
 // To be replaced by:
 Double percent = Double.parseDouble(value);
 Our test case in Appendix D confirms this: converting 10,000 strings into 
 integers using Integer.valueOf(gen.nextSessionId()) (i.e. creating an 
 unnecessary wrapper object) took 119 milliseconds on average; using parseInt() 
 took only 38. Therefore creating even just one unnecessary wrapper object can 
 make your code up to 68% slower.
 4. Converting literals to strings using + "" - Converting literals to strings 
 using + "" is quite inefficient (see Appendix D) and should be done by 
 calling the toString() method instead: converting 1,000,000 integers to 
 strings using + "" took, on average, 1340 milliseconds whilst using the 
 toString() method only required 1183 milliseconds (hence adding empty strings 
 takes nearly 12% more time). 
 89 instances of using + "" when converting literals were found in Hive's 
 codebase - one of these is found in JoinUtil.
 5. Avoid manual copying of arrays - Instead of copying arrays as is done in 
 GroupByOperator on line 1040 (see below), the more efficient System.arraycopy 
 can be used (arraycopy is a native method meaning that the entire memory 
 block is copied using memcpy or memmove).
 // Line 1040 of the GroupByOperator
 for (int i = 0; i < keys.length; i++) {
   forwardCache[i] = keys[i];
 }
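
For reference, a self-contained sketch of points 1, 2, 3 and 5 above in 
isolation (class and variable names are illustrative; the timings quoted above 
are the reporter's, not reproduced here):

{code}
import java.util.Arrays;
import java.util.List;

public class MicroOptimizations {
  public static void main(String[] args) {
    // 1. Append inside a loop with StringBuilder instead of +=.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1000; i++) {
      sb.append(i).append(',');
    }
    String joined = sb.toString();

    // 2. Wrap an existing array instead of loop-copying it into a list.
    Integer[] values = {1, 2, 3, 4};
    List<Integer> asList = Arrays.asList(values); // O(1) view, no copy

    // 3. Static parse method instead of creating a wrapper object.
    double percent = Double.parseDouble("0.75");

    // 5. Native bulk copy instead of a manual element-by-element loop.
    Object[] keys = {"a", "b", "c"};
    Object[] forwardCache = new Object[keys.length];
    System.arraycopy(keys, 0, forwardCache, 0, keys.length);

    System.out.println(joined.length() + " " + asList + " " + percent
        + " " + Arrays.toString(forwardCache));
  }
}
{code}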

[jira] [Updated] (HIVE-5009) Fix minor optimization issues

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5009:
---

Description: 
I have found some minor optimization issues in the codebase, which I would like 
to rectify and contribute. Specifically, these are:

The optimizations that could be applied to Hive's code base are as follows:

1. Use StringBuffer when appending strings - In 184 instances, the 
concatenation operator (+=) was used when appending strings. This is inherently 
inefficient - instead Java's StringBuffer or StringBuilder class should be 
used. 12 instances of this optimization can be applied to the 
GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver uses 
the + operator inside a loop, so does the column projection utilities class 
(ColumnProjectionUtils) and the aforementioned skew-join processor. Tests 
showed that using the StringBuilder when appending strings is 57% faster than 
using the + operator (using the StringBuffer took 122 milliseconds whilst the + 
operator took 284 milliseconds). The reason as to why using the StringBuffer 
class is preferred over using the + operator, is because

String third = first + second;

gets compiled to:

StringBuilder builder = new StringBuilder( first );
builder.append( second );
third = builder.toString();

Therefore, when building complex strings, that, for example involve loops, 
require many instantiations (and as discussed below, creating new objects 
inside loops is inefficient).


2. Use arrays instead of List - Java's java.util.Arrays class's asList method is 
more efficient at creating lists from arrays than using loops to 
manually iterate over the elements (using asList is computationally very cheap, 
O(1), as it merely creates a wrapper object around the array; looping through 
the list however has a complexity of O(n) since a new list is created and every 
element in the array is added to this new list). As confirmed by the experiment 
detailed in Appendix D, the Java compiler does not automatically optimize and 
replace tight-loop copying with asList: the loop-copying of 1,000,000 items 
took 15 milliseconds whilst using asList is instant. 

Four instances of this optimization can be applied to Hive's codebase (two of 
these should be applied to the Map-Join container - MapJoinRowContainer) - 
lines 92 to 98:

 for (obj = other.first(); obj != null; obj = other.next()) {
   ArrayList<Object> ele = new ArrayList<Object>(obj.length);
   for (int i = 0; i < obj.length; i++) {
     ele.add(obj[i]);
   }
   list.add((Row) ele);
 }


3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
could be avoided by simply using the provided static conversion methods. As 
noted in the PMD documentation, using these avoids the cost of creating 
objects that also need to be garbage-collected later.

For example, line 587 of the SemanticAnalyzer class, could be replaced by the 
more efficient parseDouble method call:

// Inefficient:
Double percent = Double.valueOf(value).doubleValue();
// To be replaced by:
Double percent = Double.parseDouble(value);


Our test case in Appendix D confirms this: converting 10,000 strings into 
integers using Integer.valueOf(gen.nextSessionId()) (i.e. creating an 
unnecessary wrapper object) took 119 milliseconds on average; using parseInt() 
took only 38. Therefore creating even just one unnecessary wrapper object can 
make your code up to 68% slower.

4. Converting literals to strings using + "" - Converting literals to strings 
using + "" is quite inefficient (see Appendix D) and should be done by calling 
the toString() method instead: converting 1,000,000 integers to strings using 
+ "" took, on average, 1340 milliseconds whilst using the toString() method only 
required 1183 milliseconds (hence adding empty strings takes nearly 12% more 
time). 

89 instances of using + "" when converting literals were found in Hive's 
codebase - one of these is found in JoinUtil.

5. Avoid manual copying of arrays - Instead of copying arrays as is done in 
GroupByOperator on line 1040 (see below), the more efficient System.arraycopy 
can be used (arraycopy is a native method meaning that the entire memory block 
is copied using memcpy or memmove).

// Line 1040 of the GroupByOperator
for (int i = 0; i < keys.length; i++) {
  forwardCache[i] = keys[i];
}

Using System.arraycopy on an array of 10,000 strings was (close to) instant 
whilst the manual copy took 6 milliseconds.
11 instances of this optimization should be applied to the Hive codebase.

6. Avoiding instantiation inside loops - As noted in the PMD documentation, 
new objects created within loops should be checked to see if they can be 
created outside them and reused. 

Declaring variables inside a loop (i from 0 to 10,000) took 300 milliseconds
whilst declaring them outside took only 88 milliseconds (this can be 

[jira] [Created] (HIVE-5019) Use StringBuffer instead of += (issue 1)

2013-08-07 Thread Benjamin Jakobus (JIRA)
Benjamin Jakobus created HIVE-5019:
--

 Summary: Use StringBuffer instead of += (issue 1)
 Key: HIVE-5019
 URL: https://issues.apache.org/jira/browse/HIVE-5019
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus



Issue 1 (use of StringBuffer over +=)
java/org/apache/hadoop/hive/ql/Driver.java
java/org/apache/hadoop/hive/ql/Driver.java
java/org/apache/hadoop/hive/ql/QueryPlan.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/DDLTask.java
java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateTask.java
java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateTask.java
java/org/apache/hadoop/hive/ql/lib/RuleExactMatch.java
java/org/apache/hadoop/hive/ql/lib/RuleRegExp.java
java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java
java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java
java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java
java/org/apache/hadoop/hive/ql/metadata/Partition.java
java/org/apache/hadoop/hive/ql/metadata/Table.java
java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java
java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndex.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java

[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731794#comment-13731794
 ] 

Prasanth J commented on HIVE-4123:
--

Code comment improvements/fixes, removed some redundant code, long repeat runs 
will directly use DELTA encoding instead of calling the determineEncoding() 
function, and a few more changes.

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs
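
 For illustration, a toy sketch of what delta encoding with longer runs means 
 here: consecutive values with a constant difference are stored as 
 (base, delta, length) runs. This is only the idea, not ORC's actual on-disk 
 format.

{code}
import java.util.ArrayList;
import java.util.List;

public class DeltaRleSketch {
  /** Encodes constant-step runs as {base, delta, length} triples. */
  static List<long[]> encode(long[] values) {
    List<long[]> runs = new ArrayList<long[]>();
    int i = 0;
    while (i < values.length) {
      long base = values[i];
      long delta = (i + 1 < values.length) ? values[i + 1] - values[i] : 0;
      int len = 1;
      while (i + len < values.length
          && values[i + len] - values[i + len - 1] == delta) {
        len++;
      }
      runs.add(new long[] {base, delta, len});
      i += len;
    }
    return runs;
  }

  public static void main(String[] args) {
    long[] v = {10, 12, 14, 16, 7, 7, 7};
    for (long[] r : encode(v)) {
      System.out.printf("base=%d delta=%d length=%d%n", r[0], r[1], r[2]);
    }
    // Prints: base=10 delta=2 length=4, then base=7 delta=0 length=3
  }
}
{code}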



[jira] [Updated] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4123:
-

Attachment: HIVE-4123.6.txt

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs



[jira] [Updated] (HIVE-5009) Fix minor optimization issues

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5009:
---

Attachment: AbstractBucketJoinProc.java

 Fix minor optimization issues
 -

 Key: HIVE-5009
 URL: https://issues.apache.org/jira/browse/HIVE-5009
 Project: Hive
  Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractBucketJoinProc.java

   Original Estimate: 48h
  Remaining Estimate: 48h

 I have found some minor optimization issues in the codebase, which I would 
 like to rectify and contribute. Specifically, these are:
 The optimizations that could be applied to Hive's code base are as follows:
 1. Use StringBuffer when appending strings - In 184 instances, the 
 concatenation operator (+=) was used when appending strings. This is 
 inherently inefficient - instead Java's StringBuffer or StringBuilder class 
 should be used. 12 instances of this optimization can be applied to the 
 GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver 
 uses the + operator inside a loop, so does the column projection utilities 
 class (ColumnProjectionUtils) and the aforementioned skew-join processor. 
 Tests showed that using the StringBuilder when appending strings is 57% 
 faster than using the + operator (using the StringBuffer took 122 
 milliseconds whilst the + operator took 284 milliseconds). The reason as to 
 why using the StringBuffer class is preferred over using the + operator, is 
 because
 String third = first + second;
 gets compiled to:
 StringBuilder builder = new StringBuilder( first );
 builder.append( second );
 third = builder.toString();
 Therefore, when building complex strings, that, for example involve loops, 
 require many instantiations (and as discussed below, creating new objects 
 inside loops is inefficient).
 2. Use arrays instead of List - Java's java.util.Arrays class's asList method 
 is more efficient at creating lists from arrays than using loops 
 to manually iterate over the elements (using asList is computationally very 
 cheap, O(1), as it merely creates a wrapper object around the array; looping 
 through the list however has a complexity of O(n) since a new list is created 
 and every element in the array is added to this new list). As confirmed by 
 the experiment detailed in Appendix D, the Java compiler does not 
 automatically optimize and replace tight-loop copying with asList: the 
 loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is 
 instant. 
 Four instances of this optimization can be applied to Hive's codebase (two of 
 these should be applied to the Map-Join container - MapJoinRowContainer) - 
 lines 92 to 98:
  for (obj = other.first(); obj != null; obj = other.next()) {
    ArrayList<Object> ele = new ArrayList<Object>(obj.length);
    for (int i = 0; i < obj.length; i++) {
      ele.add(obj[i]);
    }
    list.add((Row) ele);
  }
 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation 
 could be avoided by simply using the provided static conversion methods. As 
 noted in the PMD documentation, using these avoids the cost of creating 
 objects that also need to be garbage-collected later.
 For example, line 587 of the SemanticAnalyzer class, could be replaced by the 
 more efficient parseDouble method call:
 // Inefficient:
 Double percent = Double.valueOf(value).doubleValue();
 // To be replaced by:
 Double percent = Double.parseDouble(value);
 Our test case in Appendix D confirms this: converting 10,000 strings into 
 integers using Integer.valueOf(gen.nextSessionId()) (i.e. creating an 
 unnecessary wrapper object) took 119 milliseconds on average; using parseInt() 
 took only 38. Therefore creating even just one unnecessary wrapper object can 
 make your code up to 68% slower.
 4. Converting literals to strings using + "" - Converting literals to strings 
 using + "" is quite inefficient (see Appendix D) and should be done by 
 calling the toString() method instead: converting 1,000,000 integers to 
 strings using + "" took, on average, 1340 milliseconds whilst using the 
 toString() method only required 1183 milliseconds (hence adding empty strings 
 takes nearly 12% more time). 
 89 instances of using + "" when converting literals were found in Hive's 
 codebase - one of these is found in JoinUtil.
 5. Avoid manual copying of arrays - Instead of copying arrays as is done in 
 GroupByOperator on line 1040 (see below), the more efficient System.arraycopy 
 can be used (arraycopy is a native method meaning that the entire memory 
 block is copied using memcpy or memmove).
 // Line 1040 of the GroupByOperator
 for (int i = 0; i < keys.length; i++) {
   forwardCache[i] 

[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: AbstractGenericUDFEWAHBitmapBop.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 java/org/apache/hadoop/hive/ql/io/RCFile.java
 java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
 java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
 java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
 java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
 java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
 java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
 java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
 java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateMapper.java
 java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java
 

[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: BaseSemanticAnalyzer.java
AbstractSMBJoinProc.java
AbstractJoinTaskDispatcher.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 java/org/apache/hadoop/hive/ql/io/RCFile.java
 java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
 java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
 java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
 java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
 java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
 java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
 java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
 

[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: (was: BaseSemanticAnalyzer.java)

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 java/org/apache/hadoop/hive/ql/io/RCFile.java
 java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
 java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
 java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
 java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
 java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
 java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
 java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
 java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateMapper.java
 

[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: BaseSemanticAnalyzer.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 java/org/apache/hadoop/hive/ql/io/RCFile.java
 java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
 java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
 java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
 java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
 java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
 java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
 java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
 java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateMapper.java
 

[jira] [Commented] (HIVE-3363) Special characters (such as 'é') displayed as '?' in Hive

2013-08-07 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731825#comment-13731825
 ] 

Kousuke Saruta commented on HIVE-3363:
--

I think this problem may be similar to HIVE-2137.
In SQLOperation.java, getNextRowSet() has a chunk of code as follows:
{code}

  for (String rowString : rows) {
    rowObj = serde.deserialize(new BytesWritable(rowString.getBytes()));
    for (int i = 0; i < fieldRefs.size(); i++) {
      StructField fieldRef = fieldRefs.get(i);
      fieldOI = fieldRef.getFieldObjectInspector();
      deserializedFields[i] =
          convertLazyToJava(soi.getStructFieldData(rowObj, fieldRef), fieldOI);
    }
    rowSet.addRow(resultSchema, deserializedFields);
  }
{code}

The code above uses getBytes() without setting an encoding, so it will use the 
system default encoding.
If the front end of Hive is used on Windows, an encoding mismatch will happen 
because Hive (Hadoop) expects UTF-8 for its character encoding but Windows uses 
Shift_JIS.
So, I think the code above should be as follows:

{code}

  for (String rowString : rows) {
    rowObj = serde.deserialize(
        new BytesWritable(rowString.getBytes("UTF-8")));
    for (int i = 0; i < fieldRefs.size(); i++) {
      StructField fieldRef = fieldRefs.get(i);
      fieldOI = fieldRef.getFieldObjectInspector();
      deserializedFields[i] =
          convertLazyToJava(soi.getStructFieldData(rowObj, fieldRef), fieldOI);
    }
    rowSet.addRow(resultSchema, deserializedFields);
  }
{code}
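
One further note (my suggestion, assuming Java 7 is available): the Charset 
overload of getBytes avoids both the platform-default pitfall and the checked 
UnsupportedEncodingException that getBytes(String) throws.

{code}
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
  public static void main(String[] args) {
    String rowString = "é";
    // Charset overload: no checked exception, and it never falls back
    // to the platform default encoding.
    byte[] utf8 = rowString.getBytes(StandardCharsets.UTF_8);
    System.out.println(utf8.length); // 2 bytes for 'é' in UTF-8
  }
}
{code}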

 Special characters (such as 'é') displayed as '?' in Hive
 -

 Key: HIVE-3363
 URL: https://issues.apache.org/jira/browse/HIVE-3363
 Project: Hive
  Issue Type: Bug
Reporter: Anand Balaraman

 I am facing an issue while viewing special characters (such as é) using Hive.
 If I view the file in HDFS (using hadoop fs -cat command), it is displayed 
 correctly as ’é’, but when I select the data using Hive, this character alone 
 gets replaced by a question mark.



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: BlockMergeTask.java
BitmapIndexHandler.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 java/org/apache/hadoop/hive/ql/io/RCFile.java
 java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
 java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java
 java/org/apache/hadoop/hive/ql/io/orc/FileDump.java
 java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
 java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
 java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
 java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
 java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java
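
 For reference, the pattern this sub-task targets, shown on a made-up snippet (the 
 names are illustrative, not taken from the attached files): the allocation is 
 hoisted out of the loop and a single object is reused across iterations.

{code}
import java.util.List;

public class LoopAllocationDemo {
  // Before: a new StringBuilder is instantiated on every iteration
  static String joinNaive(List<String> parts) {
    String result = "";
    for (String p : parts) {
      StringBuilder sb = new StringBuilder(); // one allocation per iteration
      sb.append(result).append(p);
      result = sb.toString();
    }
    return result;
  }

  // After: the StringBuilder is instantiated once, outside the loop
  static String joinHoisted(List<String> parts) {
    StringBuilder sb = new StringBuilder(); // hoisted out of the loop
    for (String p : parts) {
      sb.append(p);
    }
    return sb.toString();
  }
}
{code}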
 

[jira] [Updated] (HIVE-3363) Special characters (such as 'é') displayed as '?' in Hive

2013-08-07 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-3363:
-

Attachment: HIVE-3363.patch

Initial patch.

 Special characters (such as 'é') displayed as '?' in Hive
 -

 Key: HIVE-3363
 URL: https://issues.apache.org/jira/browse/HIVE-3363
 Project: Hive
  Issue Type: Bug
Reporter: Anand Balaraman
 Attachments: HIVE-3363.patch


 I am facing an issue while viewing special characters (such as é) using Hive.
 If I view the file in HDFS (using the hadoop fs -cat command), it is displayed 
 correctly as ’é’, but when I select the data using Hive, this character alone 
 gets replaced by a question mark.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: BucketingSortingOpProcFactory.java
BucketingSortingInferenceOptimizer.java
BucketMapJoinContext.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketMapJoinContext.java



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: BucketingSortingReduceSinkOptimizer.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketMapJoinContext.java



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: BucketizedHiveInputFormat.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: ColumnPrunerProcFactory.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: ColumnPrunerProcFactory.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnPrunerProcFactory.java



[jira] [Commented] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time

2013-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731853#comment-13731853
 ] 

Hive QA commented on HIVE-4233:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12596469/HIVE-4233.5.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2767 tests executed
*Failed tests:*
{noformat}
org.apache.hcatalog.pig.TestHCatStorerMulti.testStorePartitionedTable
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/327/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/327/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 The TGT gotten from class 'CLIService'  should be renewed on time
 -

 Key: HIVE-4233
 URL: https://issues.apache.org/jira/browse/HIVE-4233
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
 Environment: CentOS release 6.3 (Final)
 jdk1.6.0_31
 HiveServer2  0.10.0-cdh4.2.0
 Kerberos Security 
Reporter: Dongyong Wang
Assignee: Thejas M Nair
Priority: Critical
 Attachments: 0001-FIX-HIVE-4233.patch, HIVE-4233-2.patch, 
 HIVE-4233-3.patch, HIVE-4233.4.patch, HIVE-4233.5.patch


 When the HiveServer2 has been running for more than 7 days and I use the beeline 
 shell to connect to it, all operations fail.
 The HiveServer2 log shows this is caused by a Kerberos auth failure; the 
 exception stack trace is:
 2013-03-26 11:55:20,932 ERROR hive.ql.metadata.Hive: 
 java.lang.RuntimeException: Unable to instantiate 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1084)
 at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.&lt;init&gt;(RetryingMetaStoreClient.java:51)
 at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
 at 
 org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2140)
 at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2151)
 at 
 org.apache.hadoop.hive.ql.metadata.Hive.getDelegationToken(Hive.java:2275)
 at 
 org.apache.hive.service.cli.CLIService.getDelegationTokenFromMetaStore(CLIService.java:358)
 at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:127)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1073)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1058)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.GeneratedConstructorAccessor52.newInstance(Unknown 
 Source)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1082)
 ... 16 more
 Caused by: java.lang.IllegalStateException: This ticket is no longer valid
 at 
 javax.security.auth.kerberos.KerberosTicket.toString(KerberosTicket.java:601)
 at java.lang.String.valueOf(String.java:2826)
 at java.lang.StringBuilder.append(StringBuilder.java:115)
 at 
 sun.security.jgss.krb5.SubjectComber.findAux(SubjectComber.java:120)
 at sun.security.jgss.krb5.SubjectComber.find(SubjectComber.java:41)
 at sun.security.jgss.krb5.Krb5Util.getTicket(Krb5Util.java:130)
 at 
 sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:328)
 at java.security.AccessController.doPrivileged(Native Method)
 at 
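
 For illustration only (not the attached patch): a minimal sketch of the renewal 
 direction the issue title suggests, assuming a keytab-based login and using the 
 standard Hadoop UserGroupInformation API.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.security.UserGroupInformation;

public class TgtRenewalSketch {
  public static void startRenewer(final long periodHours) {
    ScheduledExecutorService renewer = Executors.newSingleThreadScheduledExecutor();
    renewer.scheduleWithFixedDelay(new Runnable() {
      public void run() {
        try {
          // Re-login from the keytab when the TGT is close to expiring;
          // effectively a no-op while the current ticket is still fresh.
          UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }, periodHours, periodHours, TimeUnit.HOURS);
  }
}
{code}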
 

[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: ColumnStatsTask.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnPrunerProcFactory.java, ColumnStatsTask.java



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: CombineHiveInputFormat.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnPrunerProcFactory.java, ColumnStatsTask.java, 
 CombineHiveInputFormat.java



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: CommonJoinOperator.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: (was: ColumnPrunerProcFactory.java)

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: ConditionalResolverCommonJoin.java
CommonJoinTaskDispatcher.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java, 
 CommonJoinTaskDispatcher.java, ConditionalResolverCommonJoin.java



[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: DDLSemanticAnalyzer.java
CorrelationOptimizer.java
Context.java
ConditionalResolverSkewJoin.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java, 
 CommonJoinTaskDispatcher.java, ConditionalResolverCommonJoin.java, 
 ConditionalResolverSkewJoin.java, Context.java, CorrelationOptimizer.java, 
 DDLSemanticAnalyzer.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 java/org/apache/hadoop/hive/ql/io/RCFile.java
 java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
 java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
 

[jira] [Updated] (HIVE-3363) Special characters (such as 'é') displayed as '?' in Hive

2013-08-07 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-3363:
-

Affects Version/s: 0.12.0
   Status: Patch Available  (was: Open)

 Special characters (such as 'é') displayed as '?' in Hive
 -

 Key: HIVE-3363
 URL: https://issues.apache.org/jira/browse/HIVE-3363
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anand Balaraman
 Attachments: HIVE-3363.patch


 I am facing an issue while viewing special characters (such as é) using Hive.
 If I view the file in HDFS (using the hadoop fs -cat command), it is displayed 
 correctly as 'é', but when I select the data using Hive, this character alone 
 gets replaced by a question mark.
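 
 A minimal illustration of how this class of bug arises (my sketch, not code 
 from the report or any patch): encoding text through a charset that cannot 
 represent 'é' silently substitutes a literal '?'.
 {code}
 public class CharsetDemo {
   public static void main(String[] args) throws Exception {
     String s = "é";
     // An unmappable character is replaced with '?' when encoded
     // through a charset that cannot represent it.
     byte[] ascii = s.getBytes("US-ASCII");
     System.out.println(new String(ascii, "US-ASCII"));  // prints ?
   }
 }
 {code}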

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5017) DBTokenStore gives compiler warnings

2013-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731883#comment-13731883
 ] 

Hive QA commented on HIVE-5017:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12596531/HIVE-5017.1.patch

{color:green}SUCCESS:{color} +1 2767 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/329/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/329/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 DBTokenStore gives compiler warnings
 

 Key: HIVE-5017
 URL: https://issues.apache.org/jira/browse/HIVE-5017
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.12.0

 Attachments: HIVE-5017.1.patch


 The Method.invoke call in 2 cases is done via (Object[])null, but an empty 
 Object array will silence the compiler warning.
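 
 A tiny sketch of the pattern the description names (illustrative only, not 
 the patch itself): passing an explicit empty array to a reflective 
 no-argument call is unambiguous, so the compiler stays quiet.
 {code}
 import java.lang.reflect.Method;
 
 public class InvokeDemo {
   public static void main(String[] args) throws Exception {
     Method toUpper = String.class.getMethod("toUpperCase");
     // new Object[0] says "no arguments" explicitly; a null argument
     // array is what drew the compiler warnings described above.
     System.out.println(toUpper.invoke("hive", new Object[0]));  // HIVE
   }
 }
 {code}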

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3363) Special characters (such as 'é') displayed as '?' in Hive

2013-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731917#comment-13731917
 ] 

Hive QA commented on HIVE-3363:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12596565/HIVE-3363.patch

{color:green}SUCCESS:{color} +1 2767 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/331/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/331/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Special characters (such as 'é') displayed as '?' in Hive
 -

 Key: HIVE-3363
 URL: https://issues.apache.org/jira/browse/HIVE-3363
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Anand Balaraman
 Attachments: HIVE-3363.patch


 I am facing an issue while viewing special characters (such as é) using Hive.
 If I view the file in HDFS (using the hadoop fs -cat command), it is displayed 
 correctly as 'é', but when I select the data using Hive, this character alone 
 gets replaced by a question mark.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732024#comment-13732024
 ] 

Brock Noland commented on HIVE-5018:


[~benjamin.jakobus] I really appreciate the work you are doing!  However, we'll 
need you to submit the changes in a slightly different form. The way this 
project works is that people submit patches or diffs which contain only the 
changes. This avoids issues of change integration where two people are working 
on the same file at the same time. Here is a quick introduction on how to 
create a patch:

{noformat}
$ git clone https://github.com/apache/hive.git
$ cd hive
$ vim build.properties (make some trivial change)
$ git diff > /tmp/my-jira.patch
{noformat}

For example over on HIVE-3363 there is a file HIVE-3363.patch which was 
generated in such a manner.

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java, 
 CommonJoinTaskDispatcher.java, ConditionalResolverCommonJoin.java, 
 ConditionalResolverSkewJoin.java, Context.java, CorrelationOptimizer.java, 
 DDLSemanticAnalyzer.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 

[jira] [Commented] (HIVE-4948) WriteLockTest and ZNodeNameTest do not follow test naming pattern

2013-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732038#comment-13732038
 ] 

Hudson commented on HIVE-4948:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2249 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2249/])
HIVE-4948: WriteLockTest and ZNodeNameTest do not follow test naming pattern 
(brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1511075)
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/lock/TestWriteLock.java
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/lock/TestZNodeName.java
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/lock/WriteLockTest.java
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/lock/ZNodeNameTest.java


 WriteLockTest and ZNodeNameTest do not follow test naming pattern
 -

 Key: HIVE-4948
 URL: https://issues.apache.org/jira/browse/HIVE-4948
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4948.patch


 These tests should be renamed TestWriteLock and TestZNodeName
 org.apache.hcatalog.hbase.snapshot.lock.WriteLockTest
 org.apache.hcatalog.hbase.snapshot.lock.ZNodeNameTest

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732039#comment-13732039
 ] 

Hudson commented on HIVE-4870:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2249 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2249/])
HIVE-4870 : Explain Extended to show partition info for Fetch Task (Laljo John 
Pullokkaran via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1511066)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin11.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin12.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin13.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin8.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join32.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join32_lessSize.q.out
* /hive/trunk/ql/src/test/results/clientpositive/join33.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/stats11.q.out
* /hive/trunk/ql/src/test/results/clientpositive/union22.q.out


 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.12.0

 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork). The Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4995) select * may incorrectly return empty fields with hbase-handler

2013-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732037#comment-13732037
 ] 

Hudson commented on HIVE-4995:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2249 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2249/])
HIVE-4995: select * may incorrectly return empty fields with hbase-handler 
(Swarnim Kulkarni via Brock Noland) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510973)
* 
/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
* 
/hive/trunk/hbase-handler/src/test/queries/positive/hbase_binary_map_queries_prefix.q
* 
/hive/trunk/hbase-handler/src/test/results/positive/hbase_binary_map_queries_prefix.q.out


 select * may incorrectly return empty fields with hbase-handler
 ---

 Key: HIVE-4995
 URL: https://issues.apache.org/jira/browse/HIVE-4995
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.11.0
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Fix For: 0.12.0

 Attachments: HIVE-4995.1.patch.txt, HIVE-4995.1.patch.txt


 HIVE-3725 added the capability to pull hbase columns with prefixes. However, 
 the way the current logic to add columns stands in HiveHBaseTableInputFormat, 
 it might cause some columns to incorrectly display empty fields.
 Consider the following query:
 {noformat}
 CREATE EXTERNAL TABLE test_table(key string, value1 map<string,string>, 
 value2 string)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
 WITH SERDEPROPERTIES 
 ("hbase.columns.mapping" = ":key,cf-a:prefix.*,cf-a:another_col") 
 TBLPROPERTIES ("hbase.table.name" = "test_table");
 {noformat}
 Given the existing logic in HiveHBaseTableInputFormat:
 {code}
 for (int i = 0; i < columnsMapping.size(); i++) {
   ColumnMapping colMap = columnsMapping.get(i);
   if (colMap.hbaseRowKey) {
     continue;
   }
   if (colMap.qualifierName == null) {
     scan.addFamily(colMap.familyNameBytes);
   } else {
     scan.addColumn(colMap.familyNameBytes, colMap.qualifierNameBytes);
   }
 }
 {code}
 So for the above query, 'addFamily' will be called first, followed by 
 'addColumn' for the column family cf-a. This will wipe away whatever we had 
 set with the 'addFamily' call in the previous step, resulting in an empty 
 column when queried.
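 
 One possible shape of a fix (a sketch only, not the committed patch; it 
 assumes ColumnMapping also exposes the family name as a String): add whole 
 families in a first pass, then only add qualified columns whose family is not 
 already covered, so 'addColumn' can never narrow a prior 'addFamily'.
 {code}
 Set<String> wholeFamilies = new HashSet<String>();
 for (ColumnMapping colMap : columnsMapping) {
   // First pass: families mapped without a qualifier get a full scan.
   if (!colMap.hbaseRowKey && colMap.qualifierName == null) {
     scan.addFamily(colMap.familyNameBytes);
     wholeFamilies.add(colMap.familyName);
   }
 }
 for (ColumnMapping colMap : columnsMapping) {
   // Second pass: skip the row key and any column whose family is
   // already scanned wholesale.
   if (colMap.hbaseRowKey || colMap.qualifierName == null
       || wholeFamilies.contains(colMap.familyName)) {
     continue;
   }
   scan.addColumn(colMap.familyNameBytes, colMap.qualifierNameBytes);
 }
 {code}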

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5011) Dynamic partitioning in HCatalog broken on external tables

2013-08-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5011:
---

Attachment: HIVE-5011.patch

Attaching patch.

 Dynamic partitioning in HCatalog broken on external tables
 --

 Key: HIVE-5011
 URL: https://issues.apache.org/jira/browse/HIVE-5011
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5011.patch


 Dynamic partitioning with HCatalog has been broken as a result of 
 HCATALOG-500 trying to support user-set paths for external tables.
 The goal there was to be able to support other custom destinations apart from 
 the normal hive-style partitions. However, it is not currently possible for 
 users to set paths for dynamic ptn writes, since we don't support any way for 
 users to specify patterns (like, say "$\{rootdir\}/$v1.$v2/") into which 
 writes happen, only locations, and the values for dyn. partitions are not 
 known ahead of time. Also, specifying a custom path messes with the way 
 dynamic ptn. code tries to determine what was written to where from the 
 output committer, which means that even if we supported patterned-writes 
 instead of location-writes, we still have to do some more deep diving into 
 the output committer code to support it.
 Thus, my current proposal is that we honour writes to user-specified paths 
 for external tables *ONLY* for static partition writes - i.e., if we can 
 determine that the write is a dyn. ptn. write, we will ignore the user 
 specification. (Note that this does not mean we ignore the table's external 
 location - we honour that - we just don't honour any HCatStorer/etc provided 
 additional location - we stick to what metadata tells us the root location is.)
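 
 Expressed as pseudocode, with every name below hypothetical and only there to 
 pin down the proposed rule:
 {code}
 // Honour a custom (HCatStorer-provided) location only for fully
 // static partition writes; dynamic writes fall back to the root
 // location that the metadata reports.
 String root = table.getRootLocation();          // always honoured
 String custom = storerInfo.getCustomLocation(); // may be null
 boolean dynamic = jobInfo.isDynamicPartitioningUsed();
 String target = (!dynamic && custom != null) ? custom : root;
 {code}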

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4911) Enable QOP configuration for Hive Server 2 thrift transport

2013-08-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732118#comment-13732118
 ] 

Ashutosh Chauhan commented on HIVE-4911:


+1 LGTM

 Enable QOP configuration for Hive Server 2 thrift transport
 ---

 Key: HIVE-4911
 URL: https://issues.apache.org/jira/browse/HIVE-4911
 Project: Hive
  Issue Type: New Feature
Reporter: Arup Malakar
Assignee: Arup Malakar
 Attachments: 20-build-temp-change-1.patch, 
 20-build-temp-change.patch, HIVE-4911-trunk-0.patch, HIVE-4911-trunk-1.patch, 
 HIVE-4911-trunk-2.patch, HIVE-4911-trunk-3.patch


 The QoP for hive server 2 should be configurable to enable encryption. A new 
 configuration property should be exposed: hive.server2.thrift.rpc.protection. 
 This would give greater control when configuring the hive server 2 service.
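 
 For illustration, such a property would sit in hive-site.xml roughly as 
 below; the value shown is an assumption, mirroring Hadoop's 
 hadoop.rpc.protection levels (authentication, integrity, privacy):
 {noformat}
 <property>
   <name>hive.server2.thrift.rpc.protection</name>
   <value>privacy</value>
 </property>
 {noformat}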

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-4992) add ability to skip javadoc during build

2013-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4992.


   Resolution: Fixed
Fix Version/s: 0.12.0

Committed to trunk. Thanks, Sergey!

 add ability to skip javadoc during build
 

 Key: HIVE-4992
 URL: https://issues.apache.org/jira/browse/HIVE-4992
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-4992.D11967.1.patch, HIVE-4992.D11967.2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4987) Javadoc can generate argument list too long error

2013-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4987:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Brock!

 Javadoc can generate argument list too long error
 -

 Key: HIVE-4987
 URL: https://issues.apache.org/jira/browse/HIVE-4987
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4987.patch


 We just need to add useexternalfile="yes" to the javadoc statements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4967) Don't serialize unnecessary fields in query plan

2013-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4967:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Brock for review!

 Don't serialize unnecessary fields in query plan
 

 Key: HIVE-4967
 URL: https://issues.apache.org/jira/browse/HIVE-4967
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.12.0

 Attachments: HIVE-4967.1.patch, HIVE-4967.patch


 There are quite a few fields which need not be serialized since they are 
 initialized anyway in the backend. We need not serialize them in our plan.
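 
 As a minimal sketch of the idea (an illustrative class, not code from the 
 patch): anything the backend rebuilds on its own can be excluded from the 
 serialized plan, e.g. by marking it transient.
 {code}
 public class ExampleDesc implements java.io.Serializable {
   private String exprString;           // shipped with the plan
   private transient Object evaluator;  // re-initialized in the backend
 }
 {code}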

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5011) Dynamic partitioning in HCatalog broken on external tables

2013-08-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5011:
---

Status: Patch Available  (was: Open)

 Dynamic partitioning in HCatalog broken on external tables
 --

 Key: HIVE-5011
 URL: https://issues.apache.org/jira/browse/HIVE-5011
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical
 Attachments: HIVE-5011.patch


 Dynamic partitioning with HCatalog has been broken as a result of 
 HCATALOG-500 trying to support user-set paths for external tables.
 The goal there was to be able to support other custom destinations apart from 
 the normal hive-style partitions. However, it is not currently possible for 
 users to set paths for dynamic ptn writes, since we don't support any way for 
 users to specify patterns (like, say "$\{rootdir\}/$v1.$v2/") into which 
 writes happen, only locations, and the values for dyn. partitions are not 
 known ahead of time. Also, specifying a custom path messes with the way 
 dynamic ptn. code tries to determine what was written to where from the 
 output committer, which means that even if we supported patterned-writes 
 instead of location-writes, we still have to do some more deep diving into 
 the output committer code to support it.
 Thus, my current proposal is that we honour writes to user-specified paths 
 for external tables *ONLY* for static partition writes - i.e., if we can 
 determine that the write is a dyn. ptn. write, we will ignore the user 
 specification. (Note that this does not mean we ignore the table's external 
 location - we honour that - we just don't honour any HCatStorer/etc provided 
 additional location - we stick to what metadata tells us the root location is.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5011) Dynamic partitioning in HCatalog broken on external tables

2013-08-07 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732144#comment-13732144
 ] 

Sushanth Sowmyan commented on HIVE-5011:


RB link : https://reviews.facebook.net/D12039

 Dynamic partitioning in HCatalog broken on external tables
 --

 Key: HIVE-5011
 URL: https://issues.apache.org/jira/browse/HIVE-5011
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5011.patch


 Dynamic partitioning with HCatalog has been broken as a result of 
 HCATALOG-500 trying to support user-set paths for external tables.
 The goal there was to be able to support other custom destinations apart from 
 the normal hive-style partitions. However, it is not currently possible for 
 users to set paths for dynamic ptn writes, since we don't support any way for 
 users to specify patterns (like, say "$\{rootdir\}/$v1.$v2/") into which 
 writes happen, only locations, and the values for dyn. partitions are not 
 known ahead of time. Also, specifying a custom path messes with the way 
 dynamic ptn. code tries to determine what was written to where from the 
 output committer, which means that even if we supported patterned-writes 
 instead of location-writes, we still have to do some more deep diving into 
 the output committer code to support it.
 Thus, my current proposal is that we honour writes to user-specified paths 
 for external tables *ONLY* for static partition writes - i.e., if we can 
 determine that the write is a dyn. ptn. write, we will ignore the user 
 specification. (Note that this does not mean we ignore the table's external 
 location - we honour that - we just don't honour any HCatStorer/etc provided 
 additional location - we stick to what metadata tells us the root location is.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5011) Dynamic partitioning in HCatalog broken on external tables

2013-08-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5011:
---

Priority: Critical  (was: Major)

 Dynamic partitioning in HCatalog broken on external tables
 --

 Key: HIVE-5011
 URL: https://issues.apache.org/jira/browse/HIVE-5011
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical
 Attachments: HIVE-5011.patch


 Dynamic partitioning with HCatalog has been broken as a result of 
 HCATALOG-500 trying to support user-set paths for external tables.
 The goal there was to be able to support other custom destinations apart from 
 the normal hive-style partitions. However, it is not currently possible for 
 users to set paths for dynamic ptn writes, since we don't support any way for 
 users to specify patterns (like, say "$\{rootdir\}/$v1.$v2/") into which 
 writes happen, only locations, and the values for dyn. partitions are not 
 known ahead of time. Also, specifying a custom path messes with the way 
 dynamic ptn. code tries to determine what was written to where from the 
 output committer, which means that even if we supported patterned-writes 
 instead of location-writes, we still have to do some more deep diving into 
 the output committer code to support it.
 Thus, my current proposal is that we honour writes to user-specified paths 
 for external tables *ONLY* for static partition writes - i.e., if we can 
 determine that the write is a dyn. ptn. write, we will ignore the user 
 specification. (Note that this does not mean we ignore the table's external 
 location - we honour that - we just don't honour any HCatStorer/etc provided 
 additional location - we stick to what metadata tells us the root location is.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4967) Don't serialize unnecessary fields in query plan

2013-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732150#comment-13732150
 ] 

Hudson commented on HIVE-4967:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #338 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/338/])
HIVE-4967 : Don't serialize unnecessary fields in query plan (Ashutosh Chauhan. 
Reviewed by Brock Noland) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1511377)
* 
/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/genericudf/example/GenericUDFDBOutput.java
* 
/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFExplode2.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeGenericFuncDesc.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapBop.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFReflect.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovariance.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEWAHBitmap.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLeadLag.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMax.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFNTile.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArray.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseCompare.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCase.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCoalesce.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcatWS.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFElt.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFField.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFormatNumber.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFromUtcTimestamp.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIn.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInFile.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInstr.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLocate.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMap.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMapKeys.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMapValues.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNvl.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPAnd.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNot.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPOr.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFPrintf.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFReflect.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFReflect2.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSize.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java
* 

[jira] [Commented] (HIVE-4992) add ability to skip javadoc during build

2013-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732149#comment-13732149
 ] 

Hudson commented on HIVE-4992:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #338 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/338/])
HIVE-4992 : add ability to skip javadoc during build (Sergey Shelukhin via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1511374)
* /hive/trunk/build.xml
* /hive/trunk/hcatalog/build.xml


 add ability to skip javadoc during build
 

 Key: HIVE-4992
 URL: https://issues.apache.org/jira/browse/HIVE-4992
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-4992.D11967.1.patch, HIVE-4992.D11967.2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4987) Javadoc can generate argument list too long error

2013-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732151#comment-13732151
 ] 

Hudson commented on HIVE-4987:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #338 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/338/])
HIVE-4987 : Javadoc can generate argument list too long error (Brock Noland via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1511375)
* /hive/trunk/build.xml
* /hive/trunk/hcatalog/webhcat/svr/build.xml


 Javadoc can generate argument list too long error
 -

 Key: HIVE-4987
 URL: https://issues.apache.org/jira/browse/HIVE-4987
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4987.patch


 We just need to add useexternalfile="yes" to the javadoc statements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: DemuxOperator.java
DDLTask.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java, 
 CommonJoinTaskDispatcher.java, ConditionalResolverCommonJoin.java, 
 ConditionalResolverSkewJoin.java, Context.java, CorrelationOptimizer.java, 
 DDLSemanticAnalyzer.java, DDLTask.java, DemuxOperator.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 java/org/apache/hadoop/hive/ql/io/RCFile.java
 java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
 java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
 

[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: Driver.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java, 
 CommonJoinTaskDispatcher.java, ConditionalResolverCommonJoin.java, 
 ConditionalResolverSkewJoin.java, Context.java, CorrelationOptimizer.java, 
 DDLSemanticAnalyzer.java, DDLTask.java, DemuxOperator.java, Driver.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 java/org/apache/hadoop/hive/ql/io/RCFile.java
 java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java
 java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java
 java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java
 java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java
 

[jira] [Commented] (HIVE-4233) The TGT gotten from class 'CLIService' should be renewed on time

2013-08-07 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732186#comment-13732186
 ] 

Gunther Hagleitner commented on HIVE-4233:
--

The test failure is unrelated. Tests look good for this patch.

 The TGT gotten from class 'CLIService'  should be renewed on time
 -

 Key: HIVE-4233
 URL: https://issues.apache.org/jira/browse/HIVE-4233
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
 Environment: CentOS release 6.3 (Final)
 jdk1.6.0_31
 HiveServer2  0.10.0-cdh4.2.0
 Kerberos Security 
Reporter: Dongyong Wang
Assignee: Thejas M Nair
Priority: Critical
 Attachments: 0001-FIX-HIVE-4233.patch, HIVE-4233-2.patch, 
 HIVE-4233-3.patch, HIVE-4233.4.patch, HIVE-4233.5.patch


 When the HiveServer2 has been started for more than 7 days, I use the beeline 
 shell to connect to the HiveServer2, and all operations fail.
 The log of HiveServer2 shows it was caused by a Kerberos auth failure; the 
 exception stack trace is:
 2013-03-26 11:55:20,932 ERROR hive.ql.metadata.Hive: 
 java.lang.RuntimeException: Unable to instantiate 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1084)
 at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:51)
 at 
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
 at 
 org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2140)
 at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2151)
 at 
 org.apache.hadoop.hive.ql.metadata.Hive.getDelegationToken(Hive.java:2275)
 at 
 org.apache.hive.service.cli.CLIService.getDelegationTokenFromMetaStore(CLIService.java:358)
 at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.OpenSession(ThriftCLIService.java:127)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1073)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1058)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:565)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.GeneratedConstructorAccessor52.newInstance(Unknown 
 Source)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1082)
 ... 16 more
 Caused by: java.lang.IllegalStateException: This ticket is no longer valid
 at 
 javax.security.auth.kerberos.KerberosTicket.toString(KerberosTicket.java:601)
 at java.lang.String.valueOf(String.java:2826)
 at java.lang.StringBuilder.append(StringBuilder.java:115)
 at 
 sun.security.jgss.krb5.SubjectComber.findAux(SubjectComber.java:120)
 at sun.security.jgss.krb5.SubjectComber.find(SubjectComber.java:41)
 at sun.security.jgss.krb5.Krb5Util.getTicket(Krb5Util.java:130)
 at 
 sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:328)
 at java.security.AccessController.doPrivileged(Native Method)
 at 
 sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:325)
 at 
 sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:128)
 at 
 sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106)
 at 
 sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172)
 at 
 sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209)
 at 
 sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195)
 at 
 sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162)
 at 
 com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175)
 at 
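
A minimal sketch of the renewal behavior the issue title asks for, assuming
Hadoop's UserGroupInformation API; the principal and keytab path below are
placeholders:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.security.UserGroupInformation;

public class TgtRenewer {
  // Re-login from the keytab on a schedule so the TGT is refreshed
  // before it expires (placeholder principal/keytab values).
  public static void start() throws Exception {
    UserGroupInformation.loginUserFromKeytab(
        "hive/_HOST@EXAMPLE.COM", "/etc/hive/hive.keytab");
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(
        new Runnable() {
          public void run() {
            try {
              // No-op if the ticket is still fresh; relogin otherwise.
              UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        }, 1, 1, TimeUnit.HOURS);
  }
}
{code}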
 

[jira] [Updated] (HIVE-4586) [HCatalog] WebHCat should return 404 error for undefined resource

2013-08-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4586:
-

Attachment: HIVE-4586-2.patch

HIVE-4586-2.patch resyncs with trunk and fixes a unit test failure.

 [HCatalog] WebHCat should return 404 error for undefined resource
 -

 Key: HIVE-4586
 URL: https://issues.apache.org/jira/browse/HIVE-4586
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.11.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4586-1.patch, HIVE-4586-2.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1511) Hive plan serialization is slow

2013-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1511:
---

Attachment: HIVE-1511-wip2.patch

Another checkpoint.

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ning Zhang
 Attachments: HIVE-1511.patch, HIVE-1511-wip2.patch, 
 HIVE-1511-wip.patch


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.
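
For reference, a throwaway sketch that generates the repro query quoted above;
the term count is approximate and purely illustrative:

{code}
public class ReproQueryGen {
  public static void main(String[] args) {
    // Build "SELECT * FROM src WHERE key=0 OR key=0 OR ..." (~120 OR terms).
    StringBuilder q = new StringBuilder("SELECT * FROM src WHERE key=0");
    for (int i = 0; i < 120; i++) {
      q.append(" OR key=0");
    }
    System.out.println(q);
  }
}
{code}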

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-4893) [WebHCat] HTTP 500 errors should be mapped to 400 for bad request

2013-08-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved HIVE-4893.
--

Resolution: Duplicate

 [WebHCat] HTTP 500 errors should be mapped to 400 for bad request
 -

 Key: HIVE-4893
 URL: https://issues.apache.org/jira/browse/HIVE-4893
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4893-1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs

2013-08-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4531:
-

Attachment: (was: HIVE-4531-6.patch)

 [WebHCat] Collecting task logs to hdfs
 --

 Key: HIVE-4531
 URL: https://issues.apache.org/jira/browse/HIVE-4531
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, 
 HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, 
 samplestatusdirwithlist.tar.gz


 It would be nice if we collected task logs after the job finishes. This is
 similar to what Amazon EMR does.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs

2013-08-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4531:
-

Attachment: HIVE-4531-6.patch

 [WebHCat] Collecting task logs to hdfs
 --

 Key: HIVE-4531
 URL: https://issues.apache.org/jira/browse/HIVE-4531
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Reporter: Daniel Dai
 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, 
 HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, 
 samplestatusdirwithlist.tar.gz


 It would be nice if we collected task logs after the job finishes. This is
 similar to what Amazon EMR does.
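
A minimal sketch of what collecting task logs could look like, assuming the
Hadoop FileSystem API; the paths and layout here are placeholders, not
WebHCat's actual ones:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TaskLogCollector {
  // Copy a finished job's local task logs into the job's status dir on HDFS.
  public static void collect(String jobId, String localLogDir, String statusDir)
      throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dest = new Path(statusDir, "logs/" + jobId);
    fs.mkdirs(dest);
    fs.copyFromLocalFile(false, true, new Path(localLogDir), dest);
  }
}
{code}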

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732233#comment-13732233
 ] 

Benjamin Jakobus commented on HIVE-5018:


Ah, ok. Really sorry - only saw this message now! 

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java, 
 CommonJoinTaskDispatcher.java, ConditionalResolverCommonJoin.java, 
 ConditionalResolverSkewJoin.java, Context.java, CorrelationOptimizer.java, 
 DDLSemanticAnalyzer.java, DDLTask.java, DemuxOperator.java, Driver.java, 
 EmbeddedLockManager.java, ExecDriver.java, ExecReducer.java, EximUtil.java, 
 ExplainTask.java, ExportSemanticAnalyzer.java, FileDump.java, 
 FileSinkOperator.java, FunctionRegistry.java, GenMRFileSink1.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 java/org/apache/hadoop/hive/ql/io/RCFile.java
 

[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Benjamin Jakobus (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Jakobus updated HIVE-5018:
---

Attachment: GenMRFileSink1.java
FunctionRegistry.java
FileSinkOperator.java
FileDump.java
ExportSemanticAnalyzer.java
ExplainTask.java
EximUtil.java
ExecReducer.java
ExecDriver.java
EmbeddedLockManager.java

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java, 
 CommonJoinTaskDispatcher.java, ConditionalResolverCommonJoin.java, 
 ConditionalResolverSkewJoin.java, Context.java, CorrelationOptimizer.java, 
 DDLSemanticAnalyzer.java, DDLTask.java, DemuxOperator.java, Driver.java, 
 EmbeddedLockManager.java, ExecDriver.java, ExecReducer.java, EximUtil.java, 
 ExplainTask.java, ExportSemanticAnalyzer.java, FileDump.java, 
 FileSinkOperator.java, FunctionRegistry.java, GenMRFileSink1.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 

[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-07 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4324:
--

Attachment: HIVE-4324.D12045.1.patch

omalley requested code review of HIVE-4324 [jira] ORC Turn off dictionary 
encoding when number of distinct keys is greater than threshold.

Reviewers: JIRA

Forward port of Kevin's patch.

Add a configurable threshold so that if the number of distinct values in a 
string column is greater than that fraction of non-null values, dictionary 
encoding is turned off.
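
A minimal sketch of the threshold check being described; the names are
illustrative, not the actual code in the attached patch:

{code}
public class DictionaryThreshold {
  // Keep dictionary encoding only while distinct/non-null stays at or
  // below the configured fraction (a value in (0, 1]).
  static boolean useDictionaryEncoding(int distinctValues, int nonNullValues,
      double threshold) {
    if (nonNullValues == 0) {
      return true; // nothing written yet; keep the default
    }
    return ((double) distinctValues / nonNullValues) <= threshold;
  }
}
{code}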

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12045

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/queries/clientpositive/orc_dictionary_threshold.q
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out
  ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/28797/

To: JIRA, omalley


 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732266#comment-13732266
 ] 

Brock Noland commented on HIVE-5018:


Hi,

No reason to apologize! This is all part of becoming familiar with a project! 
We really appreciate the work you are putting in.

Brock
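
As a concrete illustration of the pattern this sub-task targets (all names
below are made up, not taken from the attached files): hoist allocations that
do not depend on the loop variable out of hot loops.

{code}
import java.util.List;

public class LoopAllocationExample {
  // Before: a new StringBuilder is allocated on every iteration.
  static void before(List<String> rows) {
    for (int i = 0; i < rows.size(); i++) {
      StringBuilder sb = new StringBuilder();
      sb.append(rows.get(i)).append('\n');
      System.out.print(sb);
    }
  }

  // After: the allocation is hoisted out of the loop and the instance reused.
  static void after(List<String> rows) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < rows.size(); i++) {
      sb.setLength(0);
      sb.append(rows.get(i)).append('\n');
      System.out.print(sb);
    }
  }
}
{code}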

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: AbstractGenericUDFEWAHBitmapBop.java, 
 AbstractJoinTaskDispatcher.java, AbstractSMBJoinProc.java, 
 BaseSemanticAnalyzer.java, BitmapIndexHandler.java, BlockMergeTask.java, 
 BucketingSortingInferenceOptimizer.java, BucketingSortingOpProcFactory.java, 
 BucketingSortingReduceSinkOptimizer.java, BucketizedHiveInputFormat.java, 
 BucketMapJoinContext.java, ColumnPrunerProcFactory.java, 
 ColumnStatsTask.java, CombineHiveInputFormat.java, CommonJoinOperator.java, 
 CommonJoinTaskDispatcher.java, ConditionalResolverCommonJoin.java, 
 ConditionalResolverSkewJoin.java, Context.java, CorrelationOptimizer.java, 
 DDLSemanticAnalyzer.java, DDLTask.java, DemuxOperator.java, Driver.java, 
 EmbeddedLockManager.java, ExecDriver.java, ExecReducer.java, EximUtil.java, 
 ExplainTask.java, ExportSemanticAnalyzer.java, FileDump.java, 
 FileSinkOperator.java, FunctionRegistry.java, GenMRFileSink1.java


 java/org/apache/hadoop/hive/ql/Context.java
 java/org/apache/hadoop/hive/ql/Driver.java
 java/org/apache/hadoop/hive/ql/QueryPlan.java
 java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
 java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
 java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
 java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
 java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
 java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
 java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
 java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/JoinUtil.java
 java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/MapOperator.java
 java/org/apache/hadoop/hive/ql/exec/MoveTask.java
 java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
 java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java
 java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
 java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
 java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
 java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
 java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
 java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java
 java/org/apache/hadoop/hive/ql/exec/StatsTask.java
 java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
 java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
 java/org/apache/hadoop/hive/ql/exec/UnionOperator.java
 java/org/apache/hadoop/hive/ql/exec/Utilities.java
 java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java
 java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
 java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
 java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java
 java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
 java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
 java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java
 java/org/apache/hadoop/hive/ql/history/HiveHistory.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
 java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
 java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java
 java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java
 java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
 java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
 java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java
 java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java
 

[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732274#comment-13732274
 ] 

Eric Hanson commented on HIVE-4123:
---

This is a great addition. Are you going to update the vectorized reader as well 
to read the updated format?

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs
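
To make the second and third bullets concrete, here is a small sketch of delta
run-length encoding for integers; it is an illustration only, not the actual
ORC RLE format in the patch:

{code}
import java.util.ArrayList;
import java.util.List;

public class DeltaRle {
  // Encode values as runs of {base, delta, length}: consecutive elements
  // with a constant difference collapse into one run.
  public static List<int[]> encode(int[] values) {
    List<int[]> runs = new ArrayList<int[]>();
    int i = 0;
    while (i < values.length) {
      int delta = (i + 1 < values.length) ? values[i + 1] - values[i] : 0;
      int len = 1;
      while (i + len < values.length
          && values[i + len] - values[i + len - 1] == delta) {
        len++;
      }
      runs.add(new int[] { values[i], delta, len });
      i += len;
    }
    return runs;
  }
}
{code}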

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4545) HS2 should return describe table results without space padding

2013-08-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4545:


Attachment: HIVE-4545.2.patch

HIVE-4545.2.patch - patch rebased to latest trunk

 HS2 should return describe table results without space padding
 --

 Key: HIVE-4545
 URL: https://issues.apache.org/jira/browse/HIVE-4545
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4545-1.patch, HIVE-4545.2.patch


 HIVE-3140 changed behavior of 'DESCRIBE table;' to be like 'DESCRIBE 
 FORMATTED table;'. HIVE-3140 introduced changes to not print header in 
 'DESCRIBE table;'. But jdbc/odbc calls still get fields padded with space for 
 the 'DESCRIBE table;' query.
 As the jdbc/odbc results are not for direct human consumption the space 
 padding should not be done for hive server2.
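
A minimal sketch of the padding split being proposed, with made-up names: pad
only when formatting for human consumption.

{code}
public class FieldFormatter {
  // Pad to a fixed column width for CLI output; return the raw value for
  // HS2 (jdbc/odbc) clients. 'prettyOutput' would come from configuration.
  static String formatField(String value, int width, boolean prettyOutput) {
    return prettyOutput ? String.format("%-" + width + "s", value) : value;
  }
}
{code}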

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-08-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732293#comment-13732293
 ] 

Ashutosh Chauhan commented on HIVE-4838:


Actually, the memory monitoring I was talking about concerns the local task
which generates the hashtable, which happens locally on the client. To generate
a hashtable (which is then shipped to the task nodes) we launch a local job on
the client in a separate process. The logic of memory management for this local
task is convoluted (not that of the MR job which actually does the join in the
mapper). This local task monitors its own memory, but it seems like
MapredLocalTask is catching the OOM exception anyway, so one of these is not
required. My thinking is there shouldn't be any memory monitoring and we should
just catch the OOM exception when it fails. In any case, a join is converted
into a mapjoin only when the small table is small (governed by a config knob),
so this OOM should be very rare. So, my suggestion is to remove MemoryHandler
altogether.

The ORC memory manager won't be a problem here, since ORC makes use of the
memory manager only while writing data, and here we are dumping the hashtable
in Java-serialized format, so that won't be relevant. For the same reason (that
this is a local task), java.opts and io.sort.mb aren't relevant either.

 Refactor MapJoin HashMap code to improve testability and readability
 

 Key: HIVE-4838
 URL: https://issues.apache.org/jira/browse/HIVE-4838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
 HIVE-4838.patch, HIVE-4838.patch


 MapJoin is an essential component for high-performance joins in Hive and the 
 current code has done great service for many years. However, the code is 
 showing its age and currently suffers from the following issues:
 * Uses static state via the MapJoinMetaData class to pass serialization 
 metadata to the Key, Row classes.
 * The API of a logical Table Container is not defined, and therefore it's 
 unclear which APIs HashMapWrapper needs to publicize. Additionally, 
 HashMapWrapper has many unused public methods.
 * HashMapWrapper contains logic to serialize, test memory bounds, and 
 implement the table container. Ideally these logical units could be separated.
 * HashTableSinkObjectCtx has unused fields and unused methods.
 * CommonJoinOperator and children use ArrayList on the left-hand side when only 
 List is required.
 * There are unused classes (MRU, DCLLItem) and classes which duplicate 
 functionality (MapJoinSingleKey and MapJoinDoubleKeys).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4545) HS2 should return describe table results without space padding

2013-08-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4545:


Status: Patch Available  (was: Open)

 HS2 should return describe table results without space padding
 --

 Key: HIVE-4545
 URL: https://issues.apache.org/jira/browse/HIVE-4545
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4545-1.patch, HIVE-4545.2.patch


 HIVE-3140 changed behavior of 'DESCRIBE table;' to be like 'DESCRIBE 
 FORMATTED table;'. HIVE-3140 introduced changes to not print header in 
 'DESCRIBE table;'. But jdbc/odbc calls still get fields padded with space for 
 the 'DESCRIBE table;' query.
 As the jdbc/odbc results are not for direct human consumption the space 
 padding should not be done for hive server2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-08-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732571#comment-13732571
 ] 

Brock Noland commented on HIVE-4838:


What I was saying is that the local task JVM could be a different size than the
mapred.child.java.opts setting on the server. I haven't heard of people hitting
this much, so it must not be too much of an issue. Good to know the ORC stuff is
only used on write, so it won't be an issue.

I am fine with removing the memory handling and using OOM. I think I will
allocate a buffer of, say, 1MB and then, when the OOM is hit, free that buffer
so we can cleanly exit and log.
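
A minimal sketch of that sacrificial-buffer idea, with illustrative names and
the 1MB figure from the discussion:

{code}
public class HashTableLoaderGuard {
  // Reserve headroom up front; drop it on OOM so there is still memory
  // left to report the failure and exit cleanly.
  private static byte[] sacrificialBuffer = new byte[1024 * 1024];

  public static void runGuarded(Runnable buildHashTable) {
    try {
      buildHashTable.run();
    } catch (OutOfMemoryError oom) {
      sacrificialBuffer = null; // free the reserved block
      System.err.println("Hash table load failed with OOM: " + oom);
      throw oom;
    }
  }
}
{code}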

 Refactor MapJoin HashMap code to improve testability and readability
 

 Key: HIVE-4838
 URL: https://issues.apache.org/jira/browse/HIVE-4838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
 HIVE-4838.patch, HIVE-4838.patch


 MapJoin is an essential component for high-performance joins in Hive and the 
 current code has done great service for many years. However, the code is 
 showing its age and currently suffers from the following issues:
 * Uses static state via the MapJoinMetaData class to pass serialization 
 metadata to the Key, Row classes.
 * The API of a logical Table Container is not defined, and therefore it's 
 unclear which APIs HashMapWrapper needs to publicize. Additionally, 
 HashMapWrapper has many unused public methods.
 * HashMapWrapper contains logic to serialize, test memory bounds, and 
 implement the table container. Ideally these logical units could be separated.
 * HashTableSinkObjectCtx has unused fields and unused methods.
 * CommonJoinOperator and children use ArrayList on the left-hand side when only 
 List is required.
 * There are unused classes (MRU, DCLLItem) and classes which duplicate 
 functionality (MapJoinSingleKey and MapJoinDoubleKeys).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732578#comment-13732578
 ] 

Prasanth J commented on HIVE-4123:
--

[~ehans] Sure. I can take a look at the changes required for the vectorized
reader to read from these new encodings.

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4545) HS2 should return describe table results without space padding

2013-08-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4545:


Status: Open  (was: Patch Available)

 HS2 should return describe table results without space padding
 --

 Key: HIVE-4545
 URL: https://issues.apache.org/jira/browse/HIVE-4545
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4545-1.patch, HIVE-4545.2.patch


 HIVE-3140 changed behavior of 'DESCRIBE table;' to be like 'DESCRIBE 
 FORMATTED table;'. HIVE-3140 introduced changes to not print header in 
 'DESCRIBE table;'. But jdbc/odbc calls still get fields padded with space for 
 the 'DESCRIBE table;' query.
 As the jdbc/odbc results are not for direct human consumption the space 
 padding should not be done for hive server2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Discuss] project chop up

2013-08-07 Thread Brock Noland
Thus far there hasn't been any dissent to managing our modules with Maven.
In addition, there have been several positive comments on a move towards
Maven. I'd like to add that Ivy seems to have issues managing multiple versions
of libraries. For example, in HIVE-3632 the Ivy cache had to be cleared when
testing patches that installed the new version of DataNucleus, and I have had
the same issue on HIVE-4388. Requiring the deletion of the Ivy cache
is extremely painful for developers who don't have access to high
bandwidth connections or live in areas far from California, where most of
these jars are hosted.

I'd like to propose we move towards Maven.


On Sat, Jul 27, 2013 at 1:19 PM, Mohammad Islam misla...@yahoo.com wrote:



 Yes, the hive build and test cases got convoluted as the project scope
 gradually increased. This is the time to take action!

 Based on my other Apache experiences, I prefer option #3: break up the
 projects within our own source tree. Make multiple modules or
 sub-projects. By default, only key modules will be built.

 Maven could be a possible candidate.

 Regards,
 Mohammad



 
  From: Edward Capriolo edlinuxg...@gmail.com
 To: dev@hive.apache.org dev@hive.apache.org
 Sent: Saturday, July 27, 2013 7:03 AM
 Subject: Re: [Discuss] project chop up


 Or feel free to suggest different approach. I am used to managing software
 as multi-module maven projects.
 From a development standpoint if I was working on beeline, it would be nice
 to only require some of the sub-projects to be open in my IDE to do that.
 Also managing everything globally is not ideal.

 Hive's project layout, build, and test infrastructure is just funky. It has
 to do a few interesting things (shims, testing), but I do not think what we
 are doing justifies the massive ant build system we have. Ant is so ten
 years ago.



 On Sat, Jul 27, 2013 at 12:04 AM, Alan Gates ga...@hortonworks.com
 wrote:

  But I assume they'd still be a part of targets like package, tar, and
  binary?  Making them compile and test separately and explicitly load the
  core Hive jars from maven/ivy seems reasonable.
 
  Alan.
 
  On Jul 26, 2013, at 8:40 PM, Brock Noland wrote:
 
   Hi,
  
   I think thats part of it but I'd like to decouple the downstream
 projects
   even further so that the only connection is the dependency on the hive
  jars.
  
   Brock
   On Jul 26, 2013 10:10 PM, Alan Gates ga...@hortonworks.com wrote:
  
   I'm not sure how this is different from what hcat does today.  It
 needs
   Hive's jars to compile, so it's one of the last things in the compile
  step.
   Would moving the other modules you note to be in the same category be
   enough?  Did you want to also make it so that the default ant target
   doesn't compile those?
  
   Alan.
  
   On Jul 26, 2013, at 4:09 PM, Edward Capriolo wrote:
  
   My mistake on saying hcat was a fork metastore. I had a brain fart
 for
  a
   moment.
  
   One way we could do this is create a folder called downstream. In our
   release step we can execute the downstream builds and then copy the
  files
   we need back. So nothing downstream will be on the classpath of the
  main
   project.
  
   This could help us breakup ql as well. Things like exotic file
 formats
  ,
   and things that are pluggable like zk locking can go here. That might
  be
   overkill.
  
    For now we can focus on building downstream, and hive thrift1 might be the
    first thing to try to downstream.
  
  
   On Friday, July 26, 2013, Thejas Nair the...@hortonworks.com
 wrote:
   +1 to the idea of making the build of core hive and other downstream
   components independent.
  
   bq.  I was under the impression that Hcat and hive-metastore was
   supposed to merge up somehow.
  
   The metastore code was never forked. Hcat was just using
   hive-metastore and making the metadata available to rest of hadoop
   (pig, java MR..).
   A lot of the changes that were driven by hcat goals were being made
 in
   hive-metastore. You can think of hcat as set of libraries that let
 pig
   and java MR use hive metastore. Since hcat is closely tied to
   hive-metastore, it makes sense to have them in same project.
  
  
   On Fri, Jul 26, 2013 at 6:33 AM, Edward Capriolo 
  edlinuxg...@gmail.com
  
   wrote:
   Also i believe hcatalog web can fall into the same designation.
  
   Question , hcatalog was initily a big hive-metastore fork. I was
  under
   the
   impression that Hcat and hive-metastore was supposed to merge up
   somehow.
   What is the status on that? I remember that was one of the core
  reasons
   we
   brought it in.
  
   On Friday, July 26, 2013, Edward Capriolo edlinuxg...@gmail.com
   wrote:
   I prefer option 3 as well.
  
  
   On Fri, Jul 26, 2013 at 12:52 AM, Brock Noland 
 br...@cloudera.com
   wrote:
  
   On Thu, Jul 25, 2013 at 9:48 PM, Edward Capriolo 
   edlinuxg...@gmail.com
   wrote:
  
    I have been developing on my laptop, a dual core 2 GB RAM laptop,
   for
   

[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732595#comment-13732595
 ] 

Phabricator commented on HIVE-4324:
---

ashutoshc has requested changes to the revision HIVE-4324 [jira] ORC Turn off 
dictionary encoding when number of distinct keys is greater than threshold.

  Mostly looks good, except for some minor nits.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java:249 Is it better
to modify clear to accept compress and suppress arguments?
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java:768 Good
to add a javadoc saying this reader reads strings which don't have an
accompanying dictionary.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java:838
Similarly here, a javadoc to the effect: this reader reads dictionary-encoded
strings.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java:166 This
method could be package private?

REVISION DETAIL
  https://reviews.facebook.net/D12045

BRANCH
  h-4324

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley


 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-08-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732597#comment-13732597
 ] 

Ashutosh Chauhan commented on HIVE-4838:


bq. I am fine with removing the memory handling and using OOM. I think that I 
will allocate a buffer of say 1MB and then when the OOM is hit free that buffer 
so we can cleanly exit and log.

Sounds good. Let's proceed with that. Though I believe 256KB should be more than
sufficient to generate the exception and cleanly exit.

 Refactor MapJoin HashMap code to improve testability and readability
 

 Key: HIVE-4838
 URL: https://issues.apache.org/jira/browse/HIVE-4838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
 HIVE-4838.patch, HIVE-4838.patch


 MapJoin is an essential component for high-performance joins in Hive and the 
 current code has done great service for many years. However, the code is 
 showing its age and currently suffers from the following issues:
 * Uses static state via the MapJoinMetaData class to pass serialization 
 metadata to the Key, Row classes.
 * The API of a logical Table Container is not defined, and therefore it's 
 unclear which APIs HashMapWrapper needs to publicize. Additionally, 
 HashMapWrapper has many unused public methods.
 * HashMapWrapper contains logic to serialize, test memory bounds, and 
 implement the table container. Ideally these logical units could be separated.
 * HashTableSinkObjectCtx has unused fields and unused methods.
 * CommonJoinOperator and children use ArrayList on the left-hand side when only 
 List is required.
 * There are unused classes (MRU, DCLLItem) and classes which duplicate 
 functionality (MapJoinSingleKey and MapJoinDoubleKeys).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4545) HS2 should return describe table results without space padding

2013-08-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4545:


Attachment: HIVE-4545.3.patch

HIVE-4545.3.patch - updates test case to remove .trim() before comparison in 
two more places.


 HS2 should return describe table results without space padding
 --

 Key: HIVE-4545
 URL: https://issues.apache.org/jira/browse/HIVE-4545
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4545-1.patch, HIVE-4545.2.patch, HIVE-4545.3.patch


 HIVE-3140 changed behavior of 'DESCRIBE table;' to be like 'DESCRIBE 
 FORMATTED table;'. HIVE-3140 introduced changes to not print header in 
 'DESCRIBE table;'. But jdbc/odbc calls still get fields padded with space for 
 the 'DESCRIBE table;' query.
 As the jdbc/odbc results are not for direct human consumption the space 
 padding should not be done for hive server2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request 13383: HIVE-4545 - HS2 should return describe table results without space padding

2013-08-07 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13383/
---

Review request for hive.


Repository: hive-git


Description
---

HIVE-3140 changed behavior of 'DESCRIBE table;' to be like 'DESCRIBE FORMATTED 
table;'. HIVE-3140 introduced changes to not print header in 'DESCRIBE table;'. 
But jdbc/odbc calls still get fields padded with space for the 'DESCRIBE 
table;' query.

As the jdbc/odbc results are not for direct human consumption the space padding 
should not be done for hive server2.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 83f337b 
  jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java f35a351 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 4dcb260 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/JsonMetaDataFormatter.java
 a85a19d 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java
 0d71891 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatter.java
 4c40034 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java
 0f48674 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
7254491 

Diff: https://reviews.apache.org/r/13383/diff/


Testing
---

Updated TestJdbcDriver2 unit tests


Thanks,

Thejas Nair



Re: Review Request 13383: HIVE-4545 - HS2 should return describe table results without space padding

2013-08-07 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13383/
---

(Updated Aug. 7, 2013, 7:18 p.m.)


Review request for hive.


Changes
---

HIVE-4545.3.patch - updates test case to remove .trim() before comparison in 
two more places.


Bugs: HIVE-4545
https://issues.apache.org/jira/browse/HIVE-4545


Repository: hive-git


Description
---

HIVE-3140 changed behavior of 'DESCRIBE table;' to be like 'DESCRIBE FORMATTED 
table;'. HIVE-3140 introduced changes to not print header in 'DESCRIBE table;'. 
But jdbc/odbc calls still get fields padded with space for the 'DESCRIBE 
table;' query.

As the jdbc/odbc results are not for direct human consumption the space padding 
should not be done for hive server2.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 83f337b 
  jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2.java f35a351 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 4dcb260 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/JsonMetaDataFormatter.java
 a85a19d 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatUtils.java
 0d71891 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/MetaDataFormatter.java
 4c40034 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java
 0f48674 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
7254491 

Diff: https://reviews.apache.org/r/13383/diff/


Testing
---

Updated TestJdbcDriver2 unit tests


Thanks,

Thejas Nair



[jira] [Updated] (HIVE-4545) HS2 should return describe table results without space padding

2013-08-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4545:


Status: Patch Available  (was: Open)

Review board link - https://reviews.apache.org/r/13383/

 HS2 should return describe table results without space padding
 --

 Key: HIVE-4545
 URL: https://issues.apache.org/jira/browse/HIVE-4545
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4545-1.patch, HIVE-4545.2.patch, HIVE-4545.3.patch


 HIVE-3140 changed behavior of 'DESCRIBE table;' to be like 'DESCRIBE 
 FORMATTED table;'. HIVE-3140 introduced changes to not print header in 
 'DESCRIBE table;'. But jdbc/odbc calls still get fields padded with space for 
 the 'DESCRIBE table;' query.
 As the jdbc/odbc results are not for direct human consumption the space 
 padding should not be done for hive server2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type

2013-08-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732616#comment-13732616
 ] 

Ashutosh Chauhan commented on HIVE-2482:


Hey [~mwagner], I have a couple of minor comments. Can you create an RB or
Phabricator entry for the patch?

 Convenience UDFs for binary data type
 -

 Key: HIVE-2482
 URL: https://issues.apache.org/jira/browse/HIVE-2482
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.9.0
Reporter: Ashutosh Chauhan
Assignee: Mark Wagner
 Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch


 HIVE-2380 introduced binary data type in Hive. It will be good to have 
 following udfs to make it more useful:
 * UDF's to convert to/from hex string
 * UDF's to convert to/from string using a specific encoding
 * UDF's to convert to/from base64 string
 * UDF's to convert to/from non-string types using a particular serde
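
As a flavor of what these could look like, a minimal sketch of a binary-to-hex
UDF using the old-style UDF API; the class name and behavior are illustrative,
not the attached patch:

{code}
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;

public class UDFBinToHex extends UDF {
  // Render each byte of the binary value as two lowercase hex digits.
  public Text evaluate(BytesWritable b) {
    if (b == null) {
      return null;
    }
    StringBuilder sb = new StringBuilder(b.getLength() * 2);
    for (int i = 0; i < b.getLength(); i++) {
      sb.append(String.format("%02x", b.getBytes()[i] & 0xff));
    }
    return new Text(sb.toString());
  }
}
{code}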

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4911) Enable QOP configuration for Hive Server 2 thrift transport

2013-08-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732623#comment-13732623
 ] 

Ashutosh Chauhan commented on HIVE-4911:


[~amalakar] HIVE-4911-trunk-3.patch is the patch in its entirety. We don't need
anything else, right?

 Enable QOP configuration for Hive Server 2 thrift transport
 ---

 Key: HIVE-4911
 URL: https://issues.apache.org/jira/browse/HIVE-4911
 Project: Hive
  Issue Type: New Feature
Reporter: Arup Malakar
Assignee: Arup Malakar
 Attachments: 20-build-temp-change-1.patch, 
 20-build-temp-change.patch, HIVE-4911-trunk-0.patch, HIVE-4911-trunk-1.patch, 
 HIVE-4911-trunk-2.patch, HIVE-4911-trunk-3.patch


 The QoP for hive server 2 should be configurable to enable encryption. A new 
 configuration property should be exposed: hive.server2.thrift.rpc.protection. 
 This would give greater control in configuring the hive server 2 service.
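
For illustration, the knob would presumably take the standard SASL QOP levels;
a minimal sketch of setting it programmatically (the exact accepted values are
an assumption here, not confirmed by this thread):

{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class QopExample {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Standard SASL QOP levels: "auth" (authentication only),
    // "auth-int" (adds integrity), "auth-conf" (adds confidentiality).
    conf.set("hive.server2.thrift.rpc.protection", "auth-conf");
    System.out.println(conf.get("hive.server2.thrift.rpc.protection"));
  }
}
{code}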

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4789) FetchOperator fails on partitioned Avro data

2013-08-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732634#comment-13732634
 ] 

Ashutosh Chauhan commented on HIVE-4789:


Your changes in MetaStoreUtils are indeed reasonable. I just wanted to make
sure they are really needed. If you can come up with a testcase which
shows the failure without the changes in MetaStoreUtils, that will make it
easier to see concretely why these changes are useful.

 FetchOperator fails on partitioned Avro data
 

 Key: HIVE-4789
 URL: https://issues.apache.org/jira/browse/HIVE-4789
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt


 HIVE-3953 fixed using partitioned avro tables for anything that used the 
 MapOperator, but those that rely on FetchOperator still fail with the same 
 error.
 e.g.
 {code}
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Discuss] project chop up

2013-08-07 Thread kulkarni.swar...@gmail.com
 I'd like to propose we move towards Maven.

Big +1 on this. Most of the major Apache projects (hadoop, hbase, avro, etc.)
are Maven based.

Also, I can't agree more that the current build system is frustrating, to say
the least. Another issue I had with the existing Ant-based system is that
there are no checkpointing capabilities [1]. So if a 6-hour build fails
after 5hr 30min, most of the things, even though successful, have to be
rebuilt, which is very time consuming. Maven reactors have inbuilt support
for a lot of this stuff.

[1] https://issues.apache.org/jira/browse/HIVE-3449.


On Wed, Aug 7, 2013 at 2:06 PM, Brock Noland br...@cloudera.com wrote:

 Thus far there hasn't been any dissent to managing our modules with Maven.
 In addition, there have been several positive comments on a move towards
 Maven. I'd like to add that Ivy seems to have issues managing multiple versions
 of libraries. For example, in HIVE-3632 the Ivy cache had to be cleared when
 testing patches that installed the new version of DataNucleus, and I have had
 the same issue on HIVE-4388. Requiring the deletion of the Ivy cache
 is extremely painful for developers who don't have access to high
 bandwidth connections or live in areas far from California, where most of
 these jars are hosted.

 I'd like to propose we move towards Maven.


 On Sat, Jul 27, 2013 at 1:19 PM, Mohammad Islam misla...@yahoo.com
 wrote:

 
 
   Yes, the hive build and test cases got convoluted as the project scope
   gradually increased. This is the time to take action!
 
   Based on my other Apache experiences, I prefer option #3: break up the
   projects within our own source tree. Make multiple modules or
   sub-projects. By default, only key modules will be built.
 
  Maven could be a possible candidate.
 
  Regards,
  Mohammad
 
 
 
  
   From: Edward Capriolo edlinuxg...@gmail.com
  To: dev@hive.apache.org dev@hive.apache.org
  Sent: Saturday, July 27, 2013 7:03 AM
  Subject: Re: [Discuss] project chop up
 
 
  Or feel free to suggest different approach. I am used to managing
 software
  as multi-module maven projects.
  From a development standpoint if I was working on beeline, it would be
 nice
  to only require some of the sub-projects to be open in my IDE to do that.
  Also managing everything globally is not ideal.
 
  Hive's project layout, build, and test infrastructure is just funky. It
 has
  to do a few interesting things (shims, testing), but I do not think what
 we
  are doing justifies the massive ant build system we have. Ant is so ten
  years ago.
 
 
 
  On Sat, Jul 27, 2013 at 12:04 AM, Alan Gates ga...@hortonworks.com
  wrote:
 
   But I assume they'd still be a part of targets like package, tar, and
   binary?  Making them compile and test separately and explicitly load
 the
   core Hive jars from maven/ivy seems reasonable.
  
   Alan.
  
   On Jul 26, 2013, at 8:40 PM, Brock Noland wrote:
  
Hi,
   
I think thats part of it but I'd like to decouple the downstream
  projects
even further so that the only connection is the dependency on the
 hive
   jars.
   
Brock
On Jul 26, 2013 10:10 PM, Alan Gates ga...@hortonworks.com
 wrote:
   
I'm not sure how this is different from what hcat does today.  It
  needs
Hive's jars to compile, so it's one of the last things in the
 compile
   step.
Would moving the other modules you note to be in the same category
 be
enough?  Did you want to also make it so that the default ant target
doesn't compile those?
   
Alan.
   
On Jul 26, 2013, at 4:09 PM, Edward Capriolo wrote:
   
 My mistake on saying hcat was a fork of the metastore. I had a brain fart for a
 moment.
   
One way we could do this is create a folder called downstream. In
 our
release step we can execute the downstream builds and then copy the
   files
we need back. So nothing downstream will be on the classpath of the
   main
project.
   
 This could help us break up ql as well. Things like exotic file formats,
 and things that are pluggable like zk locking, can go here. That might be
 overkill.

 For now we can focus on building downstream, and hivethrift1 might be the
 first thing to try to downstream.
   
   
On Friday, July 26, 2013, Thejas Nair the...@hortonworks.com
  wrote:
+1 to the idea of making the build of core hive and other
 downstream
components independent.
   
bq.  I was under the impression that Hcat and hive-metastore was
supposed to merge up somehow.
   
 The metastore code was never forked. Hcat was just using
 hive-metastore and making the metadata available to the rest of hadoop
 (pig, java MR..).
 A lot of the changes that were driven by hcat goals were being made in
 hive-metastore. You can think of hcat as a set of libraries that let pig
 and java MR use the hive metastore. Since hcat is closely tied to
 hive-metastore, it makes sense to have them in the same project.
   
   
 

[jira] [Commented] (HIVE-4990) ORC seeks fails with non-zero offset or column projection

2013-08-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732655#comment-13732655
 ] 

Phabricator commented on HIVE-4990:
---

ashutoshc has accepted the revision HIVE-4990 [jira] ORC seeks fails with 
non-zero offset or column projection.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D12009

BRANCH
  trunk

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley


 ORC seeks fails with non-zero offset or column projection
 -

 Key: HIVE-4990
 URL: https://issues.apache.org/jira/browse/HIVE-4990
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.11.1

 Attachments: HIVE-4990.D12009.1.patch


 The ORC reader gets exceptions when seeking with non-zero offsets or column 
 projection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Discuss] project chop up

2013-08-07 Thread Edward Capriolo
I think that is a good idea. I have been thinking about it a lot. I
especially hate how the offline build is now broken.

However, I think it is going to take some time. There are some tricks, like
how we build the hive-exec jar, that are not very clean to do in maven. I am
very interested.

The last initiative we spoke about on the list was moving from Forrest; I would
like to finish/start that before we get onto the project chop up.


On Wed, Aug 7, 2013 at 3:06 PM, Brock Noland br...@cloudera.com wrote:

 Thus far there hasn't been any dissent to managing our modules with maven.
  In addition there have been several comments positive on a move towards
  maven. I'd like to add that Ivy seems to have issues managing multiple versions
  of libraries. For example in HIVE-3632 the Ivy cache had to be cleared when
  testing patches that installed the new version of DataNucleus & I have had
  the same issue on HIVE-4388. Requiring the deletion of the ivy cache
 is extremely painful for developers that don't have access to high
 bandwidth connections or live in areas far from California where most of
 these jars are hosted.

 I'd like to propose we move towards Maven.



[jira] [Commented] (HIVE-3619) Hive JDBC driver should return a proper update-count of rows affected by query

2013-08-07 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732658#comment-13732658
 ] 

Konstantin Boudnik commented on HIVE-3619:
--

At least returning {{-1}} in the interim would be good, no?

 Hive JDBC driver should return a proper update-count of rows affected by query
 --

 Key: HIVE-3619
 URL: https://issues.apache.org/jira/browse/HIVE-3619
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.9.0
Reporter: Harsh J
Priority: Minor

 HiveStatement.java currently has an explicit 0 return:
 public int getUpdateCount() throws SQLException { return 0; }
 Ideally we ought to emit the exact number of rows affected by the query 
 statement itself.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Discuss] project chop up

2013-08-07 Thread Brock Noland
FYI I am still waiting on Infra for the CMS move:
https://issues.apache.org/jira/browse/INFRA-6593


On Wed, Aug 7, 2013 at 2:57 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I think that is a good idea. I have been thinking about it a lot. I
 especially hate how the offline build is now broken.

 However, I think it is going to take some time. There are some tricks, like
 how we build the hive-exec jar, that are not very clean to do in maven. I am
 very interested.

 The last initiative we spoke about on the list was moving from Forrest; I would
 like to finish/start that before we get onto the project chop up.



[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced

2013-08-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732671#comment-13732671
 ] 

Ashutosh Chauhan commented on HIVE-4964:


[~rhbutani] Are we removing some functionality in this patch, or is it just dead 
code removal? If we are removing some functionality, can you outline what you are 
proposing to drop?

 Cleanup PTF code: remove code dealing with non standard sql behavior we had 
 original introduced
 ---

 Key: HIVE-4964
 URL: https://issues.apache.org/jira/browse/HIVE-4964
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Priority: Minor
 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch


 There are still pieces of code that deal with:
 - supporting select expressions with Windowing
 - supporting a filter with windowing
 Need to do this before introducing perf improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4911) Enable QOP configuration for Hive Server 2 thrift transport

2013-08-07 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732709#comment-13732709
 ] 

Arup Malakar commented on HIVE-4911:


[~ashutoshc] That is correct. The 20-build* patches are temporary patches I used 
to build against 0.20 until HIVE-4991 is committed. 

 Enable QOP configuration for Hive Server 2 thrift transport
 ---

 Key: HIVE-4911
 URL: https://issues.apache.org/jira/browse/HIVE-4911
 Project: Hive
  Issue Type: New Feature
Reporter: Arup Malakar
Assignee: Arup Malakar
 Attachments: 20-build-temp-change-1.patch, 
 20-build-temp-change.patch, HIVE-4911-trunk-0.patch, HIVE-4911-trunk-1.patch, 
 HIVE-4911-trunk-2.patch, HIVE-4911-trunk-3.patch


 The QoP for hive server 2 should be configurable to enable encryption. A new 
 configuration should be exposed hive.server2.thrift.rpc.protection. This 
 would give greater control configuring hive server 2 service.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
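
As a usage sketch, assuming the new property accepts the same values as
Hadoop's SASL QoP levels (auth, auth-int, auth-conf); this is an assumption,
since the patch itself defines the accepted values:

{code}
import org.apache.hadoop.hive.conf.HiveConf;

// Minimal sketch: setting the proposed property programmatically.
// The value "auth-conf" is assumed to mirror Hadoop's SASL QoP levels and
// would request confidentiality (wire encryption) on the thrift transport.
public class QopConfigSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    conf.set("hive.server2.thrift.rpc.protection", "auth-conf");
    System.out.println(conf.get("hive.server2.thrift.rpc.protection"));
  }
}
{code}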


[jira] [Created] (HIVE-5020) HCat reading null-key map entries causes NPE

2013-08-07 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-5020:
--

 Summary: HCat reading null-key map entries causes NPE
 Key: HIVE-5020
 URL: https://issues.apache.org/jira/browse/HIVE-5020
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Currently, if someone has a null key in a map, HCatInputFormat will terminate 
with an NPE while trying to read it.

{noformat}
java.lang.NullPointerException
at java.lang.String.compareTo(String.java:1167)
at java.lang.String.compareTo(String.java:92)
at java.util.TreeMap.put(TreeMap.java:545)
at 
org.apache.hcatalog.data.HCatRecordSerDe.serializeMap(HCatRecordSerDe.java:222)
at 
org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:198)
at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
at 
org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
{noformat}

This is because we use a TreeMap to preserve order of elements in the map when 
reading from the underlying storage/serde.

This problem is easily fixed in a number of ways:

a) Switch to HashMap, which allows null keys. That does not preserve order of 
keys, which should not be important for map fields, but if we desire that, we 
have a solution for that too - LinkedHashMap, which would both retain order and 
allow us to insert null keys into the map.

b) Ignore null keyed entries - check if the field we read is null, and if it 
is, then ignore that item in the record altogether. This way, HCat is robust in 
what it does - it does not terminate with an NPE, and it does not allow null 
keys in maps that might be problematic to layers above us that are not used to 
seeing nulls as keys in maps.

Why do I bring up the second fix? I bring it up because of the way we 
discovered this bug. When reading from an RCFile, we do not notice this bug. If 
the same query that produced the RCFile instead produces an Orcfile, and we try 
reading from it, we see this problem.

RCFile seems to be quietly stripping any null key entries, whereas Orc retains 
them. This is why we didn't notice this problem for a long while, and suddenly, 
now, we are. Now, if we fix our code to allow nulls in map keys through to 
layers above, we expose layers above to this change, which may then cause them 
to break. (Technically, this is stretching the case because we already break 
now if they care) More importantly, though, we have a case now, where the same 
data will be exposed differently if it were stored as orc or if it were stored 
as rcfile. And as a layer that is supposed to make storage invisible to the end 
user, HCat should attempt to provide some consistency in how data behaves to 
the end user.

That said...

There is another important concern at hand here: nulls in map keys might be due 
to bad data(corruption or loading error), and by stripping them, we might be 
silently hiding that from the user. This is an important point that does steer 
me towards the former approach, of passing it on to layers above, and 
standardize on an understanding that null keys in maps are acceptable data that 
layers above us have to handle. After that, it could be taken on as a further 
consistency fix, to fix RCFile so that it allows nulls in map keys.

Having gone through this discussion of standardization, another important 
question is whether or not there is actually a use-case for null keys in maps 
in data. If there isn't, maybe we shouldn't allow writing that in the first 
place, and both orc and rcfile must simply error out to the end user if they 
try to write a null map key? Well, it is true that it is possible that data 
errors lead to null keys, but it's also possible that the user wants to store a 
mapping for value transformations, and they might have a transformation for 
null as well. In the case I encountered it, they were writing out an 
intermediate table after having read from a sparse table using a custom input 
format that generated an arbitrary number of columns, and were using the map to 
store column name mappings that would eventually be written out to another 
table. That seems a valid use, and we shouldn't prevent users from this sort of 
usage.

Another reason for not allowing null keys from a java perspective is locking 
and concurrency concerns, where locking on a null is a pain, per philosophical 
disagreements between Joshua Bloch and Doug Lea in the design of HashMap and 
ConcurrentHashMap. However, given that HCatalog reads are happening in a thread 
on a drone where there should be no parallel access of that record, and more 
importantly, this should strictly be used in a read-only kind of usage, we 
should not have to worry about that.

Increasingly, my preference is to change to LinkedHashMaps to allow 
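
A self-contained sketch (not HCat code) of the behavior difference behind
option (a): TreeMap under natural ordering rejects null keys, which is exactly
the NPE in the stack trace above, while LinkedHashMap preserves insertion
order and accepts them:

{code}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class NullKeyDemo {
  public static void main(String[] args) {
    Map<String, String> ordered = new LinkedHashMap<String, String>();
    ordered.put(null, "mapping-for-null"); // accepted; insertion order kept
    ordered.put("col1", "value1");
    System.out.println(ordered); // {null=mapping-for-null, col1=value1}

    Map<String, String> sorted = new TreeMap<String, String>();
    sorted.put(null, "boom"); // throws NullPointerException, as in the trace
  }
}
{code}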

Re: [Discuss] project chop up

2013-08-07 Thread Owen O'Malley
On Wed, Aug 7, 2013 at 12:55 PM, kulkarni.swar...@gmail.com 
kulkarni.swar...@gmail.com wrote:

  I'd like to propose we move towards Maven.

 Big +1 on this. Most of the major Apache projects (Hadoop, HBase, Avro, etc.)
 are Maven based.


A big +1 from me too. I actually took a pass at it a couple of months ago.
Some of the hard part was that some of the test classes are in the wrong
module and reference classes in a later module. Obviously that prevents
any kind of modular build.

As an additional plus to Maven is that Maven includes tools to correct the
project and module dependencies.

-- Owen


[jira] [Updated] (HIVE-5020) HCat reading null-key map entries causes NPE

2013-08-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5020:
---

Description: 
Currently, if someone has a null key in a map, HCatInputFormat will terminate 
with an NPE while trying to read it.

{noformat}
java.lang.NullPointerException
at java.lang.String.compareTo(String.java:1167)
at java.lang.String.compareTo(String.java:92)
at java.util.TreeMap.put(TreeMap.java:545)
at 
org.apache.hcatalog.data.HCatRecordSerDe.serializeMap(HCatRecordSerDe.java:222)
at 
org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:198)
at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
at 
org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
{noformat}

This is because we use a TreeMap to preserve order of elements in the map when 
reading from the underlying storage/serde.

This problem is easily fixed in a number of ways:

a) Switch to HashMap, which allows null keys. That does not preserve order of 
keys, which should not be important for map fields, but if we desire that, we 
have a solution for that too - LinkedHashMap, which would both retain order and 
allow us to insert null keys into the map.

b) Ignore null keyed entries - check if the field we read is null, and if it 
is, then ignore that item in the record altogether. This way, HCat is robust in 
what it does - it does not terminate with an NPE, and it does not allow null 
keys in maps that might be problematic to layers above us that are not used to 
seeing nulls as keys in maps.

Why do I bring up the second fix? I bring it up because of the way we 
discovered this bug. When reading from an RCFile, we do not notice this bug. If 
the same query that produced the RCFile instead produces an Orcfile, and we try 
reading from it, we see this problem.

RCFile seems to be quietly stripping any null key entries, whereas Orc retains 
them. This is why we didn't notice this problem for a long while, and suddenly, 
now, we are. Now, if we fix our code to allow nulls in map keys through to 
layers above, we expose layers above to this change, which may then cause them 
to break. (Technically, this is stretching the case because we already break 
now if they care) More importantly, though, we have a case now, where the same 
data will be exposed differently if it were stored as orc or if it were stored 
as rcfile. And as a layer that is supposed to make storage invisible to the end 
user, HCat should attempt to provide some consistency in how data behaves to 
the end user.

That said...

There is another important concern at hand here: nulls in map keys might be due 
to bad data(corruption or loading error), and by stripping them, we might be 
silently hiding that from the user. This is an important point that does steer 
me towards the former approach, of passing it on to layers above, and 
standardize on an understanding that null keys in maps are acceptable data that 
layers above us have to handle. After that, it could be taken on as a further 
consistency fix, to fix RCFile so that it allows nulls in map keys.

Having gone through this discussion of standardization, another important 
question is whether or not there is actually a use-case for null keys in maps 
in data. If there isn't, maybe we shouldn't allow writing that in the first 
place, and both orc and rcfile must simply error out to the end user if they 
try to write a null map key? Well, it is true that it is possible that data 
errors lead to null keys, but it's also possible that the user wants to store a 
mapping for value transformations, and they might have a transformation for 
null as well. In the case I encountered it, they were writing out an 
intermediate table after having read from a sparse table using a custom input 
format that generated an arbitrary number of columns, and were using the map to 
store column name mappings that would eventually be written out to another 
table. That seems a valid use, and we shouldn't prevent users from this sort of 
usage.

Another reason for not allowing null keys from a java perspective is locking 
and concurrency concerns, where locking on a null is a pain, per philosophical 
disagreements between Joshua Bloch and Doug Lea in the design of HashMap and 
ConcurrentHashMap. However, given that HCatalog reads are happening in a thread 
on a drone where there should be no parallel access of that record, and more 
importantly, this should strictly be used in a read-only kind of usage, we 
should not have to worry about that.

Increasingly, my preference is to change to LinkedHashMaps to allow null keys, 
and for consistency's sake, after this is tackled, to see if we should be 
fixing RCFile to allow null keys (this might be trickier since RCFile has a lot 
of other users that are 

[jira] [Commented] (HIVE-5011) Dynamic partitioning in HCatalog broken on external tables

2013-08-07 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732718#comment-13732718
 ] 

Daniel Dai commented on HIVE-5011:
--

Looks good. In dynamic partitioning, we shall disable customized external 
partition locations. We can support path patterns in the future, but that's more 
complex to do.

+1

 Dynamic partitioning in HCatalog broken on external tables
 --

 Key: HIVE-5011
 URL: https://issues.apache.org/jira/browse/HIVE-5011
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical
 Attachments: HIVE-5011.patch


 Dynamic partitioning with HCatalog has been broken as a result of 
 HCATALOG-500 trying to support user-set paths for external tables.
 The goal there was to be able to support other custom destinations apart from 
 the normal hive-style partitions. However, it is not currently possible for 
 users to set paths for dynamic ptn writes, since we don't support any way for 
 users to specify patterns (like, say $\{rootdir\}/$v1.$v2/) into which 
 writes happen, only locations, and the values for dyn. partitions are not 
 known ahead of time. Also, specifying a custom path messes with the way 
 dynamic ptn. code tries to determine what was written to where from the 
 output committer, which means that even if we supported patterned-writes 
 instead of location-writes, we still have to do some more deep diving into 
 the output committer code to support it.
 Thus, my current proposal is that we honour writes to user-specified paths 
 for external tables *ONLY* for static partition writes - i.e., if we can 
 determine that the write is a dyn. ptn. write, we will ignore the user 
 specification. (Note that this does not mean we ignore the table's external 
 location - we honour that - we just don't honour any HCatStorer/etc provided 
 additional location - we stick to what metadata tells us the root location is.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
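
The proposal boils down to a simple decision rule; a sketch with made-up
names, not the patch's actual code:

{code}
// Illustrative only: honour a user-supplied external location solely for
// static partition writes; dynamic partition writes fall back to the
// table's root location recorded in the metastore.
public class OutputLocationSketch {
  static String resolveOutputLocation(String tableRootLocation,
                                      String userSpecifiedLocation,
                                      boolean isDynamicPartitionWrite) {
    if (isDynamicPartitionWrite || userSpecifiedLocation == null) {
      return tableRootLocation;
    }
    return userSpecifiedLocation;
  }
}
{code}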


[jira] [Commented] (HIVE-5020) HCat reading null-key map entries causes NPE

2013-08-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732719#comment-13732719
 ] 

Edward Capriolo commented on HIVE-5020:
---

If I had to hazard a guess, I would say that the original implementation was 
about supporting thrift structures. Possibly, if thrift does not support this 
case, that design was not carried over.

Personally, I think we SHOULD support NULL keys and NULL values in maps. The map 
need not be sorted.

 HCat reading null-key map entries causes NPE
 

 Key: HIVE-5020
 URL: https://issues.apache.org/jira/browse/HIVE-5020
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

 Currently, if someone has a null key in a map, HCatInputFormat will terminate 
 with an NPE while trying to read it.
 {noformat}
 java.lang.NullPointerException
 at java.lang.String.compareTo(String.java:1167)
 at java.lang.String.compareTo(String.java:92)
 at java.util.TreeMap.put(TreeMap.java:545)
 at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializeMap(HCatRecordSerDe.java:222)
 at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:198)
 at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
 at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
 at 
 org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
 {noformat}
 This is because we use a TreeMap to preserve order of elements in the map 
 when reading from the underlying storage/serde.
 This problem is easily fixed in a number of ways:
 a) Switch to HashMap, which allows null keys. That does not preserve order of 
 keys, which should not be important for map fields, but if we desire that, we 
 have a solution for that too - LinkedHashMap, which would both retain order 
 and allow us to insert null keys into the map.
 b) Ignore null keyed entries - check if the field we read is null, and if it 
 is, then ignore that item in the record altogether. This way, HCat is robust 
 in what it does - it does not terminate with an NPE, and it does not allow 
 null keys in maps that might be problematic to layers above us that are not 
 used to seeing nulls as keys in maps.
 Why do I bring up the second fix? I bring it up because of the way we 
 discovered this bug. When reading from an RCFile, we do not notice this bug. 
 If the same query that produced the RCFile instead produces an Orcfile, and 
 we try reading from it, we see this problem.
 RCFile seems to be quietly stripping any null key entries, whereas Orc 
 retains them. This is why we didn't notice this problem for a long while, and 
 suddenly, now, we are. Now, if we fix our code to allow nulls in map keys 
 through to layers above, we expose layers above to this change, which may 
 then cause them to break. (Technically, this is stretching the case because 
 we already break now if they care) More importantly, though, we have a case 
 now, where the same data will be exposed differently if it were stored as orc 
 or if it were stored as rcfile. And as a layer that is supposed to make 
 storage invisible to the end user, HCat should attempt to provide some 
 consistency in how data behaves to the end user.
 That said...
 There is another important concern at hand here: nulls in map keys might be 
 due to bad data(corruption or loading error), and by stripping them, we might 
 be silently hiding that from the user. This is an important point that does 
 steer me towards the former approach, of passing it on to layers above, and 
 standardize on an understanding that null keys in maps are acceptable data 
 that layers above us have to handle. After that, it could be taken on as a 
 further consistency fix, to fix RCFile so that it allows nulls in map keys.
 Having gone through this discussion of standardization, another important 
 question is whether or not there is actually a use-case for null keys in maps 
 in data. If there isn't, maybe we shouldn't allow writing that in the first 
 place, and both orc and rcfile must simply error out to the end user if they 
 try to write a null map key? Well, it is true that it is possible that data 
 errors lead to null keys, but it's also possible that the user wants to store 
 a mapping for value transformations, and they might have a transformation for 
 null as well. In the case I encountered it, they were writing out an 
 intermediate table after having read from a sparse table using a custom input 
 format that generated an arbitrary number of columns, and were using the map 
 to store column name mappings that would eventually be written out to another 
 table. That seems a valid use, and we shouldn't prevent users from this sort 
 of usage.
 Another 

[jira] [Updated] (HIVE-4886) beeline code should have apache license headers

2013-08-07 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4886:


   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Thejas!

 beeline code should have apache license headers
 ---

 Key: HIVE-4886
 URL: https://issues.apache.org/jira/browse/HIVE-4886
 Project: Hive
  Issue Type: Task
  Components: JDBC
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.12.0

 Attachments: HIVE-4886.2.patch, HIVE-4886.patch


 The beeline jdbc client added as part of hive server2 changes is based on 
 SQLLine. 
 As beeline is a modified version of SQLLine and further modifications are also 
 under the apache license, the license headers of these files need to be replaced 
 with apache license headers. We already have the license text of SQLLine in the 
 LICENSE file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Discuss] project chop up

2013-08-07 Thread Edward Capriolo
Some of the hard part was that some of the test classes are in the wrong
module and reference classes in a later module.

I think the modules will have to be able to reference each other in many
cases. Serde and QL are tightly coupled. QL is really too large and we
should find a way to cut that up.

Part of this problem is the q.tests

I think one way to handle this is to only allow unit tests inside the
module. I imagine running all the q tests would be done in a final module
hive-qtest. Or possibly two final modules
hive-qtest
hive-qtest-extra (tangential things like UDFs and input formats not core to
hive)




[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced

2013-08-07 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732741#comment-13732741
 ] 

Harish Butani commented on HIVE-4964:
-

No, just dead code removal. This code was handling:
- the 'having clause' based filters we originally supported with windowing;
- and also the use of 'lead/lag' udfs outside of UDAFs.

We decided to remove support for these, if I recall, because:
- associating having with windowing would be confusing to users.
- lead/lag udf invocations when multiple partitionings are involved are
ambiguous. In some cases it is not clear in what order to evaluate the window
expressions.

We have already removed these features from the Semantic Analyzer. So they are 
not exposed to the user.
This is a cleanup step of the Translator/PTFOperator that still had code to 
handle these cases.

 Cleanup PTF code: remove code dealing with non standard sql behavior we had 
 original introduced
 ---

 Key: HIVE-4964
 URL: https://issues.apache.org/jira/browse/HIVE-4964
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Priority: Minor
 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch


 There are still pieces of code that deal with:
 - supporting select expressions with Windowing
 - supporting a filter with windowing
 Need to do this before introducing perf improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Discuss] project chop up

2013-08-07 Thread Owen O'Malley
On Wed, Aug 7, 2013 at 2:04 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

  Some of the hard part was that some of the test classes are in the wrong
  module and reference classes in a later module.

 I think the modules will have to be able to reference each other in many
 cases. Serde and QL are tightly coupled. QL is really too large and we
 should find a way to cut that up.


Of course the modules need to reference each other. The problematic test
classes depend on modules lower in the tree, so they form a cycle in the
dependency DAG. It only works in the ant build because it compiles all of
the modules before it does the test-compile in any of the modules.

-- Owen






[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-07 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4324:
--

Attachment: HIVE-4324.D12045.2.patch

omalley updated the revision HIVE-4324 [jira] ORC Turn off dictionary encoding 
when number of distinct keys is greater than threshold.

  I addressed Ashutosh's feedback.

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12045

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12045?vs=37185id=37245#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/queries/clientpositive/orc_dictionary_threshold.q
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out
  ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out

To: JIRA, ashutoshc, omalley


 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
 HIVE-4324.D12045.2.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
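
A back-of-the-envelope sketch of the threshold rule described in this issue,
with illustrative names rather than the patch's actual API:

{code}
// If the distinct/non-null ratio of a string column exceeds the configured
// threshold, a dictionary would hardly deduplicate anything, so dictionary
// encoding is turned off. Names here are made up for illustration.
public class DictionaryThresholdSketch {
  static boolean useDictionary(long distinctValues, long nonNullValues,
                               double threshold) {
    if (nonNullValues == 0) {
      return true; // no values written yet; keep the default encoding
    }
    return (double) distinctValues / nonNullValues <= threshold;
  }

  public static void main(String[] args) {
    System.out.println(useDictionary(9500, 10000, 0.8)); // false: turn it off
    System.out.println(useDictionary(120, 10000, 0.8));  // true: keep it
  }
}
{code}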


[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-07 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4324:


Fix Version/s: 0.12.0
   Status: Patch Available  (was: Open)

 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.12.0

 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
 HIVE-4324.D12045.2.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-07 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732780#comment-13732780
 ] 

Viraj Bhat commented on HIVE-4331:
--

Hi Ashutosh,
 I have created 2 review requests: one which changes files in the HCatalog 
contrib and the other in Hive. Hope this helps in the review process.
 Hive: https://reviews.facebook.net/D12063
 HCatalog: https://reviews.facebook.net/D12069

Viraj

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4123) The RLE encoding for ORC can be improved

2013-08-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4123:
-

Attachment: (was: ORC-Compression-Ratio-Comparison.xlsx)

 The RLE encoding for ORC can be improved
 

 Key: HIVE-4123
 URL: https://issues.apache.org/jira/browse/HIVE-4123
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Owen O'Malley
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.12.0

 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, 
 HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, 
 HIVE-4123.6.txt, ORC-Compression-Ratio-Comparison.xlsx


 The run length encoding of integers can be improved:
 * tighter bit packing
 * allow delta encoding
 * allow longer runs

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
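
To make the delta-encoding bullet concrete, a toy sketch of the idea (not
ORC's actual RLE format): a run of integers with a constant difference is
stored as one (base, delta, length) triple instead of one entry per value.

{code}
import java.util.ArrayList;
import java.util.List;

public class DeltaRleSketch {
  static class Run {
    final long base;  // first value of the run
    final long delta; // constant difference between consecutive values
    final int length; // number of values in the run
    Run(long base, long delta, int length) {
      this.base = base; this.delta = delta; this.length = length;
    }
    @Override public String toString() {
      return "(base=" + base + ", delta=" + delta + ", len=" + length + ")";
    }
  }

  static List<Run> encode(long[] values) {
    List<Run> runs = new ArrayList<Run>();
    int i = 0;
    while (i < values.length) {
      if (i == values.length - 1) {        // lone trailing value
        runs.add(new Run(values[i], 0, 1));
        break;
      }
      long delta = values[i + 1] - values[i];
      int j = i + 1;
      // extend the run while the difference stays constant
      while (j + 1 < values.length && values[j + 1] - values[j] == delta) {
        j++;
      }
      runs.add(new Run(values[i], delta, j - i + 1));
      i = j + 1;
    }
    return runs;
  }

  public static void main(String[] args) {
    long[] data = {1, 2, 3, 4, 5, 100, 100, 100, 7, 5, 3};
    // [(base=1, delta=1, len=5), (base=100, delta=0, len=3), (base=7, delta=-2, len=3)]
    System.out.println(encode(data));
  }
}
{code}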

