[jira] [Created] (HIVE-5343) Add equals method to ObjectInspectorUtils

2013-09-23 Thread Navis (JIRA)
Navis created HIVE-5343:
---

 Summary: Add equals method to ObjectInspectorUtils
 Key: HIVE-5343
 URL: https://issues.apache.org/jira/browse/HIVE-5343
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial


Might provide a shortcut for some use cases.
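
A minimal sketch of what such a shortcut could look like, built on the existing
ObjectInspectorUtils.compare(); the method name and shape here are assumptions,
not the attached patch:

{code}
// Hypothetical sketch: equality of two values under their respective
// ObjectInspectors, expressed as a thin wrapper over the existing compare().
public static boolean equals(Object o1, ObjectInspector oi1,
                             Object o2, ObjectInspector oi2) {
  return ObjectInspectorUtils.compare(o1, oi1, o2, oi2) == 0;
}
{code}

Callers such as GenericUDFOPEqual could then test equality directly instead of
spelling out the compare-to-zero idiom.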

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5343) Add equals method to ObjectInspectorUtils

2013-09-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5343:


Status: Patch Available  (was: Open)

> Add equals method to ObjectInspectorUtils
> -
>
> Key: HIVE-5343
> URL: https://issues.apache.org/jira/browse/HIVE-5343
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
>
> Might provide a shortcut for some use cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5343) Add equals method to ObjectInspectorUtils

2013-09-23 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5343:
--

Attachment: D13053.1.patch

navis requested code review of "HIVE-5343 [jira] Add equals method to 
ObjectInspectorUtils".

Reviewers: JIRA

HIVE-5343 Add equals method to ObjectInspectorUtils

Might provide a shortcut for some use cases.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D13053

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFField.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIn.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPEqual.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNotEqual.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFReflect.java
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ListObjectsEqualComparer.java
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
  
serde/src/test/org/apache/hadoop/hive/serde2/binarysortable/TestBinarySortableSerDe.java
  
serde/src/test/org/apache/hadoop/hive/serde2/lazybinary/TestLazyBinarySerDe.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/37893/

To: JIRA, navis


> Add equals method to ObjectInspectorUtils
> -
>
> Key: HIVE-5343
> URL: https://issues.apache.org/jira/browse/HIVE-5343
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: D13053.1.patch
>
>
> Might provide a shortcut for some use cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5276) Skip useless string encoding stage for hiveserver2

2013-09-23 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5276:


Attachment: HIVE-5276.3.patch.txt

Rebased to trunk

> Skip useless string encoding stage for hiveserver2
> --
>
> Key: HIVE-5276
> URL: https://issues.apache.org/jira/browse/HIVE-5276
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: D12879.2.patch, HIVE-5276.3.patch.txt, 
> HIVE-5276.D12879.1.patch
>
>
> Currently, hiveserver2 fetches rows in the string format used for CLI 
> output, then converts them back into rows, and finally converts them to the 
> final format. This is inefficient and memory-consuming. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5342) Remove pre hadoop-0.20.0 related codes

2013-09-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774355#comment-13774355
 ] 

Ashutosh Chauhan commented on HIVE-5342:


Dupe of HIVE-4518? [~navis] You may want to review the patch on HIVE-4518, 
which seems to be more comprehensive than this one.

> Remove pre hadoop-0.20.0 related codes
> --
>
> Key: HIVE-5342
> URL: https://issues.apache.org/jira/browse/HIVE-5342
> Project: Hive
>  Issue Type: Task
>  Components: Shims
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: D13047.1.patch
>
>
> Recently, we discussed dropping support for hadoop-0.20.0. Whether or not 
> that happens, the 0.17-related code can be removed beforehand.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5223) explain doesn't show serde used for table

2013-09-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5223:
---

Status: Patch Available  (was: Open)

> explain doesn't show serde used for table
> -
>
> Key: HIVE-5223
> URL: https://issues.apache.org/jira/browse/HIVE-5223
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5223.1.patch, HIVE-5223.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5223) explain doesn't show serde used for table

2013-09-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5223:
---

Attachment: HIVE-5223.1.patch

Patch rebased to trunk. Passed all the tests.

> explain doesn't show serde used for table
> -
>
> Key: HIVE-5223
> URL: https://issues.apache.org/jira/browse/HIVE-5223
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5223.1.patch, HIVE-5223.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5223) explain doesn't show serde used for table

2013-09-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774361#comment-13774361
 ] 

Ashutosh Chauhan commented on HIVE-5223:


Phabricator https://reviews.facebook.net/D13059

> explain doesn't show serde used for table
> -
>
> Key: HIVE-5223
> URL: https://issues.apache.org/jira/browse/HIVE-5223
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5223.1.patch, HIVE-5223.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Documentation for HIVE-4963 (Support in memory PTF partitions) missing

2013-09-23 Thread Lars Francke
Hi,

it'd be great if someone could answer my question here


Thank you!

Cheers,
Lars


[jira] [Commented] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException

2013-09-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774367#comment-13774367
 ] 

Ashutosh Chauhan commented on HIVE-5272:


[~prasanth_j] If you have already run tests, can you post the results here?

> Column statistics on a invalid column name results in 
> IndexOutOfBoundsException
> ---
>
> Key: HIVE-5272
> URL: https://issues.apache.org/jira/browse/HIVE-5272
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5272.1.patch
>
>
> When an invalid column name is specified for column statistics, an 
> IndexOutOfBoundsException is thrown. 
> {code}hive> analyze table customer_staging compute statistics for columns 
> c_first_name, invalid_name, c_customer_sk;
> FAILED: IndexOutOfBoundsException Index: 2, Size: 1{code}
> If the invalid column name appears first or last, INVALID_COLUMN_REFERENCE is 
> thrown at the query planning stage. But if the invalid column name appears 
> somewhere in the middle of the column list, an IndexOutOfBoundsException is 
> thrown at the semantic analysis step. The problem is with the 
> getTableColumnType() and getPartitionColumnType() methods. The following 
> segment 
> {code}for (int i = 0; i < colNames.size(); i++) {
>   String colName = colNames.get(i);
>   for (FieldSchema col : cols) {
>     if (colName.equalsIgnoreCase(col.getName())) {
>       colTypes.add(i, new String(col.getType()));
>     }
>   }
> }{code}
> is the reason for it. If an invalid column name appears in the middle of the 
> column list, equalsIgnoreCase() never matches it, so nothing is added at that 
> index; the later colTypes.add(i, ...) then targets a position beyond the 
> current list size, which throws the exception. 
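
A hedged sketch of one possible fix shape (not necessarily the attached 
patch): fail fast on an unmatched name instead of leaving a hole that makes 
the positional add go out of bounds.

{code}
// Sketch only; the error constant name is taken from the description above.
for (int i = 0; i < colNames.size(); i++) {
  String colName = colNames.get(i);
  String colType = null;
  for (FieldSchema col : cols) {
    if (colName.equalsIgnoreCase(col.getName())) {
      colType = col.getType();
      break;
    }
  }
  if (colType == null) {
    throw new SemanticException(ErrorMsg.INVALID_COLUMN_REFERENCE.getMsg(colName));
  }
  colTypes.add(colType);  // appending keeps the list dense, so no index gaps
}
{code}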

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors

2013-09-23 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774610#comment-13774610
 ] 

Chaoyu Tang commented on HIVE-5320:
---

Agree that this is mainly caused by the improper implementation in json-serde, 
but is there anything Hive can do to better cope with this kind of unexpected 
behavior or prevent it from happening again? For example, provide clear 
documentation for the related SerDe APIs, like that in ListObjectInspector?

> Querying a table with nested struct type over JSON data results in errors
> -
>
> Key: HIVE-5320
> URL: https://issues.apache.org/jira/browse/HIVE-5320
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-5320.patch
>
>
> Querying a table with a nested struct datatype like
> ==
> create table nest_struct_tbl (col1 string, col2 array a2:array>>>) ROW FORMAT SERDE 
> 'org.openx.data.jsonserde.JsonSerDe'; 
> ==
> over JSON data causes errors including java.lang.IndexOutOfBoundsException or 
> corrupted data. 
> The JsonSerDe used is 
> json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.
> The cause is that the method
> public List<Object> getStructFieldsDataAsList(Object o) 
> in JsonStructObjectInspector.java 
> returns a list referencing a static ArrayList "values".
> So the local variable 'list' in the serialize method of Hive's 
> LazySimpleSerDe class carries the same reference across its recursive calls, 
> and its element values keep being overwritten in the STRUCT case.
> Solutions:
> 1. Fix in JsonSerDe: change the field 'values' in 
> java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java
> to instance scope.
> Filed a ticket with JsonSerDe 
> (https://github.com/rcongiu/Hive-JSON-Serde/issues/31)
> 2. Ideally, in the serialize method of LazySimpleSerDe, we should 
> defensively save a copy of the list returned by list = 
> soi.getStructFieldsDataAsList(obj) when the soi is an instance of 
> JsonStructObjectInspector, so that the recursive calls of serialize can work 
> properly regardless of the extended SerDe implementation.
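
A minimal sketch of solution 2 (the defensive copy), assuming the surrounding 
serialize() context of LazySimpleSerDe; variable names mirror the description:

{code}
List<Object> list = soi.getStructFieldsDataAsList(obj);
if (list != null) {
  // Copy defensively: a SerDe like this JsonStructObjectInspector may hand
  // back a shared (static) list, which recursive serialize() calls would
  // otherwise keep overwriting.
  list = new ArrayList<Object>(list);
}
{code}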

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up

2013-09-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774666#comment-13774666
 ] 

Alan Gates commented on HIVE-5274:
--

I'm fine with b, but I think it's [~viraj] that really needs to vote on this, 
since it's his changes that will be affected.

> HCatalog package renaming backward compatibility follow-up
> --
>
> Key: HIVE-5274
> URL: https://issues.apache.org/jira/browse/HIVE-5274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 0.12.0
>
>
> As part of HIVE-4869, the hbase storage handler in hcat was moved to 
> org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it 
> was intended to be deprecated as well.
> However, it imports and uses several org.apache.hive.hcatalog classes. This 
> needs to be changed to use org.apache.hcatalog classes.
> ==
> Note: the above is a complete description of this issue in and of itself; 
> what follows is more detail on the backward-compatibility goals I have (not 
> saying that each of these things is violated): 
> a) People using org.apache.hcatalog packages should continue being able to 
> use that package, and see no difference at compile time or runtime. All code 
> here is considered deprecated, and will be gone by the time hive 0.14 rolls 
> around. Additionally, org.apache.hcatalog should behave as if it were 0.11 
> for all compatibility purposes.
> b) People using org.apache.hive.hcatalog packages should never have an 
> org.apache.hcatalog dependency injected in.
> Thus,
> It is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages 
> internally (say HCatUtil, for example), as long as any interfaces only expose 
> org.apache.hcatalog.\*. For tests that test org.apache.hcatalog.\*, we must be 
> capable of testing it from a pure org.apache.hcatalog.\* world.
> It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, 
> even in tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5170) Sorted Bucketed Partitioned Insert hard-codes the reducer count == bucket count

2013-09-23 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774655#comment-13774655
 ] 

Gopal V commented on HIVE-5170:
---

Tried to do this; unfortunately the FileSinkOperator uses the task-id as the 
bucket filename.

So if you have 12 reducers, the last reducer will automatically write to 
00011_0.

This makes it slightly more complex to fix without writing a new 
SortedFileSinkOperator.

> Sorted Bucketed Partitioned Insert hard-codes the reducer count == bucket 
> count
> ---
>
> Key: HIVE-5170
> URL: https://issues.apache.org/jira/browse/HIVE-5170
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.12.0
> Environment: Ubuntu LXC
>Reporter: Gopal V
>
> When performing a hive sorted-partitioned insert, the insert optimizer 
> hard-codes the number of output files to the actual bucket count of the table.
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L4852
> We need at least that many reducers or, if limited, to switch to multi-spray 
> (as already implemented); but more reducers are wasteful as long as the 
> HiveKey only contains the partition columns.
> At this point, we're still limited to reducers = n buckets, which is a problem 
> for partitioning requests which need to insert nearly a terabyte of data into 
> a single-digit bucket count and four-digit partition count.
> Since that is routed by the hashCode of the HiveKey, we can ensure that works 
> by modifying the HiveKey to handle n buckets internally.
> Basically it should only generate hashCode = (sort_cols.hashCode() % n), 
> routing to only n reducers over-all, no matter how many we spin up.
> So far so good with the hard-coded reducer count.
> But provided we fix the issues brought up by HIVE-5169, the insert becomes 
> friendlier to a higher reducer count as well.
> At this juncture, we can modify the hashCode to be slightly more interesting.
> hashCode = (part_cols.hashCode()*31 + (sort_cols.hashCode() % n)) 
> This generates somewhere between n to partition_count * n unique hash-codes.
> Since the sort-order & bucketing has to be maintained per-partition dir, 
> distributing this equally across any number of reducers will result in the 
> scale-out of the reducer count.
> This will allow a reducer count that will allow for far faster inserts of ORC 
> data into a partitioned/sorted table.
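
Expressed as code, the two routing schemes described above might look like the 
following; the variable names are illustrative assumptions:

{code}
// Scheme 1: route to exactly n reducers over-all, however many are spun up.
int hashCode = sortCols.hashCode() % n;

// Scheme 2 (once HIVE-5169 is fixed): spread across partitions while keeping
// at most n distinct codes per partition dir, for n to partitionCount * n
// unique hash codes over-all.
int hashCode2 = partCols.hashCode() * 31 + (sortCols.hashCode() % n);
{code}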

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc

2013-09-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5279:
---

Status: Open  (was: Patch Available)

Quite a few tests failed. 35 in TestCliDriver & 2 in TestNegativeCliDriver, 
including 
database_drop.q,index*.q,ql_rewrite_gbtoidx.q,show_indexes_edge_cases.q,show_indexes_syntax.q,udaf_collect_set.q,union_view.q,virtual_column.q.
 In -ve testcases, index_compact_entry_limit.q,index_compact_size_limit.q

> Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
> ---
>
> Key: HIVE-5279
> URL: https://issues.apache.org/jira/browse/HIVE-5279
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Critical
> Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, 
> D12963.3.patch, D12963.4.patch
>
>
> We never forced GenericUDAFEvaluator to be Serializable. I don't know how the 
> previous serialization mechanism handled this, but Kryo complains that it's 
> not Serializable and fails the query.
> The log below is the example, 
> {noformat}
> java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class 
> cannot be created (missing no-arg constructor): 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
> Serialization trace:
> inputOI 
> (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval)
> genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc)
> aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc)
> conf (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
>   at org.apache.h
> {noformat}
> If this cannot be fixed somehow, some UDAFs will have to be modified to run 
> on hive-0.13.0.
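
For reference, a common Kryo-side workaround for the "missing no-arg 
constructor" failure is to let Kryo bypass constructors via Objenesis; whether 
that is the right fix here is an open question, so treat this as an assumption:

{code}
import com.esotericsoftware.kryo.Kryo;
import org.objenesis.strategy.StdInstantiatorStrategy;

Kryo kryo = new Kryo();
// Instantiate objects without calling a constructor, so classes like
// StandardListObjectInspector no longer need a no-arg constructor.
kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());
{code}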

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors

2013-09-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774693#comment-13774693
 ] 

Ashutosh Chauhan commented on HIVE-5320:


[~ctang.cloudera] I am not sure it is easy to detect such a badly behaving 
serde. This is not something easily enforceable. So the only thing I can see is 
to improve our documentation so that serde writers are well aware of this 
behavior. Let's close this one as won't fix and improve the documentation on 
cwiki.

> Querying a table with nested struct type over JSON data results in errors
> -
>
> Key: HIVE-5320
> URL: https://issues.apache.org/jira/browse/HIVE-5320
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-5320.patch
>
>
> Querying a table with a nested struct datatype like
> ==
> create table nest_struct_tbl (col1 string, col2 array a2:array>>>) ROW FORMAT SERDE 
> 'org.openx.data.jsonserde.JsonSerDe'; 
> ==
> over JSON data causes errors including java.lang.IndexOutOfBoundsException or 
> corrupted data. 
> The JsonSerDe used is 
> json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.
> The cause is that the method
> public List<Object> getStructFieldsDataAsList(Object o) 
> in JsonStructObjectInspector.java 
> returns a list referencing a static ArrayList "values".
> So the local variable 'list' in the serialize method of Hive's 
> LazySimpleSerDe class carries the same reference across its recursive calls, 
> and its element values keep being overwritten in the STRUCT case.
> Solutions:
> 1. Fix in JsonSerDe: change the field 'values' in 
> java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java
> to instance scope.
> Filed a ticket with JsonSerDe 
> (https://github.com/rcongiu/Hive-JSON-Serde/issues/31)
> 2. Ideally, in the serialize method of LazySimpleSerDe, we should 
> defensively save a copy of the list returned by list = 
> soi.getStructFieldsDataAsList(obj) when the soi is an instance of 
> JsonStructObjectInspector, so that the recursive calls of serialize can work 
> properly regardless of the extended SerDe implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException

2013-09-23 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5272:
-

Attachment: HIVE-5272.1.patch.txt

> Column statistics on a invalid column name results in 
> IndexOutOfBoundsException
> ---
>
> Key: HIVE-5272
> URL: https://issues.apache.org/jira/browse/HIVE-5272
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5272.1.patch.txt
>
>
> When an invalid column name is specified for column statistics, an 
> IndexOutOfBoundsException is thrown. 
> {code}hive> analyze table customer_staging compute statistics for columns 
> c_first_name, invalid_name, c_customer_sk;
> FAILED: IndexOutOfBoundsException Index: 2, Size: 1{code}
> If the invalid column name appears first or last, INVALID_COLUMN_REFERENCE is 
> thrown at the query planning stage. But if the invalid column name appears 
> somewhere in the middle of the column list, an IndexOutOfBoundsException is 
> thrown at the semantic analysis step. The problem is with the 
> getTableColumnType() and getPartitionColumnType() methods. The following 
> segment 
> {code}for (int i = 0; i < colNames.size(); i++) {
>   String colName = colNames.get(i);
>   for (FieldSchema col : cols) {
>     if (colName.equalsIgnoreCase(col.getName())) {
>       colTypes.add(i, new String(col.getType()));
>     }
>   }
> }{code}
> is the reason for it. If an invalid column name appears in the middle of the 
> column list, equalsIgnoreCase() never matches it, so nothing is added at that 
> index; the later colTypes.add(i, ...) then targets a position beyond the 
> current list size, which throws the exception. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException

2013-09-23 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774737#comment-13774737
 ] 

Prasanth J commented on HIVE-5272:
--

[~ashutoshc] I did not run the tests separately. I was waiting for HiveQA to 
run the tests. Renamed the patch again to see if HiveQA picks it up. 

> Column statistics on a invalid column name results in 
> IndexOutOfBoundsException
> ---
>
> Key: HIVE-5272
> URL: https://issues.apache.org/jira/browse/HIVE-5272
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5272.1.patch.txt
>
>
> When an invalid column name is specified for column statistics, an 
> IndexOutOfBoundsException is thrown. 
> {code}hive> analyze table customer_staging compute statistics for columns 
> c_first_name, invalid_name, c_customer_sk;
> FAILED: IndexOutOfBoundsException Index: 2, Size: 1{code}
> If the invalid column name appears first or last, INVALID_COLUMN_REFERENCE is 
> thrown at the query planning stage. But if the invalid column name appears 
> somewhere in the middle of the column list, an IndexOutOfBoundsException is 
> thrown at the semantic analysis step. The problem is with the 
> getTableColumnType() and getPartitionColumnType() methods. The following 
> segment 
> {code}for (int i = 0; i < colNames.size(); i++) {
>   String colName = colNames.get(i);
>   for (FieldSchema col : cols) {
>     if (colName.equalsIgnoreCase(col.getName())) {
>       colTypes.add(i, new String(col.getType()));
>     }
>   }
> }{code}
> is the reason for it. If an invalid column name appears in the middle of the 
> column list, equalsIgnoreCase() never matches it, so nothing is added at that 
> index; the later colTypes.add(i, ...) then targets a position beyond the 
> current list size, which throws the exception. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException

2013-09-23 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5272:
-

Attachment: (was: HIVE-5272.1.patch)

> Column statistics on a invalid column name results in 
> IndexOutOfBoundsException
> ---
>
> Key: HIVE-5272
> URL: https://issues.apache.org/jira/browse/HIVE-5272
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: statistics
> Fix For: 0.13.0
>
>
> When an invalid column name is specified for column statistics, an 
> IndexOutOfBoundsException is thrown. 
> {code}hive> analyze table customer_staging compute statistics for columns 
> c_first_name, invalid_name, c_customer_sk;
> FAILED: IndexOutOfBoundsException Index: 2, Size: 1{code}
> If the invalid column name appears first or last, INVALID_COLUMN_REFERENCE is 
> thrown at the query planning stage. But if the invalid column name appears 
> somewhere in the middle of the column list, an IndexOutOfBoundsException is 
> thrown at the semantic analysis step. The problem is with the 
> getTableColumnType() and getPartitionColumnType() methods. The following 
> segment 
> {code}for (int i = 0; i < colNames.size(); i++) {
>   String colName = colNames.get(i);
>   for (FieldSchema col : cols) {
>     if (colName.equalsIgnoreCase(col.getName())) {
>       colTypes.add(i, new String(col.getType()));
>     }
>   }
> }{code}
> is the reason for it. If an invalid column name appears in the middle of the 
> column list, equalsIgnoreCase() never matches it, so nothing is added at that 
> index; the later colTypes.add(i, ...) then targets a position beyond the 
> current list size, which throws the exception. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up

2013-09-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774738#comment-13774738
 ] 

Eugene Koifman commented on HIVE-5274:
--

I think no end user app should be mixing org.apache.hcatalog.* and 
org.apache.hive.hcatalog.*; in fact we should probably make that clear in the 
doc/rel notes.

So the user has 2 options:
1. use the deprecated org.apache.hcatalog (including the hcat storage handler)
2. switch to the new org.apache.hive.hcatalog and get the new HBaseStorageHandler

So I think your option 'b' is the right one.

> HCatalog package renaming backward compatibility follow-up
> --
>
> Key: HIVE-5274
> URL: https://issues.apache.org/jira/browse/HIVE-5274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 0.12.0
>
>
> As part of HIVE-4869, the hbase storage handler in hcat was moved to 
> org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it 
> was intended to be deprecated as well.
> However, it imports and uses several org.apache.hive.hcatalog classes. This 
> needs to be changed to use org.apache.hcatalog classes.
> ==
> Note: the above is a complete description of this issue in and of itself; 
> what follows is more detail on the backward-compatibility goals I have (not 
> saying that each of these things is violated): 
> a) People using org.apache.hcatalog packages should continue being able to 
> use that package, and see no difference at compile time or runtime. All code 
> here is considered deprecated, and will be gone by the time hive 0.14 rolls 
> around. Additionally, org.apache.hcatalog should behave as if it were 0.11 
> for all compatibility purposes.
> b) People using org.apache.hive.hcatalog packages should never have an 
> org.apache.hcatalog dependency injected in.
> Thus,
> It is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages 
> internally (say HCatUtil, for example), as long as any interfaces only expose 
> org.apache.hcatalog.\*. For tests that test org.apache.hcatalog.\*, we must be 
> capable of testing it from a pure org.apache.hcatalog.\* world.
> It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, 
> even in tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-5337) org.apache.hcatalog.common.HCatUtil is used by org.apache.hive.hcatalog.templeton.tool

2013-09-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-5337.
--

Resolution: Won't Fix

this should actually be addressed in HIVE-5133

> org.apache.hcatalog.common.HCatUtil is used by 
> org.apache.hive.hcatalog.templeton.tool
> --
>
> Key: HIVE-5337
> URL: https://issues.apache.org/jira/browse/HIVE-5337
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.12.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 0.12.0
>
>
> specifically org.apache.hive.hcatalog.templeton.tool.TempletonControllerJob 
> and org.apache.hive.hcatalog.templeton.tool.MSTokenCleanOutputFormat 
> they should be using org.apache.hive.hcatalog.common.HCatUtil

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4823) implement vectorized TRIM(), LTRIM(), RTRIM()

2013-09-23 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4823:
--

Attachment: HIVE-4823.2-vectorization.patch

Rebased this patch on the latest vectorization branch. Verified that the junit 
tests pass. Did ad hoc end-to-end tests for the TRIM(), LTRIM() and RTRIM() 
functions and verified that they work in vectorized mode.

> implement vectorized TRIM(), LTRIM(), RTRIM()
> -
>
> Key: HIVE-4823
> URL: https://issues.apache.org/jira/browse/HIVE-4823
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: vectorization-branch
>Reporter: Eric Hanson
>Assignee: Eric Hanson
> Attachments: HIVE-4823.1-vectorization.patch, 
> HIVE-4823.2-vectorization.patch
>
>
> Make it work end-to-end, including the vectorized expression, and tying it 
> together in VectorizationContext so a SQL query will run using vectorization 
> when invoking these functions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5271) Convert join op to a map join op in the planning phase

2013-09-23 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5271:
-

Attachment: HIVE-5271.2.patch

No .q.out because we have yet to integrate with tez test class.

> Convert join op to a map join op in the planning phase
> --
>
> Key: HIVE-5271
> URL: https://issues.apache.org/jira/browse/HIVE-5271
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: tez-branch
>
> Attachments: HIVE-5271.2.patch, HIVE-5271.WIP.patch
>
>
> This captures the planning changes required in hive to support hash joins. We 
> need to convert the join operator to a map join operator. This is hooked into 
> the infrastructure provided by HIVE-5095.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs

2013-09-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4531:
-

Attachment: HIVE-4531-11.patch

HIVE-4531-11.patch refines the exception handling code a bit.

> [WebHCat] Collecting task logs to hdfs
> --
>
> Key: HIVE-4531
> URL: https://issues.apache.org/jira/browse/HIVE-4531
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, WebHCat
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: HIVE-4531-10.patch, HIVE-4531-11.patch, 
> HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, 
> HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, 
> HIVE-4531-9.patch, samplestatusdirwithlist.tar.gz
>
>
> It would be nice if we collected task logs after the job finishes. This is 
> similar to what Amazon EMR does.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5235) Infinite loop with ORC file and Hive 0.11

2013-09-23 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774790#comment-13774790
 ] 

Owen O'Malley commented on HIVE-5235:
-

Pere, 
  If the problem goes away when you regenerate the data, it is most likely a 
race condition down in the interface to the zlib compression library that 
results in the compressed bytes being corrupted. In both of the stacks you've 
posted, it is reading an integer column with compression turned on, and a data 
corruption bug down in the zlib routines could easily explain both stacks.

* Which version of the JDK are you running on?
* Which exact OS version are you running?

If you come across one of the broken files, I would like to see it. You can use 
my gmail address (owen.omal...@gmail.com) to send files up to 10GB from a gmail 
address. Hopefully, 10GB is big enough. :)

> Infinite loop with ORC file and Hive 0.11
> -
>
> Key: HIVE-5235
> URL: https://issues.apache.org/jira/browse/HIVE-5235
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
> Environment: Gentoo linux with Hortonworks Hadoop 
> hadoop-1.1.2.23.tar.gz and Apache Hive 0.11d
>Reporter: Iván de Prado
>Priority: Blocker
>
> We are using Hive 0.11 with the ORC file format and we get some tasks blocked 
> in some kind of infinite loop. They keep running indefinitely when we set a 
> huge task expiry timeout. If we set the expiry time to 600 seconds, the tasks 
> fail because of not reporting progress, and finally the job fails. 
> This is not consistent, and the behavior sometimes changes between job 
> executions. It happens for different queries.
> We are using Hive 0.11 with Hadoop hadoop-1.1.2.23 from Hortonworks. The task 
> that is blocked keeps consuming 100% CPU, and the stack trace is consistently 
> always the same. Everything points to some kind of infinite loop. My guess is 
> that it is related to the ORC file. Maybe some pointer is written incorrectly, 
> generating some kind of infinite loop when reading. Or maybe there is a bug 
> in the reading stage.
> More information below. The stack trace:
> {noformat} 
> "main" prio=10 tid=0x7f20a000a800 nid=0x1ed2 runnable [0x7f20a8136000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.zip.Inflater.inflateBytes(Native Method)
>   at java.util.zip.Inflater.inflate(Inflater.java:256)
>   - locked <0xf42a6ca0> (a java.util.zip.ZStreamRef)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:64)
>   at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:128)
>   at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:143)
>   at 
> org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVulong(SerializationUtils.java:54)
>   at 
> org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVslong(SerializationUtils.java:65)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.readValues(RunLengthIntegerReader.java:66)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.next(RunLengthIntegerReader.java:81)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:332)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:802)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1214)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:71)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:46)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:300)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:218)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
>   - eliminated <0xe1459700> (a 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
>   - locked <0xe1459700> (a 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader)

[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up

2013-09-23 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774809#comment-13774809
 ] 

Viraj Bhat commented on HIVE-5274:
--

Hi all,
  I am not in favour of option b. There are 4 test cases in the storage-handler 
package in HCat which should ideally be moved inside org.apache.hive.hcatalog 
under the respective modules:
TestHBaseInputFormat.java - move to core package
TestHiveHBaseTableOutputFormat.java - move to core package
TestPigHBaseStorageHandler.java - move to hcatalog-pig-adaptor package
TestHiveHBaseStorageHandler.java - move to core package

Once this is done we can then effectively let all the old code stay inside 
org.apache.hcatalog.*. I can talk to Sushanth about this.

Viraj

> HCatalog package renaming backward compatibility follow-up
> --
>
> Key: HIVE-5274
> URL: https://issues.apache.org/jira/browse/HIVE-5274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 0.12.0
>
>
> As part of HIVE-4869, the hbase storage handler in hcat was moved to 
> org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it 
> was intended to be deprecated as well.
> However, it imports and uses several org.apache.hive.hcatalog classes. This 
> needs to be changed to use org.apache.hcatalog classes.
> ==
> Note: the above is a complete description of this issue in and of itself; 
> what follows is more detail on the backward-compatibility goals I have (not 
> saying that each of these things is violated): 
> a) People using org.apache.hcatalog packages should continue being able to 
> use that package, and see no difference at compile time or runtime. All code 
> here is considered deprecated, and will be gone by the time hive 0.14 rolls 
> around. Additionally, org.apache.hcatalog should behave as if it were 0.11 
> for all compatibility purposes.
> b) People using org.apache.hive.hcatalog packages should never have an 
> org.apache.hcatalog dependency injected in.
> Thus,
> It is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages 
> internally (say HCatUtil, for example), as long as any interfaces only expose 
> org.apache.hcatalog.\*. For tests that test org.apache.hcatalog.\*, we must be 
> capable of testing it from a pure org.apache.hcatalog.\* world.
> It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, 
> even in tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5344) DeprecatedLzoTextInputFormat never purges its lzo index cache

2013-09-23 Thread Scott Sitar (JIRA)
Scott Sitar created HIVE-5344:
-

 Summary: DeprecatedLzoTextInputFormat never purges its lzo index 
cache
 Key: HIVE-5344
 URL: https://issues.apache.org/jira/browse/HIVE-5344
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.10.0
 Environment: Debian 6, cloudera cdh4.2.0
Reporter: Scott Sitar


DeprecatedLzoTextInputFormat holds a cache of lzo indexes for every file that 
it ever reads (so does LzoTextInputFormat), but this cache can grow in size 
without bound and is never pruned.

We are running hive queries against lzo-compressed logs, connecting through 
jdbc and hive-server2.  HiveInputFormat stores a single instance of 
DeprecatedLzoTextInputFormat, and will eventually run out of memory as this 
cache grows out of control.
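
One mitigation sketch (an assumption about a possible fix, not a committed 
design): bound the index cache with an LRU eviction policy instead of an 
unbounded map.

{code}
// Hypothetical bounded replacement for the unbounded index cache;
// MAX_ENTRIES is an assumed cap.
private static final int MAX_ENTRIES = 10000;
private final Map<Path, LzoIndex> indexes =
    new LinkedHashMap<Path, LzoIndex>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Path, LzoIndex> eldest) {
        return size() > MAX_ENTRIES;  // evict the least-recently-used index
      }
    };
{code}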

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2013-09-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774784#comment-13774784
 ] 

Ashutosh Chauhan commented on HIVE-5324:


A few suggestions:
* Put StatsProvidingRecordWriter in the o.a.h.ql.io package in its own file and 
provide javadoc for it (see the sketch after this list).
* For getRawDataSizeOfColumns(int[] colIndices), a friendlier API is 
getRawDataSizeOfColumns(List<String> colNames). Is that easy to do?
* Not sure if there is a use case for these APIs on the writer. Is there any?
* Can you also add, in this patch, the code which invokes this new interface 
(not the impl of the interface)? I guess that will be in FSOp::close()?
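
A hedged sketch of the interface shape under discussion; the method names echo 
the comment above, everything else is an assumption:

{code}
package org.apache.hadoop.hive.ql.io;  // per the first suggestion

import java.util.List;

public interface StatsProvidingRecordWriter {
  /** Number of rows written so far. */
  long getNumberOfRows();
  /** Raw data size of everything written. */
  long getRawDataSize();
  /** Raw data size restricted to the named columns. */
  long getRawDataSizeOfColumns(List<String> colNames);
}
{code}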

> Extend record writer and ORC reader/writer interfaces to provide statistics
> ---
>
> Key: HIVE-5324
> URL: https://issues.apache.org/jira/browse/HIVE-5324
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile, statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5324.1.patch.txt
>
>
> The current implementation computes statistics (number of rows and raw data 
> size) for every single row processed. The processOp() method in 
> FileSinkOperator gets the raw data size for each row from the serde and 
> accumulates the size in a hashmap while counting the number of rows. These 
> accumulated statistics are then published to the metastore. 
> In the case of ORC, the file already stores enough statistics internally, 
> which can be used when publishing the stats to the metastore. This avoids 
> duplicating the work happening in processOp(). Getting the statistics 
> directly from ORC is also very cheap (they can be read directly from the 
> file footer).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.

2013-09-23 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775564#comment-13775564
 ] 

Kousuke Saruta commented on HIVE-5296:
--

I found that instances of Hashtable$Entry keep increasing. This has two 
causes, as follows.

1. opHandle is not released if an Exception is thrown while executing a query 
or command
When an Exception is thrown while executing a query or command, the operation 
handle object is never released from the Map 
(OperationManager#handleToOperation), because opHandleSet.add(opHandle) is not 
executed in HiveSessionImpl (HiveServer2 side) and 
execResp.getOperationHandle() is not executed in HiveStatement (JDBC client 
side).

{code}
  public OperationHandle executeStatementInternal(String statement,
      Map<String, String> confOverlay, boolean runAsync)
      throws HiveSQLException {
    acquire();
    try {
      ExecuteStatementOperation operation = getOperationManager()
          .newExecuteStatementOperation(getSession(), statement, confOverlay,
              runAsync);
      operation.run();   // <--- Throws exception and cannot get handle.
      OperationHandle opHandle = operation.getHandle();
      opHandleSet.add(opHandle);
      return opHandle;
    } finally {
      release();
    }
  }
{code}

{code}
  public boolean execute(String sql) throws SQLException {
    ...
    try {
      closeClientOperation();
      TExecuteStatementReq execReq = new TExecuteStatementReq(sessHandle, sql);
      execReq.setConfOverlay(sessConf);
      TExecuteStatementResp execResp = client.ExecuteStatement(execReq);
      Utils.verifySuccessWithInfo(execResp.getStatus());
      // <--- Throws exception and cannot get handle.
      stmtHandle = execResp.getOperationHandle();
    } catch (SQLException eS) {
      throw eS;
    } catch (Exception ex) {
      throw new SQLException(ex.toString(), "08S01", ex);
    }
    ...
{code}
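
A hedged sketch of one server-side fix, following the names in the snippet 
above (not necessarily the eventual patch): register the handle before run(), 
and clean it up if run() throws.

{code}
OperationHandle opHandle = operation.getHandle();
opHandleSet.add(opHandle);
try {
  operation.run();
} catch (HiveSQLException e) {
  // Unregister the half-created operation so handleToOperation cannot leak it.
  opHandleSet.remove(opHandle);
  getOperationManager().closeOperation(opHandle);
  throw e;
}
return opHandle;
{code}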




2. FileSystem$Cache keeps growing.
When we call FileSystem#get, the FileSystem object is cached in 
FileSystem$Cache. The cache is implemented using a HashMap, and equality of 
its Key is implemented as follows.

{code}
  /** {@inheritDoc} */
  public int hashCode() {
return (scheme + authority).hashCode() + ugi.hashCode() + (int)unique;
  }

  static boolean isEqual(Object a, Object b) {
return a == b || (a != null && a.equals(b));
  }

  /** {@inheritDoc} */
  public boolean equals(Object obj) {
if (obj == this) {
  return true;
}
if (obj != null && obj instanceof Key) {
  Key that = (Key)obj;
  return isEqual(this.scheme, that.scheme)
 && isEqual(this.authority, that.authority)
 && isEqual(this.ugi, that.ugi)
 && (this.unique == that.unique);
}
return false;
  }
{code}

Key contains a UserGroupInformation, and two UserGroupInformation objects are 
equal only when the Subject objects they contain are the same instance (not 
merely equivalent).
{code}
  public boolean equals(Object o) {
if (o == this) {
  return true;
} else if (o == null || getClass() != o.getClass()) {
  return false;
} else {
  return subject == ((UserGroupInformation) o).subject;
}
  }
{code}

In HiveServer2, UserGroupInformation is obtained via 
UserGroupInformation#createRemoteUser or UserGroupInformation#createProxyUser. 
Those methods create new Subject objects, so the cache never matches.

If the FileSystem.closeAll or FileSystem#close method is called, the 
FileSystem object is removed from the Cache.
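
A sketch of the cleanup this implies when a session ends; 
FileSystem.closeAllForUGI is an assumption about the available Hadoop API 
(FileSystem.closeAll() is the blunter fallback where it does not exist):

{code}
// On session close, drop the FileSystem instances cached for this session's
// UserGroupInformation so FileSystem$Cache can shrink again.
FileSystem.closeAllForUGI(sessionUgi);
{code}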

> Memory leak: OOM Error after multiple open/closed JDBC connections. 
> 
>
> Key: HIVE-5296
> URL: https://issues.apache.org/jira/browse/HIVE-5296
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
> Environment: Hive 0.12.0, Hadoop 1.1.2, Debian.
>Reporter: Douglas
>  Labels: hiveserver
> Fix For: 0.12.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481
> However, on inspection of the related patch and my built version of Hive 
> (patch carried forward to 0.12.0), I am still seeing the described behaviour.
> Multiple connections to HiveServer2, all of which are closed and disposed of 
> properly, show the Java heap size growing extremely quickly. 
> This issue can be recreated using the following code
> {code}
> import java.sql.DriverManager;
> import java.sql.Connection;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
> import java.util.Properties;
> import org.apache.hive.

[jira] [Commented] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)

2013-09-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774793#comment-13774793
 ] 

Sergey Shelukhin commented on HIVE-4914:


I think this is caused by the error handling (ExceptionListener) I added to 
deserializeExpression; previously, XML deserialization was just silently 
ignoring some exceptions. Since I switched to Kryo, the changes to the 
original methods are no longer necessary.

> filtering via partition name should be done inside metastore server 
> (implementation)
> 
>
> Key: HIVE-4914
> URL: https://issues.apache.org/jira/browse/HIVE-4914
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: D12561.5.patch, HIVE-4914.01.patch, HIVE-4914.02.patch, 
> HIVE-4914.03.patch, HIVE-4914.04.patch, HIVE-4914.05.patch, 
> HIVE-4914.D12561.1.patch, HIVE-4914.D12561.2.patch, HIVE-4914.D12561.3.patch, 
> HIVE-4914.D12561.4.patch, HIVE-4914.D12645.1.patch, 
> HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, 
> HIVE-4914.patch, HIVE-4914.patch
>
>
> Currently, if the filter pushdown is impossible (which is the case most of 
> the time), the client gets all partition names from the metastore, filters 
> them, and asks for partitions by name for the filtered set.
> The metastore server code should do that instead; it should check if pushdown 
> is possible and do it if so; otherwise it should do name-based filtering.
> This saves the roundtrip of all partition names from server to client, and 
> also removes the need for pushdown viability checking on both sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.

2013-09-23 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-5296:
-

Attachment: HIVE-5296.patch

I've created a patch for the first idea.

> Memory leak: OOM Error after multiple open/closed JDBC connections. 
> 
>
> Key: HIVE-5296
> URL: https://issues.apache.org/jira/browse/HIVE-5296
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
> Environment: Hive 0.12.0, Hadoop 1.1.2, Debian.
>Reporter: Douglas
>  Labels: hiveserver
> Fix For: 0.12.0
>
> Attachments: HIVE-5296.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481
> However, on inspection of the related patch and my built version of Hive 
> (patch carried forward to 0.12.0), I am still seeing the described behaviour.
> Multiple connections to HiveServer2, all of which are closed and disposed of 
> properly, show the Java heap size growing extremely quickly. 
> This issue can be recreated using the following code
> {code}
> import java.sql.DriverManager;
> import java.sql.Connection;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
> import java.util.Properties;
> import org.apache.hive.service.cli.HiveSQLException;
> import org.apache.log4j.Logger;
>
> /*
>  * Class which encapsulates the lifecycle of a query or statement.
>  * Provides functionality which allows you to create a connection.
>  */
> public class HiveClient {
>
>   Connection con;
>   Logger logger;
>   private static String driverName = "org.apache.hive.jdbc.HiveDriver";
>   private String db;
>
>   public HiveClient(String db) {
>     logger = Logger.getLogger(HiveClient.class);
>     this.db = db;
>
>     try {
>       Class.forName(driverName);
>     } catch (ClassNotFoundException e) {
>       logger.info("Can't find Hive driver");
>     }
>
>     String hiveHost = GlimmerServer.config.getString("hive/host");
>     String hivePort = GlimmerServer.config.getString("hive/port");
>     String connectionString = "jdbc:hive2://" + hiveHost + ":" + hivePort + "/default";
>     logger.info(String.format("Attempting to connect to %s", connectionString));
>     try {
>       con = DriverManager.getConnection(connectionString, "", "");
>     } catch (Exception e) {
>       logger.error("Problem instantiating the connection: " + e.getMessage());
>     }
>   }
>
>   public int update(String query) {
>     Integer res = 0;
>     Statement stmt = null;
>     try {
>       stmt = con.createStatement();
>       String switchdb = "USE " + db;
>       logger.info(switchdb);
>       stmt.executeUpdate(switchdb);
>       logger.info(query);
>       res = stmt.executeUpdate(query);
>       logger.info("Query passed to server");
>       stmt.close();
>     } catch (HiveSQLException e) {
>       logger.info(String.format("HiveSQLException thrown, this can be valid, " +
>           "but check the error: %s from the query %s", e.toString(), query));
>     } catch (SQLException e) {
>       logger.error(String.format("Unable to execute query %s. SQLException: %s", query, e));
>     } catch (Exception e) {
>       logger.error(String.format("Unable to execute query %s. Error: %s", query, e));
>     }
>
>     // Close the statement again defensively in case an exception skipped
>     // the close above.
>     if (stmt != null) {
>       try {
>         stmt.close();
>       } catch (SQLException e) {
>         logger.error("Cannot close the statement, potential memory leak " + e);
>       }
>     }
>
>     return res;
>   }
>
>   public void close() {
>     if (con != null) {
>       try {
>         con.close();
>       } catch (SQLException e) {
>         logger.info("Problem closing connection " + e);
>       }
>     }
>   }
> }
> {code}

[jira] [Updated] (HIVE-4823) implement vectorized TRIM(), LTRIM(), RTRIM()

2013-09-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4823:
---

   Resolution: Fixed
Fix Version/s: vectorization-branch
   Status: Resolved  (was: Patch Available)

Committed to branch. Thanks, Eric!

> implement vectorized TRIM(), LTRIM(), RTRIM()
> -
>
> Key: HIVE-4823
> URL: https://issues.apache.org/jira/browse/HIVE-4823
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: vectorization-branch
>Reporter: Eric Hanson
>Assignee: Eric Hanson
> Fix For: vectorization-branch
>
> Attachments: HIVE-4823.1-vectorization.patch, 
> HIVE-4823.2-vectorization.patch
>
>
> Make it work end-to-end, including the vectorized expression, and tying it 
> together in VectorizationContext so a SQL query will run using vectorization 
> when invoking these functions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.

2013-09-23 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-5296:
-

Status: Patch Available  (was: Open)

> Memory leak: OOM Error after multiple open/closed JDBC connections. 
> 
>
> Key: HIVE-5296
> URL: https://issues.apache.org/jira/browse/HIVE-5296
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
> Environment: Hive 0.12.0, Hadoop 1.1.2, Debian.
>Reporter: Douglas
>  Labels: hiveserver
> Fix For: 0.12.0
>
> Attachments: HIVE-5296.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481
> However, on inspection of the related patch and my built version of Hive 
> (patch carried forward to 0.12.0), I am still seeing the described behaviour.
> With multiple connections to HiveServer2, all of which are closed and 
> disposed of properly, the Java heap grows extremely quickly.
> This issue can be recreated using the following code:
> {code}
> import java.sql.DriverManager;
> import java.sql.Connection;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
> import java.util.Properties;
> import org.apache.hive.service.cli.HiveSQLException;
> import org.apache.log4j.Logger;
>
> /*
>  * Class which encapsulates the lifecycle of a query or statement.
>  * Provides functionality which allows you to create a connection.
>  */
> public class HiveClient {
>
>   Connection con;
>   Logger logger;
>   private static String driverName = "org.apache.hive.jdbc.HiveDriver";
>   private String db;
>
>   public HiveClient(String db) {
>     logger = Logger.getLogger(HiveClient.class);
>     this.db = db;
>
>     try {
>       Class.forName(driverName);
>     } catch (ClassNotFoundException e) {
>       logger.info("Can't find Hive driver");
>     }
>
>     String hiveHost = GlimmerServer.config.getString("hive/host");
>     String hivePort = GlimmerServer.config.getString("hive/port");
>     String connectionString = "jdbc:hive2://" + hiveHost + ":" + hivePort + "/default";
>     logger.info(String.format("Attempting to connect to %s", connectionString));
>     try {
>       con = DriverManager.getConnection(connectionString, "", "");
>     } catch (Exception e) {
>       logger.error("Problem instantiating the connection: " + e.getMessage());
>     }
>   }
>
>   public int update(String query) {
>     Integer res = 0;
>     Statement stmt = null;
>     try {
>       stmt = con.createStatement();
>       String switchdb = "USE " + db;
>       logger.info(switchdb);
>       stmt.executeUpdate(switchdb);
>       logger.info(query);
>       res = stmt.executeUpdate(query);
>       logger.info("Query passed to server");
>       stmt.close();
>     } catch (HiveSQLException e) {
>       logger.info(String.format("HiveSQLException thrown, this can be valid, " +
>           "but check the error: %s from the query %s", e.toString(), query));
>     } catch (SQLException e) {
>       logger.error(String.format("Unable to execute query %s. SQLException: %s", query, e));
>     } catch (Exception e) {
>       logger.error(String.format("Unable to execute query %s. Error: %s", query, e));
>     }
>
>     // Close the statement again defensively in case an exception skipped
>     // the close above.
>     if (stmt != null) {
>       try {
>         stmt.close();
>       } catch (SQLException e) {
>         logger.error("Cannot close the statement, potential memory leak " + e);
>       }
>     }
>
>     return res;
>   }
>
>   public void close() {
>     if (con != null) {
>       try {
>         con.close();
>       } catch (SQLException e) {
>         logger.info("Problem closing connection " + e);
>       }
>     }
>   }
> }
> {code}

[jira] [Created] (HIVE-5345) Operator::close() leaks Operator::out, holding reference to buffers

2013-09-23 Thread Gopal V (JIRA)
Gopal V created HIVE-5345:
-

 Summary: Operator::close() leaks Operator::out, holding reference 
to buffers
 Key: HIVE-5345
 URL: https://issues.apache.org/jira/browse/HIVE-5345
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
 Environment: Ubuntu, LXC, jdk6-x86_64
Reporter: Gopal V
Assignee: Gopal V
 Attachments: out-leak.png

When processing multiple splits on the same operator pipeline, the output 
collector in Operator has a held reference, which causes issues.

Operator::close() does not de-reference the OutputCollector object 
Operator::out held by the object.

This means that trying to allocate space for a new OutputCollector causes an 
OOM because the old one is still reachable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5345) Operator::close() leaks Operator::out, holding reference to buffers

2013-09-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-5345:
--

Attachment: out-leak.png

hprof analysis

> Operator::close() leaks Operator::out, holding reference to buffers
> ---
>
> Key: HIVE-5345
> URL: https://issues.apache.org/jira/browse/HIVE-5345
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
> Environment: Ubuntu, LXC, jdk6-x86_64
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: memory-leak
> Attachments: out-leak.png
>
>
> When processing multiple splits on the same operator pipeline, the output 
> collector in Operator has a held reference, which causes issues.
> Operator::close() does not de-reference the OutputCollector object 
> Operator::out held by the object.
> This means that trying to allocate space for a new OutputCollector causes an 
> OOM because the old one is still reachable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5345) Operator::close() leaks Operator::out, holding reference to buffers

2013-09-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-5345:
--

Attachment: HIVE-5345.01.patch

One-liner fix:

{code}
@@ -613,6 +613,8 @@ public void close(boolean abort) throws HiveException {
 op.close(abort);
   }
 
+  out = null;
+
{code}

> Operator::close() leaks Operator::out, holding reference to buffers
> ---
>
> Key: HIVE-5345
> URL: https://issues.apache.org/jira/browse/HIVE-5345
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
> Environment: Ubuntu, LXC, jdk6-x86_64
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: memory-leak
> Attachments: HIVE-5345.01.patch, out-leak.png
>
>
> When processing multiple splits on the same operator pipeline, the output 
> collector in Operator has a held reference, which causes issues.
> Operator::close() does not de-reference the OutputCollector object 
> Operator::out held by the object.
> This means that trying to allocate space for a new OutputCollector causes an 
> OOM because the old one is still reachable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5336) HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the fieldPositionMap and the fieldPositionMap should not be cached by the end user

2013-09-23 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-5336:


Status: Patch Available  (was: Open)

> HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the  
> fieldPositionMap and the fieldPositionMap should not be cached by the end user
> --
>
> Key: HIVE-5336
> URL: https://issues.apache.org/jira/browse/HIVE-5336
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5336.1.patch.txt
>
>
> HCatSchema.remove currently does not renumber the fieldPositionMap which can 
> be a problem when there are interleaving append() and remove() calls.
> 1. We should document that fieldPositionMap should not be cached by the 
> end-user
> 2. We should make sure that the fieldPositionMap gets renumbered after 
> remove() because HcatSchema.get will otherwise return wrong FieldSchemas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5336) HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the fieldPositionMap and the fieldPositionMap should not be cached by the end user

2013-09-23 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-5336:


Attachment: HIVE-5336.1.patch.txt

Added code change to re-align the columns after HCatSchema.remove(). Also added 
a unit test case.
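
For reference, the renumbering itself is small; a toy sketch, assuming a
name-to-position map like HCatSchema's fieldPositionMap (the actual fields and
types differ):

{code}
import java.util.List;
import java.util.Map;

// Illustrative only: renumbering a name->position map after a removal,
// so get() keeps returning the right field schema for each name.
public class RenumberSketch {
  static void remove(List<String> fieldNames, Map<String, Integer> positionMap, String field) {
    Integer pos = positionMap.remove(field);
    if (pos == null) {
      return;                       // field not present
    }
    fieldNames.remove((int) pos);   // drop the field itself
    // Shift every field that sat after the removed one down by one.
    for (Map.Entry<String, Integer> e : positionMap.entrySet()) {
      if (e.getValue() > pos) {
        e.setValue(e.getValue() - 1);
      }
    }
  }
}
{code}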

> HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the  
> fieldPositionMap and the fieldPositionMap should not be cached by the end user
> --
>
> Key: HIVE-5336
> URL: https://issues.apache.org/jira/browse/HIVE-5336
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5336.1.patch.txt
>
>
> HCatSchema.remove currently does not renumber the fieldPositionMap which can 
> be a problem when there are interleaving append() and remove() calls.
> 1. We should document that fieldPositionMap should not be cached by the 
> end-user
> 2. We should make sure that the fieldPositionMap gets renumbered after 
> remove() because HcatSchema.get will otherwise return wrong FieldSchemas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5329) Date and timestamp type converts invalid strings to '1970-01-01'

2013-09-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5329:
-

Attachment: HIVE-5329.2.patch

Attaching patch v2.
- Changes per Ashutosh's comments. I actually ended up adding VOID_GROUP as 
part of PrimitiveGrouping, rather than relying on UNKNOWN_GROUP.
- partition_date.q had a test failure with patch v1. Looking at the cause, 
this test case apparently never worked properly: it casts a date partition 
column to timestamp. I've removed that cast from the test.
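
For context, the intended behavior is that an unparseable string converts to
null rather than silently becoming the epoch date; a toy sketch, not the
actual converter code:

{code}
import java.sql.Date;

// Toy sketch, not the actual Hive converter: an unparseable date string
// should yield null, not the epoch date 1970-01-01.
public class DateCastSketch {
  static Date toDateOrNull(String s) {
    try {
      return Date.valueOf(s);           // expects yyyy-mm-dd
    } catch (IllegalArgumentException e) {
      return null;                      // 'abcd' -> null, not 1970-01-01
    }
  }
}
{code}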

> Date and timestamp type converts invalid strings to '1970-01-01'
> 
>
> Key: HIVE-5329
> URL: https://issues.apache.org/jira/browse/HIVE-5329
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Affects Versions: 0.12.0
>Reporter: Vikram Dixit K
>Assignee: Jason Dere
>Priority: Blocker
> Attachments: HIVE-5329.1.patch, HIVE-5329.2.patch
>
>
> {noformat}
> select
>   cast('abcd' as date),
>   cast('abcd' as timestamp)
> from src limit 1;
> {noformat}
> returns '1970-01-01'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)

2013-09-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-4914:
---

Attachment: HIVE-4914.06.patch

> filtering via partition name should be done inside metastore server 
> (implementation)
> 
>
> Key: HIVE-4914
> URL: https://issues.apache.org/jira/browse/HIVE-4914
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: D12561.5.patch, D12561.6.patch, HIVE-4914.01.patch, 
> HIVE-4914.02.patch, HIVE-4914.03.patch, HIVE-4914.04.patch, 
> HIVE-4914.05.patch, HIVE-4914.06.patch, HIVE-4914.D12561.1.patch, 
> HIVE-4914.D12561.2.patch, HIVE-4914.D12561.3.patch, HIVE-4914.D12561.4.patch, 
> HIVE-4914.D12645.1.patch, HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, 
> HIVE-4914.patch, HIVE-4914.patch, HIVE-4914.patch
>
>
> Currently, if the filter pushdown is impossible (which is most cases), the 
> client gets all partition names from metastore, filters them, and asks for 
> partitions by names for the filtered set.
> Metastore server code should do that instead; it should check if pushdown is 
> possible and do it if so; otherwise it should do name-based filtering.
> Saves the roundtrip with all partition names from the server to client, and 
> also removes the need to have pushdown viability checking on both sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)

2013-09-23 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4914:
--

Attachment: D12561.6.patch

sershe updated the revision "HIVE-4914 [jira] filtering via partition name 
should be done inside metastore server (implementation)".

  Remove changes to deserialize that expose some unrelated existing bug

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12561

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12561?vs=40233&id=40323#toc

MANIPHEST TASKS
  https://reviews.facebook.net/T63

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  metastore/if/hive_metastore.thrift
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
  
metastore/src/java/org/apache/hadoop/hive/metastore/PartitionExpressionProxy.java
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java
  metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
  ql/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java

To: JIRA, ashutoshc, sershe


> filtering via partition name should be done inside metastore server 
> (implementation)
> 
>
> Key: HIVE-4914
> URL: https://issues.apache.org/jira/browse/HIVE-4914
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: D12561.5.patch, D12561.6.patch, HIVE-4914.01.patch, 
> HIVE-4914.02.patch, HIVE-4914.03.patch, HIVE-4914.04.patch, 
> HIVE-4914.05.patch, HIVE-4914.D12561.1.patch, HIVE-4914.D12561.2.patch, 
> HIVE-4914.D12561.3.patch, HIVE-4914.D12561.4.patch, HIVE-4914.D12645.1.patch, 
> HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, 
> HIVE-4914.patch, HIVE-4914.patch
>
>
> Currently, if the filter pushdown is impossible (which is most cases), the 
> client gets all partition names from metastore, filters them, and asks for 
> partitions by names for the filtered set.
> Metastore server code should do that instead; it should check if pushdown is 
> possible and do it if so; otherwise it should do name-based filtering.
> Saves the roundtrip with all partition names from the server to client, and 
> also removes the need to have pushdown viability checking on both sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4518) Counter Strike: Operation Operator

2013-09-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774777#comment-13774777
 ] 

Edward Capriolo commented on HIVE-4518:
---

My mistake, I lost track of this one. [~gunther] and [~navis], since you are 
both working on this code, maybe you can review and commit. Otherwise, if 
either of you is busy, I will get back on top of this review.

> Counter Strike: Operation Operator
> --
>
> Key: HIVE-4518
> URL: https://issues.apache.org/jira/browse/HIVE-4518
> Project: Hive
>  Issue Type: Improvement
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-4518.1.patch, HIVE-4518.2.patch, HIVE-4518.3.patch, 
> HIVE-4518.4.patch, HIVE-4518.5.patch
>
>
> Queries of the form:
> from foo
> insert overwrite table bar partition (p) select ...
> insert overwrite table bar partition (p) select ...
> insert overwrite table bar partition (p) select ...
> Generate a huge amount of counters. The reason is that task.progress is 
> turned on for dynamic partitioning queries.
> The counters not only make queries slower than necessary (up to 50%) you will 
> also eventually run out. That's because we're wrapping them in enum values to 
> comply with hadoop 0.17.
> The real reason we turn task.progress on is that we need CREATED_FILES and 
> FATAL counters to ensure dynamic partitioning queries don't go haywire.
> The counters have counter-intuitive names like C1 through C1000 and don't 
> seem really useful by themselves.
> With hadoop 20+ you don't need to wrap the counters anymore, each operator 
> can simply create and increment counters. That should simplify the code a lot.
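
For reference, the difference between the two counter styles, as a sketch (not
actual Hive operator code):

{code}
import org.apache.hadoop.mapred.Reporter;

// Sketch of the two counter styles; not actual operator code.
public class CounterSketch {
  // hadoop 0.17 style: counters must be pre-declared enum values (C1..C1000).
  enum WrappedCounters { C1, C2 }

  static void incrementBothWays(Reporter reporter) {
    reporter.incrCounter(WrappedCounters.C1, 1);          // wrapped enum counter
    reporter.incrCounter("HIVE", "CREATED_FILES", 1);     // hadoop 20+: direct string counter
  }
}
{code}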

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4531) [WebHCat] Collecting task logs to hdfs

2013-09-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775629#comment-13775629
 ] 

Eugene Koifman commented on HIVE-4531:
--

+1

I think this needs a linked follow-up JIRA to make this work with Hadoop 2 (to 
be addressed later).

> [WebHCat] Collecting task logs to hdfs
> --
>
> Key: HIVE-4531
> URL: https://issues.apache.org/jira/browse/HIVE-4531
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, WebHCat
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: HIVE-4531-10.patch, HIVE-4531-11.patch, 
> HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, 
> HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, 
> HIVE-4531-9.patch, samplestatusdirwithlist.tar.gz
>
>
> It would be nice we collect task logs after job finish. This is similar to 
> what Amazon EMR does.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5346) Perform some form of data validation to data types on insert, if users want to enable it.

2013-09-23 Thread Robert Justice (JIRA)
Robert Justice created HIVE-5346:


 Summary: Perform some form of data validation to data types on 
insert, if users want to enable it.
 Key: HIVE-5346
 URL: https://issues.apache.org/jira/browse/HIVE-5346
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Affects Versions: 0.11.0
Reporter: Robert Justice
Priority: Minor


We understand that Hive is a "schema on read" type implementation and does not 
verify data as it is loaded. However, it would be nice to have a switch (off 
by default) that checks that the data being inserted satisfies the data types 
defined on the table. Obviously, this might be heavy-handed and a performance 
hit, but it could prevent some problems down the road for users who make 
mistakes.

hive> describe test; 
OK 
col1 smallint 
Time taken: 0.11 seconds 
hive> insert into table test select count(*) + 43000 as bob from test; 



hive> select * from test; 
OK 
-32536
-22535
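
What happens above is plain two's-complement wraparound: with one row,
count(*) + 43000 = 43001, and (short) 43001 == -22535. A sketch of the kind of
opt-in range check such a switch could perform:

{code}
// Demonstrates the silent wraparound above, plus the kind of range check an
// opt-in validation switch could perform (sketch only).
public class SmallintCheckSketch {
  public static void main(String[] args) {
    long v = 43001L;                  // count(*) + 43000 with one row
    System.out.println((short) v);    // prints -22535: two's-complement wrap
    if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
      throw new IllegalArgumentException("value " + v + " out of smallint range");
    }
  }
}
{code}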

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2013-09-23 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5324:
-

Attachment: HIVE-5324.2.patch.txt

> Extend record writer and ORC reader/writer interfaces to provide statistics
> ---
>
> Key: HIVE-5324
> URL: https://issues.apache.org/jira/browse/HIVE-5324
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile, statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt
>
>
> The current implementation for computing statistics (number of rows and raw 
> data size) happens for every single row processed. The processOp() method in 
> FileSinkOperator gets raw data size for each row from the serde and 
> accumulates the size in hashmap while counting the number of rows. This 
> accumulated statistics is then published to metastore. 
> In case of ORC, ORC already stores enough statistics internally which can be 
> made use of when publishing the stats to metastore. This will avoid the 
> duplication of work that is happening in the processOp(). Also getting the 
> statistics directly from ORC is very cheap (can directly read from the file 
> footer).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 14170: HIVE-5301: Add a schema tool for offline metastore schema upgrade

2013-09-23 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14170/
---

(Updated Sept. 23, 2013, 8:52 p.m.)


Review request for hive, Ashutosh Chauhan, Brock Noland, and Thejas Nair.


Changes
---

- DB parser refactoring to handle the 0.7-to-0.8 MySQL scripts
- Fixed the column name update from COMMENT to VERSION_COMMENT, which was 
missed in the original HIVE-3764 patch for a couple of upgrade scripts


Bugs: HIVE-5301
https://issues.apache.org/jira/browse/HIVE-5301


Repository: hive-git


Description
---

Schema tool to initialize and migrate the Hive metastore schema:
- Extract the metastore connection details from the Hive configuration
- The target version is extracted from the binary and the metastore if 
possible; optionally it can be specified as an argument
- Determine which scripts need to be executed for the initialization or 
upgrade (see the sketch below)
- Handle DB nested scripts
- Execute the required scripts using Beeline
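
As a toy illustration of the script-selection step, assuming an
upgrade-X-to-Y.<db>.sql naming convention in the scripts directory (everything
else below is hypothetical, not the actual HiveSchemaTool code):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of picking the upgrade scripts between two schema
// versions; the real tool derives this from the upgrade scripts directory.
public class UpgradePathSketch {
  static final List<String> VERSIONS =
      Arrays.asList("0.7.0", "0.8.0", "0.9.0", "0.10.0", "0.11.0", "0.12.0");

  static List<String> scriptsToRun(String from, String to, String dbType) {
    List<String> scripts = new ArrayList<String>();
    for (int i = VERSIONS.indexOf(from); i < VERSIONS.indexOf(to); i++) {
      // e.g. upgrade-0.11.0-to-0.12.0.mysql.sql
      scripts.add("upgrade-" + VERSIONS.get(i) + "-to-" + VERSIONS.get(i + 1)
          + "." + dbType + ".sql");
    }
    return scripts;
  }
}
{code}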


Diffs (updated)
-

  beeline/src/java/org/apache/hive/beeline/BeeLine.java 2802f4c 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java c9e24fa 
  beeline/src/java/org/apache/hive/beeline/Commands.java c574cd4 
  beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java PRE-CREATION 
  beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java PRE-CREATION 
  beeline/src/test/org/apache/hive/beeline/src/test/TestSchemaTool.java 
PRE-CREATION 
  bin/ext/schemaTool.sh PRE-CREATION 
  bin/schematool PRE-CREATION 
  build.xml cf75b3d 
  metastore/scripts/upgrade/derby/014-HIVE-3764.derby.sql 4e08fc1 
  metastore/scripts/upgrade/mysql/014-HIVE-3764.mysql.sql 08c73f6 
  metastore/scripts/upgrade/oracle/014-HIVE-3764.oracle.sql 7e8530d 
  metastore/scripts/upgrade/postgres/014-HIVE-3764.postgres.sql a6f1537 

Diff: https://reviews.apache.org/r/14170/diff/


Testing
---

Added unit tests. Manually tested various options using derby and MySQL.


Thanks,

Prasad Mujumdar



[jira] [Commented] (HIVE-5301) Add a schema tool for offline metastore schema upgrade

2013-09-23 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775644#comment-13775644
 ] 

Prasad Mujumdar commented on HIVE-5301:
---

[~ashutoshc] RB is updated.

> Add a schema tool for offline metastore schema upgrade
> --
>
> Key: HIVE-5301
> URL: https://issues.apache.org/jira/browse/HIVE-5301
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.11.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Fix For: 0.12.0
>
> Attachments: HIVE-5301.1.patch, HIVE-5301.3.patch, 
> HIVE-5301-with-HIVE-3764.0.patch
>
>
> HIVE-3764 is addressing metastore version consistency.
> Besides it would be helpful to add a tool that can leverage this version 
> information to figure out the required set of upgrade scripts, and execute 
> those against the configured metastore. Now that Hive includes Beeline 
> client, it can be used to execute the scripts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-5181) RetryingRawStore should not retry on logical failures (e.g. from commit)

2013-09-23 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar reassigned HIVE-5181:
-

Assignee: Prasad Mujumdar

> RetryingRawStore should not retry on logical failures (e.g. from commit)
> 
>
> Key: HIVE-5181
> URL: https://issues.apache.org/jira/browse/HIVE-5181
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Prasad Mujumdar
>Priority: Minor
>
> RetryingRawStore retries calls. Some methods (e.g. drop_table_core in 
> HiveMetaStore) explicitly call openTransaction and commitTransaction on 
> RawStore.
> When the commit call fails due to some real issue, it is retried, and instead 
> of the real cause of the failure one gets a bogus exception about the 
> transaction open count.
> It doesn't make sense to retry logical errors, especially not from 
> commitTransaction.
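
The gist of a possible fix, sketched as a dynamic-proxy invocation handler
(RetryingRawStore is proxy-based; everything below is illustrative, not the
actual class):

{code}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Illustrative handler only; the real RetryingRawStore differs.
public class NoRetryOnCommitSketch implements InvocationHandler {
  private final Object base;
  private final int maxRetries = 3;

  public NoRetryOnCommitSketch(Object base) { this.base = base; }

  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    boolean logical = method.getName().equals("commitTransaction");
    for (int attempt = 0; ; attempt++) {
      try {
        return method.invoke(base, args);
      } catch (InvocationTargetException e) {
        // Logical failures (e.g. from commit) should surface immediately,
        // not be masked by transaction-open-count errors on retry.
        if (logical || attempt >= maxRetries) {
          throw e.getCause();
        }
      }
    }
  }
}
{code}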

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2013-09-23 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775656#comment-13775656
 ] 

Prasanth J commented on HIVE-5324:
--

[~ashutoshc]
I changed the interface to a friendlier name (I don't think the implementation 
will be hard). Also added FSOp::closeOp() to invoke the new interface.

| Not sure if there is a usecase for these apis on writer. Is there any?

The writer interface will be used to populate the SerDeStats object returned 
by a writer implementing StatsProvidingRecordWriter. Apart from that I don't 
see any other use case.

> Extend record writer and ORC reader/writer interfaces to provide statistics
> ---
>
> Key: HIVE-5324
> URL: https://issues.apache.org/jira/browse/HIVE-5324
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile, statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt
>
>
> The current implementation for computing statistics (number of rows and raw 
> data size) happens for every single row processed. The processOp() method in 
> FileSinkOperator gets raw data size for each row from the serde and 
> accumulates the size in hashmap while counting the number of rows. This 
> accumulated statistics is then published to metastore. 
> In case of ORC, ORC already stores enough statistics internally which can be 
> made use of when publishing the stats to metastore. This will avoid the 
> duplication of work that is happening in the processOp(). Also getting the 
> statistics directly from ORC is very cheap (can directly read from the file 
> footer).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5345) Operator::close() leaks Operator::out, holding reference to buffers

2013-09-23 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-5345:
--

Release Note: Prevent  OutputCollector leaks from an Operator by clearing 
the Operator::out reference on close()
  Status: Patch Available  (was: Open)

> Operator::close() leaks Operator::out, holding reference to buffers
> ---
>
> Key: HIVE-5345
> URL: https://issues.apache.org/jira/browse/HIVE-5345
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
> Environment: Ubuntu, LXC, jdk6-x86_64
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: memory-leak
> Attachments: HIVE-5345.01.patch, out-leak.png
>
>
> When processing multiple splits on the same operator pipeline, the output 
> collector in Operator has a held reference, which causes issues.
> Operator::close() does not de-reference the OutputCollector object 
> Operator::out held by the object.
> This means that trying to allocate space for a new OutputCollector causes an 
> OOM because the old one is still reachable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5133) webhcat jobs that need to access metastore fails in secure mode

2013-09-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774727#comment-13774727
 ] 

Eugene Koifman commented on HIVE-5133:
--

HIVE-5133.3.patch has an additional issue:

org.apache.hive.hcatalog.templeton.tool.TempletonControllerJob and 
org.apache.hive.hcatalog.templeton.tool.MSTokenCleanOutputFormat are using 
org.apache.hcatalog.common.HCatUtil; 
they should be using org.apache.hive.hcatalog.common.HCatUtil (see below).
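
i.e., in those two classes the import should change as follows:

{code}
// before (deprecated package):
import org.apache.hcatalog.common.HCatUtil;
// after:
import org.apache.hive.hcatalog.common.HCatUtil;
{code}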

> webhcat jobs that need to access metastore fails in secure mode
> ---
>
> Key: HIVE-5133
> URL: https://issues.apache.org/jira/browse/HIVE-5133
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.11.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-5133.1.patch, HIVE-5133.1.test.patch, 
> HIVE-5133.2.patch, HIVE-5133.3.patch
>
>
> Webhcat job submission requests result in the pig/hive/mr job being run from 
> a map task that it launches. In secure mode, for the pig/hive/mr job that is 
> run to be authorized to perform actions on metastore, it has to have the 
> delegation tokens from the hive metastore.
> In case of pig/MR job this is needed if hcatalog is being used in the 
> script/job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5336) HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the fieldPositionMap and the fieldPositionMap should not be cached by the end user

2013-09-23 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-5336:


Component/s: HCatalog

> HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the  
> fieldPositionMap and the fieldPositionMap should not be cached by the end user
> --
>
> Key: HIVE-5336
> URL: https://issues.apache.org/jira/browse/HIVE-5336
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5336.1.patch.txt
>
>
> HCatSchema.remove currently does not renumber the fieldPositionMap which can 
> be a problem when there are interleaving append() and remove() calls.
> 1. We should document that fieldPositionMap should not be cached by the 
> end-user
> 2. We should make sure that the fieldPositionMap gets renumbered after 
> remove() because HcatSchema.get will otherwise return wrong FieldSchemas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5345) Operator::close() leaks Operator::out, holding reference to buffers

2013-09-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775665#comment-13775665
 ] 

Ashutosh Chauhan commented on HIVE-5345:


Fix makes sense. Though I think having the output collector reference in the 
operator class is bad design in the first place, since it doesn't belong 
there; it's just a bad implementation. I have looked at this briefly earlier, 
and I think the code refactor to eliminate the collector from the operator is 
not a lot of work. But that's a matter for another jira.
+1

> Operator::close() leaks Operator::out, holding reference to buffers
> ---
>
> Key: HIVE-5345
> URL: https://issues.apache.org/jira/browse/HIVE-5345
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
> Environment: Ubuntu, LXC, jdk6-x86_64
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: memory-leak
> Attachments: HIVE-5345.01.patch, out-leak.png
>
>
> When processing multiple splits on the same operator pipeline, the output 
> collector in Operator has a held reference, which causes issues.
> Operator::close() does not de-reference the OutputCollector object 
> Operator::out held by the object.
> This means that trying to allocate space for a new OutputCollector causes an 
> OOM because the old one is still reachable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5320) Querying a table with nested struct type over JSON data results in errors

2013-09-23 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774643#comment-13774643
 ] 

Brock Noland commented on HIVE-5320:


Hey guys, for some reason I thought we shipped the json serde. Since we don't 
ship it, I don't think we should commit a workaround.

Therefore I think we should resolve this as won't fix.

> Querying a table with nested struct type over JSON data results in errors
> -
>
> Key: HIVE-5320
> URL: https://issues.apache.org/jira/browse/HIVE-5320
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.9.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-5320.patch
>
>
> Querying a table with a nested struct datatype like
> ==
> create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, 
> a2:array<string>>>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'; 
> ==
> over JSON data causes errors including java.lang.IndexOutOfBoundsException or 
> corrupted data. 
> The JsonSerDe used is 
> json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.
> The cause is that the method
> public List getStructFieldsDataAsList(Object o) 
> in JsonStructObjectInspector.java 
> returns a list referencing a static ArrayList "values".
> So the local variable 'list' in the serialize method of Hive's 
> LazySimpleSerDe class is returned with the same reference in its recursive 
> calls, and its element values keep being overwritten in the STRUCT case.
> Solutions:
> 1. Fix in JsonSerDe: change the field 'values' in 
> java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java
> to instance scope.
> Filed a ticket with JsonSerDe 
> (https://github.com/rcongiu/Hive-JSON-Serde/issues/31)
> 2. Ideally, in the serialize method of class LazySimpleSerDe, we should 
> defensively save a copy of the list resulting from list = 
> soi.getStructFieldsDataAsList(obj) in the case where the soi is an instance 
> of JsonStructObjectInspector, so that the recursive calls of serialize can 
> work properly regardless of the extended SerDe implementation.
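
Solution 2 is essentially a one-line defensive copy in
LazySimpleSerDe.serialize; a sketch, assuming soi and obj as in the existing
code:

{code}
// Sketch: copy the list before recursing, so an object inspector that
// recycles a shared backing list (as JsonStructObjectInspector does) cannot
// overwrite the values between recursive serialize() calls.
List<Object> list = new ArrayList<Object>(soi.getStructFieldsDataAsList(obj));
{code}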

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5336) HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the fieldPositionMap and the fieldPositionMap should not be cached by the end user

2013-09-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775682#comment-13775682
 ] 

Eugene Koifman commented on HIVE-5336:
--

+1

> HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the  
> fieldPositionMap and the fieldPositionMap should not be cached by the end user
> --
>
> Key: HIVE-5336
> URL: https://issues.apache.org/jira/browse/HIVE-5336
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5336.1.patch.txt
>
>
> HCatSchema.remove currently does not renumber the fieldPositionMap which can 
> be a problem when there are interleaving append() and remove() calls.
> 1. We should document that fieldPositionMap should not be cached by the 
> end-user
> 2. We should make sure that the fieldPositionMap gets renumbered after 
> remove() because HcatSchema.get will otherwise return wrong FieldSchemas.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5329) Date and timestamp type converts invalid strings to '1970-01-01'

2013-09-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775685#comment-13775685
 ] 

Ashutosh Chauhan commented on HIVE-5329:


+1

> Date and timestamp type converts invalid strings to '1970-01-01'
> 
>
> Key: HIVE-5329
> URL: https://issues.apache.org/jira/browse/HIVE-5329
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Affects Versions: 0.12.0
>Reporter: Vikram Dixit K
>Assignee: Jason Dere
>Priority: Blocker
> Attachments: HIVE-5329.1.patch, HIVE-5329.2.patch
>
>
> {noformat}
> select
>   cast('abcd' as date),
>   cast('abcd' as timestamp)
> from src limit 1;
> {noformat}
> returns '1970-01-01'

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


PreCommit Build Down

2013-09-23 Thread Brock Noland
See https://issues.apache.org/jira/browse/INFRA-6781


[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns

2013-09-23 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Status: Open  (was: Patch Available)

> Hive does not honor type for partition columns
> --
>
> Key: HIVE-5297
> URL: https://issues.apache.org/jira/browse/HIVE-5297
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch, 
> HIVE-5297.4.patch, HIVE-5297.5.patch, HIVE-5297.6.patch
>
>
> Hive does not consider the type of the partition column while writing 
> partitions. Consider for example the query:
> {noformat}
> create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
> row format delimited fields terminated by ',';
> alter table tab1 add partition (month='June', day='second');
> {noformat}
> Hive accepts this query. However if you try to select from this table and 
> insert into another expecting schema match, it will insert nulls instead. We 
> should throw an exception on such user error at the time the partition 
> addition/load happens.
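
A toy version of such a check at partition-spec time; a sketch only, the
actual patch hooks into semantic analysis:

{code}
// Illustrative only: validate a partition value against the declared column
// type before accepting the partition spec (the real patch lives in the
// analyzer, e.g. DDLSemanticAnalyzer).
public class PartitionTypeCheckSketch {
  static void checkPartitionValue(String colName, String colType, String value) {
    if (colType.equalsIgnoreCase("int")) {
      try {
        Integer.parseInt(value);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException(
            "Partition column " + colName + " expects int, got '" + value + "'");
      }
    }
    // ... other primitive types handled similarly
  }
}
{code}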

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 14155: HIVE-5297 Hive does not honor type for partition columns

2013-09-23 Thread Vikram Dixit Kumaraswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14155/
---

(Updated Sept. 23, 2013, 9:51 p.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

Refreshed against latest trunk and enabled the check by default. Passes my tests.


Bugs: HIVE-5297
https://issues.apache.org/jira/browse/HIVE-5297


Repository: hive-git


Description
---

Hive does not consider the type of the partition column while writing 
partitions. Consider for example the query:

create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
row format delimited fields terminated by ',';
alter table tab1 add partition (month='June', day='second');

Hive accepts this query. However if you try to select from this table and 
insert into another expecting schema match, it will insert nulls instead.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1af68a6 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java a704462 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java fb79823 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 767f545 
  ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION 
  ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION 
  ql/src/test/results/clientnegative/alter_table_add_partition.q.out bd9c148 
  ql/src/test/results/clientnegative/alter_view_failure5.q.out 4edb82c 
  ql/src/test/results/clientnegative/illegal_partition_type.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/illegal_partition_type2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/14155/diff/


Testing
---

Ran all tests.


Thanks,

Vikram Dixit Kumaraswamy



[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns

2013-09-23 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Attachment: HIVE-5297.6.patch

Refreshed to latest trunk. Passes my tests. 

> Hive does not honor type for partition columns
> --
>
> Key: HIVE-5297
> URL: https://issues.apache.org/jira/browse/HIVE-5297
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch, 
> HIVE-5297.4.patch, HIVE-5297.5.patch, HIVE-5297.6.patch
>
>
> Hive does not consider the type of the partition column while writing 
> partitions. Consider for example the query:
> {noformat}
> create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
> row format delimited fields terminated by ',';
> alter table tab1 add partition (month='June', day='second');
> {noformat}
> Hive accepts this query. However if you try to select from this table and 
> insert into another expecting schema match, it will insert nulls instead. We 
> should throw an exception on such user error at the time the partition 
> addition/load happens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns

2013-09-23 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Status: Patch Available  (was: Open)

> Hive does not honor type for partition columns
> --
>
> Key: HIVE-5297
> URL: https://issues.apache.org/jira/browse/HIVE-5297
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch, 
> HIVE-5297.4.patch, HIVE-5297.5.patch, HIVE-5297.6.patch
>
>
> Hive does not consider the type of the partition column while writing 
> partitions. Consider for example the query:
> {noformat}
> create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
> row format delimited fields terminated by ',';
> alter table tab1 add partition (month='June', day='second');
> {noformat}
> Hive accepts this query. However if you try to select from this table and 
> insert into another expecting schema match, it will insert nulls instead. We 
> should throw an exception on such user error at the time the partition 
> addition/load happens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4763) add support for thrift over http transport in HS2

2013-09-23 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775699#comment-13775699
 ] 

Vaibhav Gumashta commented on HIVE-4763:


[~thejas] Latest feedback incorporated: https://reviews.facebook.net/D12951

> add support for thrift over http transport in HS2
> -
>
> Key: HIVE-4763
> URL: https://issues.apache.org/jira/browse/HIVE-4763
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Vaibhav Gumashta
> Fix For: 0.12.0
>
> Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, 
> HIVE-4763.D12855.1.patch
>
>
> Subtask for adding support for http transport mode for thrift api in hive 
> server2.
> Support for the different authentication modes will be part of another 
> subtask.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2013-09-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775696#comment-13775696
 ] 

Ashutosh Chauhan commented on HIVE-5324:


* I think you need a boolean[] statsFromRecordWriter instead of just a 
boolean, since different writers within FSOp may or may not implement that 
interface.
* You also need an instanceof check on outWriter before {{SerDeStats stats = 
((StatsProvidingRecordWriter) outWriter).getStats();}}, otherwise this will 
throw a ClassCastException for writers not implementing the interface (see 
the sketch below).
* Move FileSinkOperator::RecordWriter into the same file as 
StatsProvidingRecordWriter and call that file FSRecordWriter.
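
Roughly, the guarded collection might look like this; a sketch only, where
statsFromRecordWriter and outWriters are assumed names for the array fields in
the patch:

{code}
// Sketch of the guarded stats aggregation in the FSOp close path; assumes
// the patch's StatsProvidingRecordWriter interface, other names illustrative.
for (int i = 0; i < outWriters.length; i++) {
  if (statsFromRecordWriter[i] && outWriters[i] instanceof StatsProvidingRecordWriter) {
    SerDeStats stats = ((StatsProvidingRecordWriter) outWriters[i]).getStats();
    rowCount += stats.getRowCount();        // aggregate once per writer,
    rawDataSize += stats.getRawDataSize();  // instead of once per row
  }
}
{code}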

> Extend record writer and ORC reader/writer interfaces to provide statistics
> ---
>
> Key: HIVE-5324
> URL: https://issues.apache.org/jira/browse/HIVE-5324
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile, statistics
> Fix For: 0.13.0
>
> Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt
>
>
> The current implementation for computing statistics (number of rows and raw 
> data size) happens for every single row processed. The processOp() method in 
> FileSinkOperator gets raw data size for each row from the serde and 
> accumulates the size in hashmap while counting the number of rows. This 
> accumulated statistics is then published to metastore. 
> In case of ORC, ORC already stores enough statistics internally which can be 
> made use of when publishing the stats to metastore. This will avoid the 
> duplication of work that is happening in the processOp(). Also getting the 
> statistics directly from ORC is very cheap (can directly read from the file 
> footer).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 14155: HIVE-5297 Hive does not honor type for partition columns

2013-09-23 Thread Vikram Dixit Kumaraswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14155/
---

(Updated Sept. 23, 2013, 9:57 p.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-5297
https://issues.apache.org/jira/browse/HIVE-5297


Repository: hive-git


Description
---

Hive does not consider the type of the partition column while writing 
partitions. Consider for example the query:

create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
row format delimited fields terminated by ',';
alter table tab1 add partition (month='June', day='second');

Hive accepts this query. However if you try to select from this table and 
insert into another expecting schema match, it will insert nulls instead.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e971644 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java a704462 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 07b271c 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 767f545 
  ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION 
  ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION 
  ql/src/test/results/clientnegative/alter_table_add_partition.q.out bd9c148 
  ql/src/test/results/clientnegative/alter_view_failure5.q.out 4edb82c 
  ql/src/test/results/clientnegative/illegal_partition_type.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/illegal_partition_type2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/14155/diff/


Testing
---

Ran all tests.


Thanks,

Vikram Dixit Kumaraswamy



[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns

2013-09-23 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Attachment: HIVE-5297.6.patch

> Hive does not honor type for partition columns
> --
>
> Key: HIVE-5297
> URL: https://issues.apache.org/jira/browse/HIVE-5297
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch, 
> HIVE-5297.4.patch, HIVE-5297.5.patch, HIVE-5297.6.patch
>
>
> Hive does not consider the type of the partition column while writing 
> partitions. Consider for example the query:
> {noformat}
> create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
> row format delimited fields terminated by ',';
> alter table tab1 add partition (month='June', day='second');
> {noformat}
> Hive accepts this query. However, if you try to select from this table and 
> insert into another table expecting the schemas to match, it will insert 
> nulls instead. We should throw an exception on such a user error at the time 
> the partition addition/load happens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns

2013-09-23 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Attachment: (was: HIVE-5297.6.patch)

> Hive does not honor type for partition columns
> --
>
> Key: HIVE-5297
> URL: https://issues.apache.org/jira/browse/HIVE-5297
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch, 
> HIVE-5297.4.patch, HIVE-5297.5.patch, HIVE-5297.6.patch
>
>
> Hive does not consider the type of the partition column while writing 
> partitions. Consider for example the query:
> {noformat}
> create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
> row format delimited fields terminated by ',';
> alter table tab1 add partition (month='June', day='second');
> {noformat}
> Hive accepts this query. However, if you try to select from this table and 
> insert into another table expecting the schemas to match, it will insert 
> nulls instead. We should throw an exception on such a user error at the time 
> the partition addition/load happens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns

2013-09-23 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Status: Open  (was: Patch Available)

> Hive does not honor type for partition columns
> --
>
> Key: HIVE-5297
> URL: https://issues.apache.org/jira/browse/HIVE-5297
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch, 
> HIVE-5297.4.patch, HIVE-5297.5.patch, HIVE-5297.6.patch
>
>
> Hive does not consider the type of the partition column while writing 
> partitions. Consider for example the query:
> {noformat}
> create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
> row format delimited fields terminated by ',';
> alter table tab1 add partition (month='June', day='second');
> {noformat}
> Hive accepts this query. However, if you try to select from this table and 
> insert into another table expecting the schemas to match, it will insert 
> nulls instead. We should throw an exception on such a user error at the time 
> the partition addition/load happens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns

2013-09-23 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Status: Patch Available  (was: Open)

> Hive does not honor type for partition columns
> --
>
> Key: HIVE-5297
> URL: https://issues.apache.org/jira/browse/HIVE-5297
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.11.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch, 
> HIVE-5297.4.patch, HIVE-5297.5.patch, HIVE-5297.6.patch
>
>
> Hive does not consider the type of the partition column while writing 
> partitions. Consider for example the query:
> {noformat}
> create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
> row format delimited fields terminated by ',';
> alter table tab1 add partition (month='June', day='second');
> {noformat}
> Hive accepts this query. However, if you try to select from this table and 
> insert into another table expecting the schemas to match, it will insert 
> nulls instead. We should throw an exception on such a user error at the time 
> the partition addition/load happens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5347) HiveServer2 should be able to go from a STOPPED to an INITED state

2013-09-23 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-5347:
--

 Summary: HiveServer2 should be able to go from a STOPPED to an 
INITED state
 Key: HIVE-5347
 URL: https://issues.apache.org/jira/browse/HIVE-5347
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta


HiveServer2 does not consider going from a STOPPED to an INITED state as a 
valid transition. Currently, to go to an INITED state it must be in NOTINITED.
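
For illustration, a minimal sketch of the relaxed transition (hypothetical 
names, not HiveServer2's actual Service API):

{code}
// Sketch: init() is legal from NOTINITED and, with this change, also
// from STOPPED; any other source state is rejected.
public class ServiceLifecycle {
  public enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public synchronized void init() {
    if (state != State.NOTINITED && state != State.STOPPED) {
      throw new IllegalStateException("Cannot init from state " + state);
    }
    state = State.INITED;
  }
}
{code}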

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5336) HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the fieldPositionMap and the fieldPositionMap should not be cached by the end user

2013-09-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775712#comment-13775712
 ] 

Eugene Koifman commented on HIVE-5336:
--

btw, I think you need to name the patch file HIVE-5336.1.patch (w/o .txt) 
for the test framework to pick it up.

> HCatSchema.remove(HCatFieldSchema hcatFieldSchema) should renumber the  
> fieldPositionMap and the fieldPositionMap should not be cached by the end user
> --
>
> Key: HIVE-5336
> URL: https://issues.apache.org/jira/browse/HIVE-5336
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5336.1.patch.txt
>
>
> HCatSchema.remove currently does not renumber the fieldPositionMap, which can 
> be a problem when there are interleaved append() and remove() calls.
> 1. We should document that the fieldPositionMap should not be cached by the 
> end user.
> 2. We should make sure that the fieldPositionMap gets renumbered after 
> remove(), because HCatSchema.get will otherwise return the wrong FieldSchemas.
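
For illustration, a minimal sketch of the renumbering idea in plain Java (not 
HCatSchema's actual fields):

{code}
// Sketch: after removing a field, every position to its right must shift
// down by one, or get-by-name will point at the wrong column.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionMap {
  private final List<String> fields = new ArrayList<String>();
  private final Map<String, Integer> positionOf = new HashMap<String, Integer>();

  public void append(String name) {
    fields.add(name);
    positionOf.put(name, fields.size() - 1);
  }

  public void remove(String name) {
    Integer pos = positionOf.remove(name);
    if (pos == null) {
      return;
    }
    fields.remove(pos.intValue());
    // Renumber everything that followed the removed field.
    for (int i = pos; i < fields.size(); i++) {
      positionOf.put(fields.get(i), i);
    }
  }
}
{code}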

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up

2013-09-23 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775723#comment-13775723
 ] 

Sushanth Sowmyan commented on HIVE-5274:


Hi Viraj,

The 4 test cases you mention are tests against the hive HBaseStorageHandler, 
and therefore will not be affected by my change to the HBaseHCatStorageHandler 
that I mention. Also, I do intend to move them into org/apache/hive/hcatalog as 
part of this patch. Hope that's okay with you.

> HCatalog package renaming backward compatibility follow-up
> --
>
> Key: HIVE-5274
> URL: https://issues.apache.org/jira/browse/HIVE-5274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 0.12.0
>
>
> As part of HIVE-4869, the hbase storage handler in hcat was moved to 
> org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it 
> was intended to be deprecated as well.
> However, it imports and uses several org.apache.hive.hcatalog classes. This 
> needs to be changed to use org.apache.hcatalog classes.
> ==
> Note: The above is a complete description of this issue in and of itself; 
> the following gives more detail on the backward-compatibility goals I have (not 
> saying that each of these things is violated):
> a) People using org.apache.hcatalog packages should continue being able to 
> use that package, and see no difference at compile time or runtime. All code 
> here is considered deprecated, and will be gone by the time hive 0.14 rolls 
> around. Additionally, org.apache.hcatalog should behave as if it were 0.11 
> for all compatibility purposes.
> b) People using org.apache.hive.hcatalog packages should never have an 
> org.apache.hcatalog dependency injected in.
> Thus,
> It is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages 
> internally (say HCatUtil, for example), as long as any interfaces only expose 
> org.apache.hcatalog.\*. For tests that test org.apache.hcatalog.\*, we must be 
> capable of testing it from a pure org.apache.hcatalog.\* world.
> It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, 
> even in tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up

2013-09-23 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775725#comment-13775725
 ] 

Viraj Bhat commented on HIVE-5274:
--

Hi Sushanth, That is fine with me.
Viraj

> HCatalog package renaming backward compatibility follow-up
> --
>
> Key: HIVE-5274
> URL: https://issues.apache.org/jira/browse/HIVE-5274
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 0.12.0
>
>
> As part of HIVE-4869, the hbase storage handler in hcat was moved to 
> org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it 
> was intended to be deprecated as well.
> However, it imports and uses several org.apache.hive.hcatalog classes. This 
> needs to be changed to use org.apache.hcatalog classes.
> ==
> Note: The above is a complete description of this issue in and of itself; 
> the following gives more detail on the backward-compatibility goals I have (not 
> saying that each of these things is violated):
> a) People using org.apache.hcatalog packages should continue being able to 
> use that package, and see no difference at compile time or runtime. All code 
> here is considered deprecated, and will be gone by the time hive 0.14 rolls 
> around. Additionally, org.apache.hcatalog should behave as if it were 0.11 
> for all compatibility purposes.
> b) People using org.apache.hive.hcatalog packages should never have an 
> org.apache.hcatalog dependency injected in.
> Thus,
> It is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages 
> internally (say HCatUtil, for example), as long as any interfaces only expose 
> org.apache.hcatalog.\*. For tests that test org.apache.hcatalog.\*, we must be 
> capable of testing it from a pure org.apache.hcatalog.\* world.
> It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, 
> even in tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.

2013-09-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775728#comment-13775728
 ] 

Ashutosh Chauhan commented on HIVE-5202:


+1. As a further optimization, if both the table and the partition have the 
same serde then we don't even need to check for OI conversions, since it is 
the serde from which the OI is obtained. Let's do this in a follow-up jira.
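
As a sketch of the proposed shortcut (hypothetical helper, not Hive's actual 
code):

{code}
// Sketch: if the partition's serde class equals the table's serde class,
// the OIs come from the same serde, so the conversion check can be skipped.
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

public class SameSerdeShortcut {
  public static ObjectInspector resolveOI(String tableSerde, String partSerde,
      ObjectInspector partitionOI, ObjectInspector convertedOI) {
    if (tableSerde != null && tableSerde.equals(partSerde)) {
      return partitionOI;   // same serde, no conversion needed
    }
    return convertedOI;     // else fall back to getConvertedOI()
  }
}
{code}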

> Support for SettableUnionObjectInspector and implement 
> isSettable/hasAllFieldsSettable APIs for all data types.
> ---
>
> Key: HIVE-5202
> URL: https://issues.apache.org/jira/browse/HIVE-5202
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5202.2.patch.txt, HIVE-5202.patch
>
>
> These 3 tasks should be accomplished as part of this jira:
> 1. The current implementation lacks a settable union object inspector. We can 
> run into an exception inside ObjectInspectorConverters.getConvertedOI() if 
> there is a union.
> 2. Implement the following public functions for all datatypes: 
> isSettable() -> performs a shallow check to see if an object inspector is 
> inherited from a settable OI type, and 
> hasAllFieldsSettable() -> performs a deep check to see if this object 
> inspector and all the underlying object inspectors are inherited from 
> settable OI types.
> 3. ObjectInspectorConverters.getConvertedOI() is inefficient. Once (1) and 
> (2) are implemented, add a check on outputOI.hasAllSettableFields() and 
> return outputOI immediately if the object is entirely settable, to prevent 
> redundant object instantiation.
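
For context, a simplified sketch of the shallow vs. deep checks described 
above, using hypothetical interfaces rather than Hive's real OI hierarchy:

{code}
// Sketch: isSettable() inspects only the top-level inspector, while
// hasAllFieldsSettable() recurses into every nested inspector.
import java.util.List;

public class SettableChecks {
  interface OI {}
  interface SettableOI extends OI {}
  interface NestedOI extends OI { List<OI> children(); }

  // Shallow check: is this inspector itself a settable type?
  static boolean isSettable(OI oi) {
    return oi instanceof SettableOI;
  }

  // Deep check: this inspector and everything under it must be settable.
  static boolean hasAllFieldsSettable(OI oi) {
    if (!isSettable(oi)) {
      return false;
    }
    if (oi instanceof NestedOI) {
      for (OI child : ((NestedOI) oi).children()) {
        if (!hasAllFieldsSettable(child)) {
          return false;
        }
      }
    }
    return true;
  }
}
{code}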

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.

2013-09-23 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775734#comment-13775734
 ] 

Vaibhav Gumashta commented on HIVE-5296:


[~k.saruta] Thanks for looking at this. Can you post the patch to review board 
or phabricator as well?

> Memory leak: OOM Error after multiple open/closed JDBC connections. 
> 
>
> Key: HIVE-5296
> URL: https://issues.apache.org/jira/browse/HIVE-5296
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
> Environment: Hive 0.12.0, Hadoop 1.1.2, Debian.
>Reporter: Douglas
>  Labels: hiveserver
> Fix For: 0.12.0
>
> Attachments: HIVE-5296.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481
> However, on inspection of the related patch and my built version of Hive 
> (patch carried forward to 0.12.0), I am still seeing the described behaviour.
> Multiple connections to Hiveserver2, all of which are closed and disposed of 
> properly show the Java heap size to grow extremely quickly. 
> This issue can be recreated using the following code
> {code}
> import java.sql.DriverManager;
> import java.sql.Connection;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
> import java.util.Properties;
> import org.apache.hive.service.cli.HiveSQLException;
> import org.apache.log4j.Logger;
> /*
>  * Class which encapsulates the lifecycle of a query or statement.
>  * Provides functionality which allows you to create a connection
>  */
> public class HiveClient {
>   
>   Connection con;
>   Logger logger;
>   private static String driverName = "org.apache.hive.jdbc.HiveDriver";   
>   private String db;
>   
>   
>   public HiveClient(String db)
>   {   
>   logger = Logger.getLogger(HiveClient.class);
>   this.db=db;
>   
>   try{
>Class.forName(driverName);
>   }catch(ClassNotFoundException e){
>   logger.info("Can't find Hive driver");
>   }
>   
>   String hiveHost = GlimmerServer.config.getString("hive/host");
>   String hivePort = GlimmerServer.config.getString("hive/port");
>   String connectionString = "jdbc:hive2://"+hiveHost+":"+hivePort 
> +"/default";
>   logger.info(String.format("Attempting to connect to 
> %s",connectionString));
>   try{
>   con = 
> DriverManager.getConnection(connectionString,"","");  
> 
>   }catch(Exception e){
>   logger.error("Problem instantiating the 
> connection"+e.getMessage());
>   }   
>   }
>   
>   public int update(String query) 
>   {
>   Integer res = 0;
>   Statement stmt = null;
>   try{
>   stmt = con.createStatement();
>   String switchdb = "USE "+db;
>   logger.info(switchdb);  
>   stmt.executeUpdate(switchdb);
>   logger.info(query);
>   res = stmt.executeUpdate(query);
>   logger.info("Query passed to server");  
>   stmt.close();
>   }catch(HiveSQLException e){
>   logger.info(String.format("HiveSQLException thrown, 
> this can be valid, " +
>   "but check the error: %s from the query 
> %s",query,e.toString()));
>   }catch(SQLException e){
>   logger.error(String.format("Unable to execute query 
> SQLException %s. Error: %s",query,e));
>   }catch(Exception e){
>   logger.error(String.format("Unable to execute query %s. 
> Error: %s",query,e));
>   }
>   
>   if(stmt!=null)
>   try{
>   stmt.close();
>   }catch(SQLException e){
>   logger.error("Cannot close the statment, 
> potentially memory leak "+e);
>   }
>   
>   return res;
>   }
>   
>   public void close()
>   {
>   if(con!=null){
>   try {
>   con.close();
>   } catch (SQLException e) {  
>   log

[jira] [Updated] (HIVE-4764) support the authentication modes for thrift over http transport for HS2

2013-09-23 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-4764:
---

Assignee: Vaibhav Gumashta

> support the authentication modes for thrift over http transport for HS2
> ---
>
> Key: HIVE-4764
> URL: https://issues.apache.org/jira/browse/HIVE-4764
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Vaibhav Gumashta
> Fix For: 0.12.0
>
>
> This subtask covers support for following functionality for thrift over http 
> transport in hive server2 
> - Support for LDAP,kerberos, custom authorization modes
> - Support for doAs functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4531) [WebHCat] Collecting task logs to hdfs

2013-09-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775743#comment-13775743
 ] 

Thejas M Nair commented on HIVE-4531:
-

+1


> [WebHCat] Collecting task logs to hdfs
> --
>
> Key: HIVE-4531
> URL: https://issues.apache.org/jira/browse/HIVE-4531
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, WebHCat
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12.0
>
> Attachments: HIVE-4531-10.patch, HIVE-4531-11.patch, 
> HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, 
> HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, 
> HIVE-4531-9.patch, samplestatusdirwithlist.tar.gz
>
>
> It would be nice we collect task logs after job finish. This is similar to 
> what Amazon EMR does.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5348) If both table and partition have the same serde then we don't even need to check for OI conversions in ObjectInspectorConverters

2013-09-23 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-5348:
---

 Summary: If both table and partition have the same serde then we don't 
even need to check for OI conversions in ObjectInspectorConverters
 Key: HIVE-5348
 URL: https://issues.apache.org/jira/browse/HIVE-5348
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Follow-up JIRA for HIVE-5202

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.

2013-09-23 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775748#comment-13775748
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-5202:
-

[~ashutoshc] Thanks. Created  HIVE-5348 as a follow-up jira.

> Support for SettableUnionObjectInspector and implement 
> isSettable/hasAllFieldsSettable APIs for all data types.
> ---
>
> Key: HIVE-5202
> URL: https://issues.apache.org/jira/browse/HIVE-5202
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5202.2.patch.txt, HIVE-5202.patch
>
>
> These 3 tasks should be accomplished as part of this jira:
> 1. The current implementation lacks a settable union object inspector. We can 
> run into an exception inside ObjectInspectorConverters.getConvertedOI() if 
> there is a union.
> 2. Implement the following public functions for all datatypes: 
> isSettable() -> performs a shallow check to see if an object inspector is 
> inherited from a settable OI type, and 
> hasAllFieldsSettable() -> performs a deep check to see if this object 
> inspector and all the underlying object inspectors are inherited from 
> settable OI types.
> 3. ObjectInspectorConverters.getConvertedOI() is inefficient. Once (1) and 
> (2) are implemented, add a check on outputOI.hasAllSettableFields() and 
> return outputOI immediately if the object is entirely settable, to prevent 
> redundant object instantiation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE

2013-09-23 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775756#comment-13775756
 ] 

Alexander Pivovarov commented on HIVE-4501:
---

Thejas, I'm confused: the patch sets fs.hdfs.impl.disable.cache to false, 
but the wiki says to set it to true.
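
For reference, the Hadoop flag semantics: setting the flag to true is what 
disables the cache. A minimal sketch via the plain Configuration API:

{code}
// Sketch: with the cache disabled, each get() returns a fresh FileSystem
// that the caller must close; nothing accumulates in FileSystem.CACHE.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NoFsCache {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.hdfs.impl.disable.cache", true);
    conf.setBoolean("fs.file.impl.disable.cache", true);

    FileSystem fs = new Path("/tmp").getFileSystem(conf);
    fs.close();   // safe: this instance is not shared via the cache
  }
}
{code}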

> HS2 memory leak - FileSystem objects in FileSystem.CACHE
> 
>
> Key: HIVE-4501
> URL: https://issues.apache.org/jira/browse/HIVE-4501
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.11.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-4501.1.patch
>
>
> org.apache.hadoop.fs.FileSystem objects are getting accumulated in 
> FileSystem.CACHE, with HS2 in unsecure mode.
> As a workaround, it is possible to set fs.hdfs.impl.disable.cache and 
> fs.file.impl.disable.cache to false.
> Users should not have to bother with this extra configuration. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5318) Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10

2013-09-23 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5318:
--

Attachment: HIVE-5318.1.patch

The latest patch adds a test case.

> Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10
> 
>
> Key: HIVE-5318
> URL: https://issues.apache.org/jira/browse/HIVE-5318
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Brad Ruderman
>Assignee: Xuefu Zhang
>Priority: Critical
> Attachments: HIVE-5318.1.patch, HIVE-5318.patch
>
>
> When exporting Hive tables in Hive 0.9 using the command "EXPORT table 
> TO 'hdfs_path'", then importing into another Hive 0.10 instance using "IMPORT 
> FROM 'hdfs_path'", Hive throws this error:
> 13/09/18 13:14:02 ERROR ql.Driver: FAILED: SemanticException Exception while 
> processing
> org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing
>   at 
> org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.lang.NullPointerException
>   at java.util.ArrayList.<init>(ArrayList.java:131)
>   at 
> org.apache.hadoop.hive.ql.plan.CreateTableDesc.<init>(CreateTableDesc.java:128)
>   at 
> org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:99)
>   ... 16 more
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535241411 end=1379535242332 duration=921>
> 13/09/18 13:14:02 INFO ql.Driver: 
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535242332 end=1379535242332 duration=0>
> 13/09/18 13:14:02 INFO ql.Driver: 
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535242333 end=1379535242333 duration=0>
> This is probably a critical blocker for people who are trying to test Hive 
> 0.10 in their staging environments prior to the upgrade from 0.9

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5318) Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10

2013-09-23 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5318:
--

Fix Version/s: 0.13.0
   Status: Patch Available  (was: Open)

> Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10
> 
>
> Key: HIVE-5318
> URL: https://issues.apache.org/jira/browse/HIVE-5318
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 0.10.0, 0.9.0
>Reporter: Brad Ruderman
>Assignee: Xuefu Zhang
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-5318.1.patch, HIVE-5318.patch
>
>
> When exporting Hive tables in Hive 0.9 using the command "EXPORT table 
> TO 'hdfs_path'", then importing into another Hive 0.10 instance using "IMPORT 
> FROM 'hdfs_path'", Hive throws this error:
> 13/09/18 13:14:02 ERROR ql.Driver: FAILED: SemanticException Exception while 
> processing
> org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing
>   at 
> org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.lang.NullPointerException
>   at java.util.ArrayList.<init>(ArrayList.java:131)
>   at 
> org.apache.hadoop.hive.ql.plan.CreateTableDesc.<init>(CreateTableDesc.java:128)
>   at 
> org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:99)
>   ... 16 more
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535241411 end=1379535242332 duration=921>
> 13/09/18 13:14:02 INFO ql.Driver: 
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535242332 end=1379535242332 duration=0>
> 13/09/18 13:14:02 INFO ql.Driver: 
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535242333 end=1379535242333 duration=0>
> This is probably a critical blocker for people who are trying to test Hive 
> 0.10 in their staging environments prior to the upgrade from 0.9

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5318) Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10

2013-09-23 Thread Brad Ruderman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775767#comment-13775767
 ] 

Brad Ruderman commented on HIVE-5318:
-

Sorry, I guess that was a bad way to phrase it. What I meant to ask is: is it 
possible for us to drop in an updated jar to fix this issue as part of a 0.9 
to 0.10 upgrade?

Thanks!

> Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10
> 
>
> Key: HIVE-5318
> URL: https://issues.apache.org/jira/browse/HIVE-5318
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Brad Ruderman
>Assignee: Xuefu Zhang
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-5318.1.patch, HIVE-5318.patch
>
>
> When exporting Hive tables in Hive 0.9 using the command "EXPORT table 
> TO 'hdfs_path'", then importing into another Hive 0.10 instance using "IMPORT 
> FROM 'hdfs_path'", Hive throws this error:
> 13/09/18 13:14:02 ERROR ql.Driver: FAILED: SemanticException Exception while 
> processing
> org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing
>   at 
> org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.lang.NullPointerException
>   at java.util.ArrayList.<init>(ArrayList.java:131)
>   at 
> org.apache.hadoop.hive.ql.plan.CreateTableDesc.<init>(CreateTableDesc.java:128)
>   at 
> org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:99)
>   ... 16 more
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535241411 end=1379535242332 duration=921>
> 13/09/18 13:14:02 INFO ql.Driver: 
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535242332 end=1379535242332 duration=0>
> 13/09/18 13:14:02 INFO ql.Driver: 
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535242333 end=1379535242333 duration=0>
> This is probably a critical blocker for people who are trying to test Hive 
> 0.10 in their staging environments prior to the upgrade from 0.9

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5318) Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10

2013-09-23 Thread Brad Ruderman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775766#comment-13775766
 ] 

Brad Ruderman commented on HIVE-5318:
-

So this won't be available until hive 0.13?

Thanks!

> Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10
> 
>
> Key: HIVE-5318
> URL: https://issues.apache.org/jira/browse/HIVE-5318
> Project: Hive
>  Issue Type: Bug
>  Components: Import/Export
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Brad Ruderman
>Assignee: Xuefu Zhang
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-5318.1.patch, HIVE-5318.patch
>
>
> When exporting Hive tables in Hive 0.9 using the command "EXPORT table 
> TO 'hdfs_path'", then importing into another Hive 0.10 instance using "IMPORT 
> FROM 'hdfs_path'", Hive throws this error:
> 13/09/18 13:14:02 ERROR ql.Driver: FAILED: SemanticException Exception while 
> processing
> org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing
>   at 
> org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.lang.NullPointerException
>   at java.util.ArrayList.(ArrayList.java:131)
>   at 
> org.apache.hadoop.hive.ql.plan.CreateTableDesc.(CreateTableDesc.java:128)
>   at 
> org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:99)
>   ... 16 more
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535241411 end=1379535242332 duration=921>
> 13/09/18 13:14:02 INFO ql.Driver: 
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535242332 end=1379535242332 duration=0>
> 13/09/18 13:14:02 INFO ql.Driver: 
> 13/09/18 13:14:02 INFO ql.Driver:  start=1379535242333 end=1379535242333 duration=0>
> This is probably a critical blocker for people who are trying to test Hive 
> 0.10 in their staging environments prior to the upgrade from 0.9

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3694) Generate test jars and publish them to Maven

2013-09-23 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775779#comment-13775779
 ] 

Konstantin Boudnik commented on HIVE-3694:
--

It seems that this ticket was committed (with a different synopsis) back 
in Dec 2012 (SHA1 614c640b2eecdd2d8dcea67af7a0a57300597aea).

> Generate test jars and publish them to Maven
> 
>
> Key: HIVE-3694
> URL: https://issues.apache.org/jira/browse/HIVE-3694
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Mikhail Bautin
>Priority: Minor
> Attachments: D6843.1.patch, D6843.2.patch, D6843.3.patch, 
> D6843.4.patch
>
>
> It should be possible to generate Hive test jars and publish them to Maven so 
> that other projects that rely on Hive or extend it could reuse its test 
> library.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: PreCommit Build Down

2013-09-23 Thread Brock Noland
This appears to be fixed.

On Mon, Sep 23, 2013 at 4:46 PM, Brock Noland  wrote:
> See https://issues.apache.org/jira/browse/INFRA-6781



-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org


Review Request 14298: Memory leak when using JDBC connections.

2013-09-23 Thread Kousuke Saruta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14298/
---

Review request for hive.


Bugs: HIVE-5296
https://issues.apache.org/jira/browse/HIVE-5296


Repository: hive-git


Description
---

HiveServer2 leaks memory, visible as ever-increasing Hashtable$Entry counts, in 
at least the following 2 situations:

1. When exceptions are thrown while executing a command or query, the 
operation handle is never released.
2. HiveServer2 calls the FileSystem#get method and never calls FileSystem#close 
or FileSystem.closeAll, so FileSystem$Cache continues to grow.

I've modified HiveSessionImpl and HiveStatement not to lose the operation 
handle; the handle is needed by OperationManager to remove the entry from 
handleToOperation. I've also modified HiveSessionImpl to close the FileSystem 
object at the end of the session.
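
For illustration, a schematic sketch of the two fixes (hypothetical names, not 
the actual HiveSessionImpl/HiveStatement code):

{code}
// Sketch: (1) unregister the operation handle even when execution throws;
// (2) close the session's FileSystem when the session ends.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;

public class SessionSketch {
  interface OperationManager { void closeOperation(long handle); }

  private final OperationManager opManager;
  private FileSystem sessionFs;   // obtained via FileSystem.get() earlier

  public SessionSketch(OperationManager opManager) {
    this.opManager = opManager;
  }

  public void execute(long handle, Runnable statement) {
    try {
      statement.run();
    } catch (RuntimeException e) {
      // Fix 1: on failure, still remove the handle so the
      // handleToOperation map does not grow without bound.
      opManager.closeOperation(handle);
      throw e;
    }
  }

  public void close() throws IOException {
    // Fix 2: release the FileSystem at the end of the session.
    if (sessionFs != null) {
      sessionFs.close();
    }
  }
}
{code}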


Diffs
-

  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 478fa57 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
11c96b2 

Diff: https://reviews.apache.org/r/14298/diff/


Testing
---

Using jmap, I only confirmed that Hashtable$Entry no longer increases.


Thanks,

Kousuke Saruta



[jira] [Commented] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.

2013-09-23 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775822#comment-13775822
 ] 

Kousuke Saruta commented on HIVE-5296:
--

Hi [~vgumashta], I've posted my patch to the review board.

> Memory leak: OOM Error after multiple open/closed JDBC connections. 
> 
>
> Key: HIVE-5296
> URL: https://issues.apache.org/jira/browse/HIVE-5296
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
> Environment: Hive 0.12.0, Hadoop 1.1.2, Debian.
>Reporter: Douglas
>  Labels: hiveserver
> Fix For: 0.12.0
>
> Attachments: HIVE-5296.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481
> However, on inspection of the related patch and my built version of Hive 
> (patch carried forward to 0.12.0), I am still seeing the described behaviour.
> Multiple connections to Hiveserver2, all of which are closed and disposed of 
> properly show the Java heap size to grow extremely quickly. 
> This issue can be recreated using the following code
> {code}
> import java.sql.DriverManager;
> import java.sql.Connection;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
> import java.util.Properties;
> import org.apache.hive.service.cli.HiveSQLException;
> import org.apache.log4j.Logger;
> /*
>  * Class which encapsulates the lifecycle of a query or statement.
>  * Provides functionality which allows you to create a connection
>  */
> public class HiveClient {
>   
>   Connection con;
>   Logger logger;
>   private static String driverName = "org.apache.hive.jdbc.HiveDriver";   
>   private String db;
>   
>   
>   public HiveClient(String db)
>   {   
>   logger = Logger.getLogger(HiveClient.class);
>   this.db=db;
>   
>   try{
>Class.forName(driverName);
>   }catch(ClassNotFoundException e){
>   logger.info("Can't find Hive driver");
>   }
>   
>   String hiveHost = GlimmerServer.config.getString("hive/host");
>   String hivePort = GlimmerServer.config.getString("hive/port");
>   String connectionString = "jdbc:hive2://"+hiveHost+":"+hivePort 
> +"/default";
>   logger.info(String.format("Attempting to connect to 
> %s",connectionString));
>   try{
>   con = 
> DriverManager.getConnection(connectionString,"","");  
> 
>   }catch(Exception e){
>   logger.error("Problem instantiating the 
> connection"+e.getMessage());
>   }   
>   }
>   
>   public int update(String query) 
>   {
>   Integer res = 0;
>   Statement stmt = null;
>   try{
>   stmt = con.createStatement();
>   String switchdb = "USE "+db;
>   logger.info(switchdb);  
>   stmt.executeUpdate(switchdb);
>   logger.info(query);
>   res = stmt.executeUpdate(query);
>   logger.info("Query passed to server");  
>   stmt.close();
>   }catch(HiveSQLException e){
>   logger.info(String.format("HiveSQLException thrown, 
> this can be valid, " +
>   "but check the error: %s from the query 
> %s",query,e.toString()));
>   }catch(SQLException e){
>   logger.error(String.format("Unable to execute query 
> SQLException %s. Error: %s",query,e));
>   }catch(Exception e){
>   logger.error(String.format("Unable to execute query %s. 
> Error: %s",query,e));
>   }
>   
>   if(stmt!=null)
>   try{
>   stmt.close();
>   }catch(SQLException e){
>   logger.error("Cannot close the statment, 
> potentially memory leak "+e);
>   }
>   
>   return res;
>   }
>   
>   public void close()
>   {
>   if(con!=null){
>   try {
>   con.close();
>   } catch (SQLException e) {  
>   logger.info("Problem closing connection "+e);
> 

[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions & is slow

2013-09-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775826#comment-13775826
 ] 

Sergey Shelukhin commented on HIVE-4051:


this and the follow-up patches (HIVE-5158) will take care of the 
pre-map-reduce-job slowdown on select * with many partitions, but it's hard to 
tell whether that's the main culprit from just looking at the query.
For "show table" I am not sure; if it isn't covered, it should be easy to extend.

> Hive's metastore suffers from 1+N queries when querying partitions & is slow
> 
>
> Key: HIVE-4051
> URL: https://issues.apache.org/jira/browse/HIVE-4051
> Project: Hive
>  Issue Type: Bug
>  Components: Clients, Metastore
> Environment: RHEL 6.3 / EC2 C1.XL
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: 0.12.0
>
> Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, 
> HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch, HIVE-4051.D11805.5.patch, 
> HIVE-4051.D11805.6.patch, HIVE-4051.D11805.7.patch, HIVE-4051.D11805.8.patch, 
> HIVE-4051.D11805.9.patch
>
>
> Hive's query client takes a long time to initialize & start planning queries 
> because of delays in creating all the MTable/MPartition objects.
> For a hive db with 1800 partitions, the metastore took 6-7 seconds to 
> initialize - firing approximately 5900 queries to the mysql database.
> Several of those queries fetch exactly one row to create a single object on 
> the client.
> The following 12 queries were repeated for each partition, generating a storm 
> of SQL queries 
> {code}
> 4 Query SELECT 
> `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`
>  FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = 
> `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
> 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN 
> `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871
> 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 
> AND THIS.`INTEGER_IDX`>=0
> 4 Query SELECT 
> `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS 
> NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND 
> `A0`.`INTEGER_IDX` >= 0 ORDER BY NUCORDER0
> 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` 
> FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = 
> `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871
> 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND 
> THIS.`INTEGER_IDX`>=0
> 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS 
> NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND 
> `A0`.`INTEGER_IDX` >= 0 ORDER BY NUCORDER0
> 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE 
> THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`>=0
> 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
> NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM 
> `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON 
> `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` 
> =4871 AND `A0`.`INTEGER_IDX` >= 0 ORDER BY NUCORDER0
> 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` 
> =4871 AND `STRING_LIST_ID_KID` IS NOT NULL
> 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
> NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN 
> `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = 
> `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871
> 4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM 
> `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT 
> (`A0`.`STRING_LIST_ID_KID` IS NULL)
> {code}
> This data is not detached or cached, so this operation is performed during 
> every query plan for the partitions, even in the same hive client.
> The queries are automatically generated by JDO/DataNucleus which makes it 
> nearly impossible to rewrite it into a single denormalized join operation & 
> process it locally.
> Attempts to optimize this with JDO fetch-groups did not bear fruit in 
> improving the query count.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5345) Operator::close() leaks Operator::out, holding reference to buffers

2013-09-23 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775832#comment-13775832
 ] 

Hive QA commented on HIVE-5345:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12604653/HIVE-5345.01.patch

{color:green}SUCCESS:{color} +1 3143 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/858/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/858/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

> Operator::close() leaks Operator::out, holding reference to buffers
> ---
>
> Key: HIVE-5345
> URL: https://issues.apache.org/jira/browse/HIVE-5345
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
> Environment: Ubuntu, LXC, jdk6-x86_64
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: memory-leak
> Attachments: HIVE-5345.01.patch, out-leak.png
>
>
> When processing multiple splits on the same operator pipeline, the Operator 
> keeps holding a reference to its output collector, which causes issues.
> Operator::close() does not de-reference the OutputCollector object 
> Operator::out held by the object.
> This means that trying to allocate space for a new OutputCollector causes an 
> OOM because the old one is still reachable.
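
For illustration, the shape of the fix in a minimal sketch (not Hive's actual 
Operator code):

{code}
// Sketch: dropping the collector reference on close makes the old buffers
// unreachable before a new OutputCollector is allocated for the next split.
public abstract class OperatorSketch {
  protected Object out;   // stands in for the held OutputCollector

  public void close() {
    // ... flush output and close child operators ...
    out = null;           // de-reference so the old collector can be GC'd
  }
}
{code}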

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system

2013-09-23 Thread Shuaishuai Nie (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775834#comment-13775834
 ] 

Shuaishuai Nie commented on HIVE-4773:
--

Thanks [~ekoifman] and [~hari.s] for the comments.
-Eugene, I don't think this is a problem when the watcher first assigns 
stderr/stdout to 'out' and then reassigns 'out' to 'statusdir'. Only the last 
assignment of 'out' matters. The fix ensures stdout/stderr won't be closed by 
writer.close(): the close function is overridden when 'out' actually points to 
stdout/stderr.
-Hari
1. I am not sure why close() should immediately close if flush() does not 
perform the same thing.
As I mentioned in the earlier comment, flush() will not ensure the content of 
the stream is written to the file, per the book "Hadoop: The Definitive 
Guide". It won't write to the file if a block is not yet filled in a 
distributed file system.
2. Inside run() of Watcher why do you need to create a new object using 
PrintWriter writer = new PrintWriter(out);
I didn't change this in my patch; it is in the original code base. I think it 
is needed for the format of the log in the output file.
3. Even if you add the CustomFilterOutputStream class, why do you need to add 
flush() inside close()? This looks like you are flushing twice.
This flush() is not strictly necessary here; it is just in case this class is 
used somewhere else where flush may matter.
4. Do you necessarily need to make the CustomFilterOutputStream class public? 
It doesn't look like it's used elsewhere.
For now it is not used anywhere else; I think it is okay to change it to 
protected.

> Templeton intermittently fail to commit output to file system
> -
>
> Key: HIVE-4773
> URL: https://issues.apache.org/jira/browse/HIVE-4773
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Attachments: HIVE-4773.1.patch, HIVE-4773.2.patch
>
>
> With ASV as a default FS, we saw instances where output is not fully flushed 
> to storage before the Templeton controller process exits. This results in 
> stdout and stderr being empty even though the job completed successfully.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system

2013-09-23 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775848#comment-13775848
 ] 

Eugene Koifman commented on HIVE-4773:
--

[~shuainie] OK, I misread your code. You only use CustomFilterOutputStream to 
wrap System.out/System.err, but not when 'out' = statusdir. I get it now, so 
your changes do the same thing I was suggesting in my previous comment.

I would suggest calling this wrapper class NonClosableStream and making the 
close() method in it do nothing (also make the class private). I think this 
will make it easier to understand.
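
For illustration, a minimal sketch of such a wrapper (using the suggested 
name; close() deliberately leaves the underlying stream open):

{code}
// Sketch: wrapping System.out/System.err in this stream means a later
// writer.close() can never close the process's stdout/stderr.
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class NonClosableStream extends FilterOutputStream {
  NonClosableStream(OutputStream out) {
    super(out);
  }

  @Override
  public void close() throws IOException {
    // Intentionally do nothing: never close the wrapped stream.
  }
}
{code}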

> Templeton intermittently fail to commit output to file system
> -
>
> Key: HIVE-4773
> URL: https://issues.apache.org/jira/browse/HIVE-4773
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Attachments: HIVE-4773.1.patch, HIVE-4773.2.patch
>
>
> With ASV as a default FS, we saw instances where output is not fully flushed 
> to storage before the Templeton controller process exits. This results in 
> stdout and stderr being empty even though the job completed successfully.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4773) Templeton intermittently fail to commit output to file system

2013-09-23 Thread Shuaishuai Nie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuaishuai Nie updated HIVE-4773:
-

Attachment: HIVE-4773.3.patch

Thanks [~ekoifman]. Modified the patch based on comment.

> Templeton intermittently fail to commit output to file system
> -
>
> Key: HIVE-4773
> URL: https://issues.apache.org/jira/browse/HIVE-4773
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Attachments: HIVE-4773.1.patch, HIVE-4773.2.patch, HIVE-4773.3.patch
>
>
> With ASV as a default FS, we saw instances where output is not fully flushed 
> to storage before the Templeton controller process exits. This results in 
> stdout and stderr being empty even though the job completed successfully.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5223) explain doesn't show serde used for table

2013-09-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5223:
---

Status: Open  (was: Patch Available)

> explain doesn't show serde used for table
> -
>
> Key: HIVE-5223
> URL: https://issues.apache.org/jira/browse/HIVE-5223
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5223.1.patch, HIVE-5223.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc

2013-09-23 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5279:
--

Attachment: D12963.5.patch

navis updated the revision "HIVE-5279 [jira] Kryo cannot instantiate 
GenericUDAFEvaluator in GroupByDesc".

  Fixed tests

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12963

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12963?vs=40299&id=40365#toc

BRANCH
  HIVE-5279

ARCANIST PROJECT
  hive

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSumList.java
  ql/src/test/queries/clientpositive/udaf_sum_list.q
  ql/src/test/results/clientpositive/udaf_sum_list.q.out
  ql/src/test/results/compiler/plan/groupby1.q.xml
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml
  ql/src/test/results/compiler/plan/groupby5.q.xml

To: JIRA, ashutoshc, navis


> Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
> ---
>
> Key: HIVE-5279
> URL: https://issues.apache.org/jira/browse/HIVE-5279
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Critical
> Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, 
> D12963.3.patch, D12963.4.patch, D12963.5.patch
>
>
> We didn't force GenericUDAFEvaluator to be Serializable. I don't know how 
> the previous serialization mechanism handled this, but Kryo complains that 
> it's not Serializable and fails the query.
> The log below is an example:
> {noformat}
> java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class 
> cannot be created (missing no-arg constructor): 
> org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
> Serialization trace:
> inputOI 
> (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval)
> genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc)
> aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc)
> conf (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
>   at org.apache.h
> {noformat}
> If this cannot be fixed somehow, some UDAFs will need to be modified to run 
> on hive-0.13.0.
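
As a general note (not necessarily the approach taken in the attached 
patches): Kryo can be told to fall back to Objenesis for classes that lack a 
no-arg constructor. A hedged sketch, assuming Kryo 2.x with org.objenesis on 
the classpath; the class and method names here are illustrative:

{noformat}
import com.esotericsoftware.kryo.Kryo;
import org.objenesis.strategy.StdInstantiatorStrategy;

public class KryoNoArgWorkaround {
  public static Kryo newKryo() {
    Kryo kryo = new Kryo();
    // Fall back to Objenesis when a class has no no-arg constructor,
    // so types like StandardListObjectInspector can still be created
    // instead of failing with "Class cannot be created".
    kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());
    return kryo;
  }
}
{noformat}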

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5223) explain doesn't show serde used for table

2013-09-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5223:
---

Status: Patch Available  (was: Open)

> explain doesn't show serde used for table
> -
>
> Key: HIVE-5223
> URL: https://issues.apache.org/jira/browse/HIVE-5223
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5223.1.patch, HIVE-5223.2.patch, HIVE-5223.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5223) explain doesn't show serde used for table

2013-09-23 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5223:
---

Attachment: HIVE-5223.2.patch

Re-uploading the patch for Hive QA to kick in.

> explain doesn't show serde used for table
> -
>
> Key: HIVE-5223
> URL: https://issues.apache.org/jira/browse/HIVE-5223
> Project: Hive
>  Issue Type: Improvement
>  Components: Diagnosability
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5223.1.patch, HIVE-5223.2.patch, HIVE-5223.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system

2013-09-23 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775884#comment-13775884
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-4773:
-

+1

> Templeton intermittently fail to commit output to file system
> -
>
> Key: HIVE-4773
> URL: https://issues.apache.org/jira/browse/HIVE-4773
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Attachments: HIVE-4773.1.patch, HIVE-4773.2.patch, HIVE-4773.3.patch
>
>
> With ASV as a default FS, we saw instances where output is not fully flushed 
> to storage before the Templeton controller process exits. This results in 
> stdout and stderr being empty even though the job completed successfully.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5342) Remove pre hadoop-0.20.0 related codes

2013-09-23 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775892#comment-13775892
 ] 

Navis commented on HIVE-5342:
-

Removing the counter in Operator is the first step. There would be follow-up 
work related to this, something like:

1. replace serde2.io.*Writable with hadoop.io.*Writable
2. remove the shims for comparing Text
3. possibly remove ByteArrayRef in Lazy
4. etc.

I'll check HIVE-4518, but it seemed a little intrusive at first look.

> Remove pre hadoop-0.20.0 related codes
> --
>
> Key: HIVE-5342
> URL: https://issues.apache.org/jira/browse/HIVE-5342
> Project: Hive
>  Issue Type: Task
>  Components: Shims
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: D13047.1.patch
>
>
> Recently, we discussed not supporting hadoop-0.20.0. If it would be done like 
> that or not, 0.17 related codes would be removed before that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4963) Support in memory PTF partitions

2013-09-23 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775896#comment-13775896
 ] 

Harish Butani commented on HIVE-4963:
-

Sorry, I forgot to respond.
The original plan was to have the user give a hint on whether partitions fit 
in memory, which would reduce serialization/deserialization cost when they 
do. But based on discussions with Ashutosh, we decided to move to using 
RowContainers for holding rows in a Partition; this way we share the same 
code as joins and get the functionality and performance benefits of 
RowContainers. PTFPartitions are now controlled by 
ConfVars.HIVEJOINCACHESIZE; use of ConfVars.HIVE_PTF_PARTITION_PERSISTENT_SIZE 
has been removed.

> Support in memory PTF partitions
> 
>
> Key: HIVE-4963
> URL: https://issues.apache.org/jira/browse/HIVE-4963
> Project: Hive
>  Issue Type: New Feature
>  Components: PTF-Windowing
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.12.0
>
> Attachments: HIVE-4963.D11955.1.patch, HIVE-4963.D12279.1.patch, 
> HIVE-4963.D12279.2.patch, HIVE-4963.D12279.3.patch, PTFRowContainer.patch
>
>
> PTF partitions defensively assume that partitions will not fit in memory. 
> Because of this there is a significant deserialization overhead when 
> accessing elements. 
> Allow the user to specify that there is enough memory to hold partitions 
> through a 'hive.ptf.partition.fits.in.mem' option.  
> Savings depend on partition size and, in the case of windowing, on the 
> number of UDAFs and the window ranges. For example, for the following 
> (admittedly extreme) case the PTFOperator exec times went from 39 secs to 8 
> secs.
>  
> {noformat}
> select t, s, i, b, f, d,
> min(t) over(partition by 1 rows between unbounded preceding and current row), 
> min(s) over(partition by 1 rows between unbounded preceding and current row), 
> min(i) over(partition by 1 rows between unbounded preceding and current row), 
> min(b) over(partition by 1 rows between unbounded preceding and current row) 
> from over10k
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request 14302: HIVE-5181: RetryingRawStore should not retry on logical failures

2013-09-23 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14302/
---

Review request for hive.


Bugs: HIVE-5181
https://issues.apache.org/jira/browse/HIVE-5181


Repository: hive-git


Description
---

The begin/commit/rollback calls shouldn't be retried by the RawStore retry 
logic.
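
Below is a hedged sketch of the idea using a dynamic-proxy invocation 
handler; the handler shape and names are illustrative assumptions, not the 
actual RetryingRawStore code:

{noformat}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Retries delegate calls, except transaction-control methods whose
// failures are logical rather than transient and must not be retried.
class SelectiveRetryHandler implements InvocationHandler {
  private static final Set<String> NO_RETRY = new HashSet<String>(
      Arrays.asList("openTransaction", "commitTransaction",
          "rollbackTransaction"));

  private final Object delegate;
  private final int maxRetries;

  SelectiveRetryHandler(Object delegate, int maxRetries) {
    this.delegate = delegate;
    this.maxRetries = maxRetries;
  }

  public Object invoke(Object proxy, Method method, Object[] args)
      throws Throwable {
    int attempts = NO_RETRY.contains(method.getName()) ? 1 : maxRetries;
    Throwable cause = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return method.invoke(delegate, args);
      } catch (InvocationTargetException e) {
        cause = e.getCause();  // surface the real failure, not the wrapper
      }
    }
    throw cause;
  }
}
{noformat}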


Diffs
-

  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java bf2b5ed 
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java 
e0c354f 

Diff: https://reviews.apache.org/r/14302/diff/


Testing
---

Added new test for the patch


Thanks,

Prasad Mujumdar



[jira] [Updated] (HIVE-5181) RetryingRawStore should not retry on logical failures (e.g. from commit)

2013-09-23 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-5181:
--

Attachment: HIVE-5181.1.patch

> RetryingRawStore should not retry on logical failures (e.g. from commit)
> 
>
> Key: HIVE-5181
> URL: https://issues.apache.org/jira/browse/HIVE-5181
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Prasad Mujumdar
>Priority: Minor
> Attachments: HIVE-5181.1.patch
>
>
> RetryingRawStore retries calls. Some methods (e.g. drop_table_core in 
> HiveMetaStore) explicitly call openTransaction and commitTransaction on 
> RawStore.
> When the commit call fails due to some real issue, it is retried, and instead 
> of the real cause of the failure one gets some bogus exception about the 
> transaction open count.
> It doesn't make sense to retry logical errors, especially not from 
> commitTransaction.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

