[jira] [Resolved] (HIVE-5223) explain doesn't show serde used for table
[ https://issues.apache.org/jira/browse/HIVE-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-5223. Resolution: Fixed Fix Version/s: 0.13.0 Committed to trunk. Thanks, Thejas for review! explain doesn't show serde used for table - Key: HIVE-5223 URL: https://issues.apache.org/jira/browse/HIVE-5223 Project: Hive Issue Type: Improvement Components: Diagnosability Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.13.0 Attachments: HIVE-5223.1.patch, HIVE-5223.2.patch, HIVE-5223.3.patch, HIVE-5223.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
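A quick sketch of what this improvement buys (hedged: the exact plan layout depends on the Hive version, and src is just the standard sample table):

```sql
-- With HIVE-5223 applied, the extended plan output is expected to name the
-- serde of each table alongside the input/output formats, e.g. the default
-- org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe for a plain text table.
EXPLAIN EXTENDED
SELECT key, value FROM src;
```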
[jira] [Created] (HIVE-5357) ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY
Chun Chen created HIVE-5357: --- Summary: ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY Key: HIVE-5357 URL: https://issues.apache.org/jira/browse/HIVE-5357 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Chun Chen Assignee: Chun Chen Fix For: 0.12.0 {code} select key, count(distinct value) from (select key, value from src group by key, value) t group by key; //result 0 0 NULL 10 10 NULL 100 100 NULL 103 103 NULL 104 104 NULL {code} Obviously the result is wrong.
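Until the fix lands, one hedged workaround is to switch off the optimizer that merges the keys; hive.optimize.reducededuplication is the property that controls ReduceSinkDeDuplication:

```sql
-- Workaround sketch: disable ReduceSinkDeDuplication so the child GROUP BY
-- keeps its own keys and the DISTINCT aggregation is computed correctly.
SET hive.optimize.reducededuplication=false;

SELECT key, count(DISTINCT value)
FROM (SELECT key, value FROM src GROUP BY key, value) t
GROUP BY key;
-- Expected shape: one row per key with its distinct-value count,
-- rather than the NULL counts shown in the bug report above.
```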
[jira] [Updated] (HIVE-5357) ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY
[ https://issues.apache.org/jira/browse/HIVE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated HIVE-5357: Attachment: HIVE-5357.patch ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY --- Key: HIVE-5357 URL: https://issues.apache.org/jira/browse/HIVE-5357 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Chun Chen Assignee: Chun Chen Fix For: 0.12.0 Attachments: HIVE-5357.patch {code} select key, count(distinct value) from (select key, value from src group by key, value) t group by key; //result 0 0 NULL 10 10 NULL 100 100 NULL 103 103 NULL 104 104 NULL {code} Obviously the result is wrong.
[jira] [Updated] (HIVE-5357) ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY
[ https://issues.apache.org/jira/browse/HIVE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated HIVE-5357: Release Note: ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY Status: Patch Available (was: Open) ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY --- Key: HIVE-5357 URL: https://issues.apache.org/jira/browse/HIVE-5357 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Chun Chen Assignee: Chun Chen Fix For: 0.12.0 Attachments: HIVE-5357.patch {code} select key, count(distinct value) from (select key, value from src group by key, value) t group by key; //result 0 0 NULL 10 10 NULL 100 100 NULL 103 103 NULL 104 104 NULL {code} Obviously the result is wrong.
[jira] [Commented] (HIVE-5357) ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY
[ https://issues.apache.org/jira/browse/HIVE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777203#comment-13777203 ] Thejas M Nair commented on HIVE-5357: - [~ashutoshc] [~hagleitn] [~navis] Can one of you please review this patch? This is something I would like to include in 0.12. ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY --- Key: HIVE-5357 URL: https://issues.apache.org/jira/browse/HIVE-5357 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Chun Chen Assignee: Chun Chen Fix For: 0.12.0 Attachments: HIVE-5357.patch {code} select key, count(distinct value) from (select key, value from src group by key, value) t group by key; //result 0 0 NULL 10 10 NULL 100 100 NULL 103 103 NULL 104 104 NULL {code} Obviously the result is wrong.
[jira] [Resolved] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-5279. Resolution: Fixed Fix Version/s: 0.13.0 Committed to trunk. Thanks, Navis! Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Fix For: 0.13.0 Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, D12963.3.patch, D12963.4.patch, D12963.5.patch We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism handled this, but Kryo complains that it's not Serializable and fails the query. The log below is an example: {noformat} java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector Serialization trace: inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval) genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.h {noformat} If this cannot be fixed somehow, some UDAFs will have to be modified to run on hive-0.13.0
[jira] [Updated] (HIVE-5341) Link doesn't work. Needs to be updated as mentioned in the Description
[ https://issues.apache.org/jira/browse/HIVE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5341: Priority: Major (was: Blocker) Link doesn't work. Needs to be updated as mentioned in the Description -- Key: HIVE-5341 URL: https://issues.apache.org/jira/browse/HIVE-5341 Project: Hive Issue Type: Bug Components: Documentation Reporter: Rakesh Chouhan Assignee: Lefty Leverenz Labels: documentation Go to the Apache Hive Getting Started documentation: https://cwiki.apache.org/confluence/display/Hive/GettingStarted Under the section Simple Example Use Cases, MovieLens User Ratings: wget http://www.grouplens.org/system/files/ml-data.tar+0.gz The link mentioned in the document does not work. It needs to be updated to the URL below: http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-data.tar.gz I am setting this defect's priority to Blocker because users will not be able to continue their hands-on exercises unless they find the correct URL to download the mentioned file. Referenced from: http://mail-archives.apache.org/mod_mbox/hive-user/201302.mbox/%3c8a0c145b-4db9-4d26-8613-8ca1bd741...@daum.net%3E.
[jira] [Resolved] (HIVE-5341) Link doesn't work. Needs to be updated as mentioned in the Description
[ https://issues.apache.org/jira/browse/HIVE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved HIVE-5341. - Resolution: Fixed Thanks, Lefty, for fixing the doc. [~chouhan], please feel free to edit the wiki directly. If you don't have write access, you can create an account on the wiki and send a request to the hive-dev mailing list. Link doesn't work. Needs to be updated as mentioned in the Description -- Key: HIVE-5341 URL: https://issues.apache.org/jira/browse/HIVE-5341 Project: Hive Issue Type: Bug Components: Documentation Reporter: Rakesh Chouhan Assignee: Lefty Leverenz Labels: documentation Go to the Apache Hive Getting Started documentation: https://cwiki.apache.org/confluence/display/Hive/GettingStarted Under the section Simple Example Use Cases, MovieLens User Ratings: wget http://www.grouplens.org/system/files/ml-data.tar+0.gz The link mentioned in the document does not work. It needs to be updated to the URL below: http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-data.tar.gz I am setting this defect's priority to Blocker because users will not be able to continue their hands-on exercises unless they find the correct URL to download the mentioned file. Referenced from: http://mail-archives.apache.org/mod_mbox/hive-user/201302.mbox/%3c8a0c145b-4db9-4d26-8613-8ca1bd741...@daum.net%3E.
[jira] [Commented] (HIVE-5357) ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY
[ https://issues.apache.org/jira/browse/HIVE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777213#comment-13777213 ] Chun Chen commented on HIVE-5357: - Review: https://reviews.facebook.net/D13089 ReduceSinkDeDuplication optimizer picks the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY --- Key: HIVE-5357 URL: https://issues.apache.org/jira/browse/HIVE-5357 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Chun Chen Assignee: Chun Chen Fix For: 0.12.0 Attachments: HIVE-5357.patch {code} select key, count(distinct value) from (select key, value from src group by key, value) t group by key; //result 0 0 NULL 10 10 NULL 100 100 NULL 103 103 NULL 104 104 NULL {code} Obviously the result is wrong.
[jira] [Commented] (HIVE-5235) Infinite loop with ORC file and Hive 0.11
[ https://issues.apache.org/jira/browse/HIVE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777215#comment-13777215 ] Thejas M Nair commented on HIVE-5235: - Pere, We would like to get this fixed for the 0.12 release if possible. Can you please give Owen any additional information you have? Infinite loop with ORC file and Hive 0.11 - Key: HIVE-5235 URL: https://issues.apache.org/jira/browse/HIVE-5235 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Environment: Gentoo linux with Hortonworks Hadoop hadoop-1.1.2.23.tar.gz and Apache Hive 0.11d Reporter: Iván de Prado Priority: Blocker We are using Hive 0.11 with the ORC file format, and some tasks get blocked in some kind of infinite loop. They keep working indefinitely when we set a huge task expiry timeout. If we set the expiry time to 600 seconds, the tasks fail for not reporting progress, and finally the job fails. The behavior is not consistent, and sometimes it changes between job executions. It happens for different queries. We are using Hive 0.11 with Hadoop hadoop-1.1.2.23 from Hortonworks. The blocked task keeps consuming 100% CPU, and the stack trace is always the same. Everything points to some kind of infinite loop. My guess is that it is related to the ORC file: maybe a pointer is written incorrectly, producing an infinite loop when reading. Or maybe there is a bug in the reading stage. More information below.
The stack trace: {noformat} main prio=10 tid=0x7f20a000a800 nid=0x1ed2 runnable [0x7f20a8136000] java.lang.Thread.State: RUNNABLE at java.util.zip.Inflater.inflateBytes(Native Method) at java.util.zip.Inflater.inflate(Inflater.java:256) - locked 0xf42a6ca0 (a java.util.zip.ZStreamRef) at org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:64) at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:128) at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:143) at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVulong(SerializationUtils.java:54) at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVslong(SerializationUtils.java:65) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.readValues(RunLengthIntegerReader.java:66) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.next(RunLengthIntegerReader.java:81) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:332) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:802) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1214) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:71) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:46) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:300) at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:218) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236) - eliminated 0xe1459700 (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216) - locked 0xe1459700 (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1178) at org.apache.hadoop.mapred.Child.main(Child.java:249) {noformat} We have seen the same stack trace repeatedly for several
[jira] [Resolved] (HIVE-4891) Distinct includes duplicate records
[ https://issues.apache.org/jira/browse/HIVE-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved HIVE-4891. - Resolution: Cannot Reproduce This could not be reproduced with more recent Hive. Marking it as Cannot Reproduce. Fengdong, please let us know if you feel there is anything missing in the steps Harish followed, or if you are able to reproduce the issue with the hive 0.12 branch or trunk. Distinct includes duplicate records --- Key: HIVE-4891 URL: https://issues.apache.org/jira/browse/HIVE-4891 Project: Hive Issue Type: Bug Components: File Formats, HiveServer2, Query Processor Affects Versions: 0.10.0 Reporter: Fengdong Yu Priority: Blocker Fix For: 0.12.0 I have two partitions: one is a sequence file, the other is an RCFile, but they contain the same data (only the file format differs). I have the following SQL: {code} select distinct uid from test where (dt ='20130718' or dt ='20130718_1') and cur_url like '%cq.aa.com%'; {code} dt='20130718' is a sequence file (the default input format, specified when the table was created); dt='20130718_1' is an RCFile. {code} ALTER TABLE test ADD IF NOT EXISTS PARTITION (dt='20130718_1') LOCATION '/user/test/test-data' ALTER TABLE test PARTITION(dt='20130718_1') SET FILEFORMAT RCFILE; {code} But there are duplicate records in the result. If the two partitions have the same input format, there are no duplicate records.
[jira] [Commented] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
[ https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777221#comment-13777221 ] Hive QA commented on HIVE-5301: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12604890/HIVE-5301.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 3161 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.listener.TestNotificationListener.testAMQListener {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/879/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/879/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. Add a schema tool for offline metastore schema upgrade -- Key: HIVE-5301 URL: https://issues.apache.org/jira/browse/HIVE-5301 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.12.0 Attachments: HIVE-5301.1.patch, HIVE-5301.3.patch, HIVE-5301.3.patch, HIVE-5301-with-HIVE-3764.0.patch HIVE-3764 is addressing metastore version consistency. Besides, it would be helpful to add a tool that can leverage this version information to figure out the required set of upgrade scripts and execute them against the configured metastore. Now that Hive includes the Beeline client, it can be used to execute the scripts.
[jira] [Updated] (HIVE-5207) Support data encryption for Hive tables
[ https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Chen updated HIVE-5207: - Attachment: HIVE-5207.patch Attaching a patch for reference. It depends on the Hadoop crypto feature. Support data encryption for Hive tables --- Key: HIVE-5207 URL: https://issues.apache.org/jira/browse/HIVE-5207 Project: Hive Issue Type: New Feature Affects Versions: 0.12.0 Reporter: Jerry Chen Labels: Rhino Attachments: HIVE-5207.patch Original Estimate: 504h Remaining Estimate: 504h For sensitive and legally protected data such as personal information, it is common practice to store the data encrypted in the file system. Enabling Hive to store and query encrypted data is crucial for Hive data analysis in the enterprise. When creating a table, the user can specify whether it is an encrypted table by setting a property in TBLPROPERTIES. Once an encrypted table is created, querying it is transparent as long as the corresponding key management facilities are set up in the query's running environment. We can use the Hadoop crypto support provided by HADOOP-9331 for the underlying data encryption and decryption. As for key management, we would support several common use cases. First, the table key (data key) can be stored in the Hive metastore, associated with the table in its properties. The table key can be explicitly specified or auto-generated, and will be encrypted with a master key. In cases where the data being processed is generated by other applications, we need to support externally managed or imported table keys. Also, the data generated by Hive may be consumed by other applications in the system, so we need a tool or command for exporting the table key to a Java keystore for external use.
To handle versions of Hadoop that do not have crypto support, we can avoid compilation problems by segregating crypto API usage into separate files (shims) to be included only if a flag is defined on the Ant command line (something like -Dcrypto=true).
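A minimal DDL sketch of how the proposal could surface to users. The property names below ('hive.encrypt.enable', 'hive.encrypt.keyname') and the table are hypothetical placeholders, not names defined by the patch:

```sql
-- Hypothetical sketch: mark a table as encrypted via TBLPROPERTIES.
CREATE TABLE customer_pii (id BIGINT, ssn STRING)
TBLPROPERTIES (
  'hive.encrypt.enable' = 'true',      -- hypothetical: store table encrypted
  'hive.encrypt.keyname' = 'pii-key'   -- hypothetical: externally managed key
);

-- Once key management is configured, querying is meant to stay transparent:
SELECT id FROM customer_pii WHERE ssn IS NOT NULL;
```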
[jira] [Created] (HIVE-5358) ReduceSinkDeDuplication should ignore column orders when checking the overlapping part of keys between parent and child
Chun Chen created HIVE-5358: --- Summary: ReduceSinkDeDuplication should ignore column orders when checking the overlapping part of keys between parent and child Key: HIVE-5358 URL: https://issues.apache.org/jira/browse/HIVE-5358 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Chun Chen Assignee: Chun Chen {code} select key, value from (select key, value from src group by key, value) t group by key, value; {code} This can be optimized by ReduceSinkDeDuplication. {code} select key, value from (select key, value from src group by key, value) t group by value, key; {code} However, the SQL above currently can't be optimized by ReduceSinkDeDuplication because the parent and child operators list the key columns in different orders.
[jira] [Updated] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
[ https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5301: --- Resolution: Fixed Fix Version/s: (was: 0.12.0) 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Prasad! Add a schema tool for offline metastore schema upgrade -- Key: HIVE-5301 URL: https://issues.apache.org/jira/browse/HIVE-5301 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.13.0 Attachments: HIVE-5301.1.patch, HIVE-5301.3.patch, HIVE-5301.3.patch, HIVE-5301-with-HIVE-3764.0.patch HIVE-3764 is addressing metastore version consistency. Besides, it would be helpful to add a tool that can leverage this version information to figure out the required set of upgrade scripts and execute them against the configured metastore. Now that Hive includes the Beeline client, it can be used to execute the scripts.
[jira] [Commented] (HIVE-5283) Merge vectorization branch to trunk
[ https://issues.apache.org/jira/browse/HIVE-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777252#comment-13777252 ] Hive QA commented on HIVE-5283: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12604927/HIVE-5283.3.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/884/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/884/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-884/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java' Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf build hcatalog/build hcatalog/core/build hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build hcatalog/hcatalog-pig-adapter/build common/src/gen + svn update A beeline/src/test/org/apache/hive/beeline/src/test/TestSchemaTool.java U beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java A beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java A beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java U beeline/src/java/org/apache/hive/beeline/Commands.java U beeline/src/java/org/apache/hive/beeline/BeeLine.java U build.xml U metastore/scripts/upgrade/derby/014-HIVE-3764.derby.sql U metastore/scripts/upgrade/mysql/014-HIVE-3764.mysql.sql U metastore/scripts/upgrade/oracle/014-HIVE-3764.oracle.sql U metastore/scripts/upgrade/postgres/014-HIVE-3764.postgres.sql A bin/schematool A bin/ext/schemaTool.sh Fetching external item into 'hcatalog/src/test/e2e/harness' Updated external to revision 1526125. Updated to revision 1526125. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' {noformat} This message is automatically generated. 
Merge vectorization branch to trunk --- Key: HIVE-5283 URL: https://issues.apache.org/jira/browse/HIVE-5283 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5283.1.patch, HIVE-5283.2.patch, HIVE-5283.3.patch The purpose of this jira is to upload the vectorization patch, run tests, etc. The actual work will continue under the HIVE-4160 umbrella jira.
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777270#comment-13777270 ] Hive QA commented on HIVE-4629: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12604928/HIVE-4629.1.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/886/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/886/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-886/source-prep.txt + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToLong.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToByte.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToShort.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf build hcatalog/build hcatalog/core/build hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build hcatalog/hcatalog-pig-adapter/build common/src/gen ql/src/test/results/clientpositive/cast_to_int.q.out ql/src/test/queries/clientpositive/cast_to_int.q + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1526135. At revision 1526135. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0 to p2 + exit 1 ' {noformat} This message is automatically generated. HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629.1.patch, HIVE-4629-no_thrift.1.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HIVE-2843) UDAF to convert an aggregation to a map
[ https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777277#comment-13777277 ] Nepomuk Seiler commented on HIVE-2843: -- Hi guys, what is the status of this? cheers, Muki UDAF to convert an aggregation to a map --- Key: HIVE-2843 URL: https://issues.apache.org/jira/browse/HIVE-2843 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: David Worms Priority: Minor Labels: features, udf Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch, hive-2843-dev.git.patch I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. The source code is available on GitHub at https://github.com/wdavidw/hive-udf in two Java classes: UDAFToMap and UDAFToOrderedMap. The first function converts an aggregation into a map and internally uses a Java `HashMap`. The second function extends the first one: it converts an aggregation into an ordered map and internally uses a Java `TreeMap`. Both extend the `AbstractGenericUDAFResolver` class. Also, I have covered the motivations and usages of these UDAFs in a blog post at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/ The full patch is available, with tests as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
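Conceptually, the two UDAFs fold the (key, value) rows of each group into a single map column. A minimal pure-Java sketch of that folding step (the helper names here are made up for illustration; this is not the patch's actual GenericUDAFEvaluator code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration of what the two proposed UDAFs do conceptually: each
// (key, value) row of a group contributes one entry to the result map;
// the ordered variant simply swaps HashMap for TreeMap.
public class ToMapSketch {

    // UDAFToMap-style aggregation: no key ordering guaranteed.
    static Map<String, Integer> toMap(String[] keys, int[] values) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            m.put(keys[i], values[i]);
        }
        return m;
    }

    // UDAFToOrderedMap-style aggregation: TreeMap keeps keys sorted.
    static Map<String, Integer> toOrderedMap(String[] keys, int[] values) {
        Map<String, Integer> m = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            m.put(keys[i], values[i]);
        }
        return m;
    }
}
```

The only difference between the two variants is the backing map implementation, which matches the description above of the second class extending the first.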
[jira] [Updated] (HIVE-4763) add support for thrift over http transport in HS2
[ https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-4763: --- Status: Open (was: Patch Available) add support for thrift over http transport in HS2 - Key: HIVE-4763 URL: https://issues.apache.org/jira/browse/HIVE-4763 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, HIVE-4763.D12855.1.patch, HIVE-4763.D12951.1.patch Subtask for adding support for http transport mode for thrift api in hive server2. Support for the different authentication modes will be part of another subtask. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4763) add support for thrift over http transport in HS2
[ https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-4763: --- Attachment: HIVE-4763.D12951.2.patch add support for thrift over http transport in HS2 - Key: HIVE-4763 URL: https://issues.apache.org/jira/browse/HIVE-4763 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, HIVE-4763.D12855.1.patch, HIVE-4763.D12951.1.patch, HIVE-4763.D12951.2.patch Subtask for adding support for http transport mode for thrift api in hive server2. Support for the different authentication modes will be part of another subtask. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4763) add support for thrift over http transport in HS2
[ https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-4763: --- Status: Patch Available (was: Open) add support for thrift over http transport in HS2 - Key: HIVE-4763 URL: https://issues.apache.org/jira/browse/HIVE-4763 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, HIVE-4763.D12855.1.patch, HIVE-4763.D12951.1.patch, HIVE-4763.D12951.2.patch Subtask for adding support for http transport mode for thrift api in hive server2. Support for the different authentication modes will be part of another subtask. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5295) HiveConnection#configureConnection tries to execute statement even after it is closed
[ https://issues.apache.org/jira/browse/HIVE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5295: --- Status: Patch Available (was: Open) HiveConnection#configureConnection tries to execute statement even after it is closed - Key: HIVE-5295 URL: https://issues.apache.org/jira/browse/HIVE-5295 Project: Hive Issue Type: Bug Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: D12957.1.patch, D12957.2.patch, D12957.3.patch, HIVE-5295.D12957.3.patch HiveConnection#configureConnection tries to execute a statement even after it is closed. For the remote JDBC client, it tries to set each conf var using 'set foo=bar' by calling HiveStatement.execute for each conf var pair, but closes the statement after the first iteration through the conf var pairs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5295) HiveConnection#configureConnection tries to execute statement even after it is closed
[ https://issues.apache.org/jira/browse/HIVE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5295: --- Attachment: HIVE-5295.D12957.3.patch HiveConnection#configureConnection tries to execute statement even after it is closed - Key: HIVE-5295 URL: https://issues.apache.org/jira/browse/HIVE-5295 Project: Hive Issue Type: Bug Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: D12957.1.patch, D12957.2.patch, D12957.3.patch, HIVE-5295.D12957.3.patch HiveConnection#configureConnection tries to execute a statement even after it is closed. For the remote JDBC client, it tries to set each conf var using 'set foo=bar' by calling HiveStatement.execute for each conf var pair, but closes the statement after the first iteration through the conf var pairs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5295) HiveConnection#configureConnection tries to execute statement even after it is closed
[ https://issues.apache.org/jira/browse/HIVE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5295: --- Status: Open (was: Patch Available) HiveConnection#configureConnection tries to execute statement even after it is closed - Key: HIVE-5295 URL: https://issues.apache.org/jira/browse/HIVE-5295 Project: Hive Issue Type: Bug Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: D12957.1.patch, D12957.2.patch, D12957.3.patch, HIVE-5295.D12957.3.patch HiveConnection#configureConnection tries to execute a statement even after it is closed. For the remote JDBC client, it tries to set each conf var using 'set foo=bar' by calling HiveStatement.execute for each conf var pair, but closes the statement after the first iteration through the conf var pairs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
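The loop shape the report describes can be sketched in a few lines of plain Java. This is a hypothetical stand-in (FakeStatement is invented for illustration, not Hive's real JDBC classes): when close() sits inside the per-conf-var loop, every pair after the first fails.

```java
import java.util.List;

// Minimal sketch of the HIVE-5295 bug pattern: the statement is closed
// inside the per-conf-var loop, so conf vars after the first are lost.
public class ConfigureConnectionSketch {

    static class FakeStatement {
        boolean closed = false;

        void execute(String sql) {
            if (closed) {
                throw new IllegalStateException("statement already closed: " + sql);
            }
        }

        void close() { closed = true; }
    }

    // Buggy shape: close() runs on the first pass through the loop.
    static int buggy(List<String> confPairs) {
        FakeStatement stmt = new FakeStatement();
        int applied = 0;
        for (String pair : confPairs) {
            try {
                stmt.execute("set " + pair);
                applied++;
            } catch (IllegalStateException e) {
                // later conf vars are silently dropped
            } finally {
                stmt.close(); // wrong place: kills every later iteration
            }
        }
        return applied;
    }

    // Fixed shape: close once, after all conf vars have been applied.
    static int fixed(List<String> confPairs) {
        FakeStatement stmt = new FakeStatement();
        int applied = 0;
        try {
            for (String pair : confPairs) {
                stmt.execute("set " + pair);
                applied++;
            }
        } finally {
            stmt.close();
        }
        return applied;
    }
}
```

With three conf pairs, the buggy variant applies only the first one, while the fixed variant applies all three.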
[jira] [Updated] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.
[ https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HIVE-5296: - Attachment: HIVE-5296.patch I've attached a modified patch. Memory leak: OOM Error after multiple open/closed JDBC connections. Key: HIVE-5296 URL: https://issues.apache.org/jira/browse/HIVE-5296 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Environment: Hive 0.12.0, Hadoop 1.1.2, Debian. Reporter: Douglas Labels: hiveserver Fix For: 0.12.0 Attachments: HIVE-5296.patch, HIVE-5296.patch, HIVE-5296.patch Original Estimate: 168h Remaining Estimate: 168h This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481 However, on inspection of the related patch and my built version of Hive (patch carried forward to 0.12.0), I am still seeing the described behaviour. Multiple connections to HiveServer2, all of which are closed and disposed of properly, cause the Java heap size to grow extremely quickly. This issue can be recreated using the following code:
{code}
import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

import org.apache.hive.service.cli.HiveSQLException;
import org.apache.log4j.Logger;

/*
 * Class which encapsulates the lifecycle of a query or statement.
 * Provides functionality which allows you to create a connection.
 */
public class HiveClient {

    Connection con;
    Logger logger;
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private String db;

    public HiveClient(String db) {
        logger = Logger.getLogger(HiveClient.class);
        this.db = db;
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            logger.info("Can't find Hive driver");
        }
        String hiveHost = GlimmerServer.config.getString("hive/host");
        String hivePort = GlimmerServer.config.getString("hive/port");
        String connectionString = "jdbc:hive2://" + hiveHost + ":" + hivePort + "/default";
        logger.info(String.format("Attempting to connect to %s", connectionString));
        try {
            con = DriverManager.getConnection(connectionString, "", "");
        } catch (Exception e) {
            logger.error("Problem instantiating the connection " + e.getMessage());
        }
    }

    public int update(String query) {
        Integer res = 0;
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            String switchdb = "USE " + db;
            logger.info(switchdb);
            stmt.executeUpdate(switchdb);
            logger.info(query);
            res = stmt.executeUpdate(query);
            logger.info("Query passed to server");
            stmt.close();
        } catch (HiveSQLException e) {
            logger.info(String.format("HiveSQLException thrown, this can be valid, "
                    + "but check the error: %s from the query %s", query, e.toString()));
        } catch (SQLException e) {
            logger.error(String.format("Unable to execute query SQLException %s. Error: %s", query, e));
        } catch (Exception e) {
            logger.error(String.format("Unable to execute query %s. Error: %s", query, e));
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("Cannot close the statement, potentially a memory leak " + e);
            }
        }
        return res;
    }

    public void close() {
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                logger.info("Problem closing connection " + e);
            }
        }
    }
}
{code}
And by creating and closing many HiveClient objects. The heap space used by the
Re: Review Request 14298: Memory leak when using JDBC connections.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14298/ --- (Updated Sept. 25, 2013, 9:09 a.m.) Review request for hive. Changes --- I found that using FileSystem.closeAll is a bad idea, and the FileSystem$Cache problem will be addressed in HIVE-4501, so I instead address another problem: the opHandle is not released when an exception occurs while executing a query or command. Bugs: HIVE-5296 https://issues.apache.org/jira/browse/HIVE-5296 Repository: hive-git Description --- HiveServer2 leaks memory through a growing number of Hashtable$Entry objects in at least two situations: 1. When an exception is thrown while executing a command or query, the operation handle is not released. 2. HiveServer2 calls the FileSystem#get method but never calls FileSystem#close or FileSystem.closeAll, so the FileSystem$Cache continues to grow. I've modified HiveSessionImpl and HiveStatement not to lose the operation handle; OperationManager needs the handle to remove the entry from handleToOperation. Also, I've modified HiveSessionImpl to close the FileSystem object at the end of the session. Diffs (updated) - jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java 2912ece service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 11c96b2 Diff: https://reviews.apache.org/r/14298/diff/ Testing --- Using jmap, I confirmed only that the Hashtable$Entry count no longer grows. Thanks, Kousuke Saruta
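The handle-release part of the fix can be sketched in plain Java. All names here are hypothetical stand-ins mirroring the handle-tracking map the review mentions; the point is that the tracking entry is removed even when execution throws, so failed queries no longer leak one map entry each.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (invented names, not Hive's OperationManager) of releasing an
// operation handle on the error path: without the remove() in the catch
// block, every failed query would leave a stale entry in the map.
public class OperationManagerSketch {

    private final Map<Integer, String> handleToOperation = new HashMap<>();
    private int nextHandle = 0;

    // Returns the handle of a successfully started operation; on failure
    // the handle is released before the exception is rethrown.
    int run(String query, boolean fail) {
        int handle = nextHandle++;
        handleToOperation.put(handle, query);
        try {
            if (fail) {
                throw new RuntimeException("query failed: " + query);
            }
            return handle;
        } catch (RuntimeException e) {
            handleToOperation.remove(handle); // release on error
            throw e;
        }
    }

    int trackedOperations() {
        return handleToOperation.size();
    }
}
```

A successful operation stays tracked until the client closes it, but a failed one leaves the map unchanged.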
[jira] [Commented] (HIVE-4822) implement vectorized math functions
[ https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777316#comment-13777316 ] Hive QA commented on HIVE-4822: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12604935/HIVE-4822.7-vectorization.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4027 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.mapreduce.TestHCatExternalHCatNonPartitioned.testHCatNonPartitionedTable {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/887/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/887/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. implement vectorized math functions --- Key: HIVE-4822 URL: https://issues.apache.org/jira/browse/HIVE-4822 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4822.1.patch, HIVE-4822.4.patch, HIVE-4822.5-vectorization.patch, HIVE-4822.6.patch.txt, HIVE-4822.7-vectorization.patch Implement vectorized support for the all the built-in math functions. This includes implementing the vectorized operation, and tying it all together in VectorizationContext so it runs end-to-end. 
These functions include: round(Col) Round(Col, N) Floor(Col) Ceil(Col) Rand(), Rand(seed) Exp(Col) Ln(Col) Log10(Col) Log2(Col) Log(base, Col) Pow(col, p), Power(col, p) Sqrt(Col) Bin(Col) Hex(Col) Unhex(Col) Conv(Col, from_base, to_base) Abs(Col) Pmod(arg1, arg2) Sin(Col) Asin(Col) Cos(Col) ACos(Col) Atan(Col) Degrees(Col) Radians(Col) Positive(Col) Negative(Col) Sign(Col) E() Pi() To reduce the total code volume, do an implicit type cast from non-double input types to double. Also, POSITIVE and NEGATIVE are syntactic sugar for unary + and unary -, so reuse code for those as appropriate. Try to call the function directly in the inner loop and avoid new() or expensive operations, as appropriate. Templatize the code where appropriate, e.g. all the unary functions of the form DOUBLE func(DOUBLE) can probably be done with a template. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
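The "one template for all unary DOUBLE func(DOUBLE) functions" idea can be sketched as a single tight loop with an optional selection vector. This is an assumed structure for illustration, not Hive's actual VectorExpression classes (which avoid even the function-object indirection shown here by generating one class per function):

```java
import java.util.function.DoubleUnaryOperator;

// Sketch of the vectorized unary-function pattern: one tight loop per
// DOUBLE -> DOUBLE function, an optional selection vector choosing the
// live rows of the batch, and no allocation inside the loop.
public class VectorUnaryFuncSketch {

    // Applies f to n entries of in, writing results to out; sel, when
    // non-null, lists which row indices of the batch are selected.
    static void apply(double[] in, double[] out, int[] sel, int n, DoubleUnaryOperator f) {
        if (sel != null) {
            for (int i = 0; i < n; i++) {
                int j = sel[i];
                out[j] = f.applyAsDouble(in[j]);
            }
        } else {
            for (int i = 0; i < n; i++) {
                out[i] = f.applyAsDouble(in[i]);
            }
        }
    }
}
```

The same loop body then serves Sqrt, Exp, Sin, and every other unary function in the list, which is the code-volume reduction the ticket asks for.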
[jira] [Commented] (HIVE-5223) explain doesn't show serde used for table
[ https://issues.apache.org/jira/browse/HIVE-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777401#comment-13777401 ] Hudson commented on HIVE-5223: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-5223 : explain doesn't show serde used for table (Ashutosh Chauhan via Thejas Nair) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526116) * /hive/trunk/contrib/src/test/results/clientpositive/dboutput.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes2.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes3.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes4.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes5.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_avg.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_group_concat.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_max.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_max_n.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_min.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_min_n.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udf_example_add.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udf_example_arraymapstruct.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udf_example_format.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udf_row_sequence.q.out * /hive/trunk/hbase-handler/src/test/results/positive/external_table_ppd.q.out * /hive/trunk/hbase-handler/src/test/results/positive/hbase_ppd_key_range.q.out * /hive/trunk/hbase-handler/src/test/results/positive/hbase_pushdown.q.out * 
/hive/trunk/hbase-handler/src/test/results/positive/hbase_queries.q.out * /hive/trunk/hbase-handler/src/test/results/positive/ppd_key_ranges.q.out * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java * /hive/trunk/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/serde2/TestSerDe.java * /hive/trunk/ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out * /hive/trunk/ql/src/test/results/clientnegative/script_error.q.out * /hive/trunk/ql/src/test/results/clientnegative/sortmerge_mapjoin_mismatch_1.q.out * /hive/trunk/ql/src/test/results/clientnegative/udf_assert_true.q.out * /hive/trunk/ql/src/test/results/clientnegative/udf_assert_true2.q.out * /hive/trunk/ql/src/test/results/clientpositive/alias_casted_column.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/allcolref_in_udf.q.out * /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out * /hive/trunk/ql/src/test/results/clientpositive/ambiguous_col.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join0.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join10.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join11.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join12.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join13.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join15.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join16.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join18.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join20.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join21.q.out *
[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up
[ https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777400#comment-13777400 ] Hudson commented on HIVE-5274: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-5274 : HCatalog package renaming backward compatibility follow-up (Sushanth Sowmyan) (khorgath: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526094) * /hive/trunk/hcatalog/build-support/ant/checkstyle.xml * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseConstants.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseHCatStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseInputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseRevisionManagerUtil.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HbaseSnapshotRecordReader.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/ResultConverter.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseBulkOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseDirectOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseHCatStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseInputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHCatHBaseInputFormat.java * 
/hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHiveHBaseStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHiveHBaseTableOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestPigHBaseStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestSnapshots.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/TestZNodeSetUp.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/ManyMiniCluster.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/SkeletonHBaseTest.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/TestHBaseInputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/TestHiveHBaseStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/TestHiveHBaseTableOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/TestPigHBaseStorageHandler.java HCatalog package renaming backward compatibility follow-up -- Key: HIVE-5274 URL: https://issues.apache.org/jira/browse/HIVE-5274 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 0.12.0 Attachments: HIVE-5274.2.patch, HIVE-5274.3.patch, HIVE-5274.4.patch As part of HIVE-4869, the hbase storage handler in hcat was moved to org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it was intended to be deprecated as well. 
However, it imports and uses several org.apache.hive.hcatalog classes. This needs to be changed to use org.apache.hcatalog classes. == Note : The above is a complete description of this issue in and of itself; the following gives more detail on the backward-compatibility goals I have (not saying that each of these things is violated) : a) People using org.apache.hcatalog packages should continue being able to use that package, and see no difference at compile time or runtime. All code here is considered deprecated, and will be gone by the time hive 0.14 rolls around. Additionally, org.apache.hcatalog should behave as if it were 0.11 for all compatibility purposes. b) People using org.apache.hive.hcatalog packages should never have an org.apache.hcatalog dependency injected in. Thus, it is okay for
[jira] [Commented] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.
[ https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777399#comment-13777399 ] Hudson commented on HIVE-5202: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-5202 : Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types. (Hari Sankar via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1525804) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/serde2/CustomNonSettableUnionObjectInspector1.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/serde2/CustomSerDe4.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/serde2/CustomSerDe5.java * /hive/trunk/ql/src/test/queries/clientpositive/partition_wise_fileformat18.q * /hive/trunk/ql/src/test/results/clientpositive/partition_wise_fileformat18.q.out * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SettableUnionObjectInspector.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardUnionObjectInspector.java Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types. --- Key: HIVE-5202 URL: https://issues.apache.org/jira/browse/HIVE-5202 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Fix For: 0.13.0 Attachments: HIVE-5202.2.patch.txt, HIVE-5202.patch These 3 tasks should be accomplished as part of the following jira: 1. 
The current implementation lacks a settable union object inspector. We can run into an exception inside ObjectInspectorConverters.getConvertedOI() if there is a union. 2. Implement the following public functions for all datatypes: isSettable() - perform a shallow check to see whether an object inspector inherits from a settable OI type - and hasAllFieldsSettable() - perform a deep check to see whether this object inspector and all the underlying object inspectors inherit from settable OI types. 3. ObjectInspectorConverters.getConvertedOI() is inefficient. Once (1) and (2) are implemented, add the following check: return outputOI immediately when outputOI.hasAllSettableFields() holds, i.e. the object is entirely settable, in order to prevent redundant object instantiation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
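The shallow-versus-deep distinction can be shown with a toy inspector hierarchy. These are invented classes, not Hive's ObjectInspector interfaces: isSettable() looks only at the top-level inspector, while hasAllFieldsSettable() recurses into nested ones.

```java
import java.util.List;

// Toy model of the two checks: a struct inspector may itself be settable
// while one of its nested field inspectors is not, so the shallow and deep
// checks can disagree.
public class SettableCheckSketch {

    interface OI { }
    interface SettableOI extends OI { }

    static class StructOI implements SettableOI {
        final List<OI> fields;
        StructOI(List<OI> fields) { this.fields = fields; }
    }

    static class PrimitiveSettableOI implements SettableOI { }
    static class PrimitiveReadOnlyOI implements OI { }

    // Shallow check: only the outermost inspector.
    static boolean isSettable(OI oi) {
        return oi instanceof SettableOI;
    }

    // Deep check: the inspector and everything underneath it.
    static boolean hasAllFieldsSettable(OI oi) {
        if (!(oi instanceof SettableOI)) {
            return false;
        }
        if (oi instanceof StructOI) {
            for (OI field : ((StructOI) oi).fields) {
                if (!hasAllFieldsSettable(field)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A struct with one read-only field passes the shallow check but fails the deep one, which is exactly why getConvertedOI() needs the deep check before short-circuiting.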
[jira] [Commented] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777395#comment-13777395 ] Hudson commented on HIVE-5279: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-5279 : Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526117) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSumList.java * /hive/trunk/ql/src/test/queries/clientpositive/udaf_sum_list.q * /hive/trunk/ql/src/test/results/clientpositive/udaf_sum_list.q.out * /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml * /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml * /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml * /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Fix For: 0.13.0 Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, D12963.3.patch, D12963.4.patch, D12963.5.patch We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism handled this, but Kryo complains that it's not Serializable and fails the query. 
The log below is an example: {noformat} java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector Serialization trace: inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval) genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.h {noformat} If this cannot be fixed somehow, some UDAFs will need to be modified to run on hive-0.13.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
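The "missing no-arg constructor" complaint in the stack trace above can be reproduced with plain reflection, since Kryo's default instantiation strategy has the same requirement. The classes below are invented for illustration; Kryo itself is not used:

```java
// Sketch of the failure mode: like Kryo's default strategy, plain
// reflection can only construct a class that has a no-arg constructor.
public class NoArgCtorSketch {

    static class WithNoArg {
        WithNoArg() { }
    }

    static class WithoutNoArg {
        final int x;
        WithoutNoArg(int x) { this.x = x; }
    }

    // Mirrors what Kryo reports as "missing no-arg constructor".
    static boolean canInstantiate(Class<?> c) {
        try {
            c.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }
}
```

This is why an inspector like StandardListObjectInspector trips the deserializer unless it gains a no-arg constructor or the field is excluded from serialization.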
[jira] [Commented] (HIVE-5329) Date and timestamp type converts invalid strings to '1970-01-01'
[ https://issues.apache.org/jira/browse/HIVE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777397#comment-13777397 ] Hudson commented on HIVE-5329: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-5329 : Date and timestamp type converts invalid strings to 1970-01-01 (Jason Dere via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526102) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToDate.java * /hive/trunk/ql/src/test/queries/clientpositive/partition_date.q * /hive/trunk/ql/src/test/queries/clientpositive/type_conversions_1.q * /hive/trunk/ql/src/test/results/clientpositive/partition_date.q.out * /hive/trunk/ql/src/test/results/clientpositive/type_conversions_1.q.out * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaDateObjectInspector.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampObjectInspector.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableDateObjectInspector.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableTimestampObjectInspector.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/primitive/TestPrimitiveObjectInspectorUtils.java Date and timestamp type converts invalid strings to '1970-01-01' Key: HIVE-5329 URL: https://issues.apache.org/jira/browse/HIVE-5329 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.12.0 Reporter: Vikram Dixit K Assignee: Jason Dere Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-5329.1.patch, HIVE-5329.2.patch, HIVE-5329.3.patch {noformat} select cast('abcd' as date), 
cast('abcd' as timestamp) from src limit 1; {noformat} returns '1970-01-01' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
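After the fix, an invalid string casts to NULL instead of silently becoming the epoch date. The intended semantics can be sketched in Python (an illustration only, not Hive's actual Java implementation; the function name and date format are assumptions):

```python
from datetime import date, datetime

def cast_to_date(s):
    """Mimic a strict string-to-date cast: return None (SQL NULL) for
    strings that are not valid dates, instead of silently falling back
    to the epoch date 1970-01-01."""
    try:
        return datetime.strptime(s, "%Y-%m-%d").date()
    except ValueError:
        return None

print(cast_to_date("2013-09-25"))  # 2013-09-25
print(cast_to_date("abcd"))        # None, not date(1970, 1, 1)
```

The key design point is that the parse failure is surfaced as NULL rather than swallowed and replaced with a default value.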
[jira] [Commented] (HIVE-4531) [WebHCat] Collecting task logs to hdfs
[ https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777393#comment-13777393 ] Hudson commented on HIVE-4531: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-4531: [WebHCat] Collecting task logs to hdfs - add missing files (Daniel Dai via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1525997) * /hive/trunk/hcatalog/webhcat/svr/src/test/data * /hive/trunk/hcatalog/webhcat/svr/src/test/data/status * /hive/trunk/hcatalog/webhcat/svr/src/test/data/status/hive * /hive/trunk/hcatalog/webhcat/svr/src/test/data/status/hive/stderr * /hive/trunk/hcatalog/webhcat/svr/src/test/data/status/jar * /hive/trunk/hcatalog/webhcat/svr/src/test/data/status/jar/stderr * /hive/trunk/hcatalog/webhcat/svr/src/test/data/status/pig * /hive/trunk/hcatalog/webhcat/svr/src/test/data/status/pig/stderr * /hive/trunk/hcatalog/webhcat/svr/src/test/data/status/streaming * /hive/trunk/hcatalog/webhcat/svr/src/test/data/status/streaming/stderr HIVE-4531: [WebHCat] Collecting task logs to hdfs (Daniel Dai via Thejas Nair) (thejas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1525807) * /hive/trunk/hcatalog/src/docs/src/documentation/content/xdocs/hive.xml * /hive/trunk/hcatalog/src/docs/src/documentation/content/xdocs/mapreducejar.xml * /hive/trunk/hcatalog/src/docs/src/documentation/content/xdocs/mapreducestreaming.xml * /hive/trunk/hcatalog/src/docs/src/documentation/content/xdocs/pig.xml * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/HiveDelegator.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JarDelegator.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LauncherDelegator.java * 
/hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/PigDelegator.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Server.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/StreamingDelegator.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/HiveJobIDParser.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JarJobIDParser.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/JobIDParser.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/LogRetriever.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/PigJobIDParser.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonControllerJob.java * /hive/trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/tool/TempletonUtils.java * /hive/trunk/hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestJobIDParser.java * /hive/trunk/hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java [WebHCat] Collecting task logs to hdfs -- Key: HIVE-4531 URL: https://issues.apache.org/jira/browse/HIVE-4531 Project: Hive Issue Type: New Feature Components: HCatalog, WebHCat Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 Attachments: HIVE-4531-10.patch, HIVE-4531-11.patch, HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, HIVE-4531-8.patch, HIVE-4531-9.patch, samplestatusdirwithlist.tar.gz It would be nice if we collected task logs after the job finishes. This is similar to what Amazon EMR does. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5345) Operator::close() leaks Operator::out, holding reference to buffers
[ https://issues.apache.org/jira/browse/HIVE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777392#comment-13777392 ] Hudson commented on HIVE-5345: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-5345 : Operator::close() leaks Operator::out, holding reference to buffers (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526100) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java Operator::close() leaks Operator::out, holding reference to buffers --- Key: HIVE-5345 URL: https://issues.apache.org/jira/browse/HIVE-5345 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Environment: Ubuntu, LXC, jdk6-x86_64 Reporter: Gopal V Assignee: Gopal V Labels: memory-leak Fix For: 0.13.0 Attachments: HIVE-5345.01.patch, out-leak.png When processing multiple splits on the same operator pipeline, the output collector in Operator has a held reference, which causes issues. Operator::close() does not de-reference the OutputCollector object Operator::out held by the object. This means that trying to allocate space for a new OutputCollector causes an OOM because the old one is still reachable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
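The leak pattern and the one-line fix can be sketched in Python (a simplified analogy of the Java Operator class, not the real Hive code; the method names here are illustrative):

```python
class Operator:
    """Simplified analogy of the Java Operator: `out` is the output
    collector, which transitively holds large output buffers."""
    def __init__(self):
        self.out = None

    def initialize(self, collector):
        self.out = collector

    def close(self):
        # ... flush output, close child operators ...
        # The fix: drop the reference so the old collector (and the
        # buffers it holds) become unreachable before the next split
        # allocates a new one.
        self.out = None

op = Operator()
op.initialize(bytearray(1 << 20))  # stand-in for a buffer-holding collector
op.close()
print(op.out is None)  # True: old buffers can now be garbage-collected
```

Without the `self.out = None` in `close()`, each reused operator pipeline would keep its previous collector alive, which is exactly the OOM scenario the issue describes.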
[jira] [Commented] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)
[ https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777394#comment-13777394 ] Hudson commented on HIVE-4914: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-4914 : filtering via partition name should be done inside metastore server (implementation) (Sergey Shelukhin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526106) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/metastore/if/hive_metastore.thrift * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PartitionsByExprRequest.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PartitionsByExprResult.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java * /hive/trunk/metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php * /hive/trunk/metastore/src/gen/thrift/gen-php/metastore/Types.php * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py * /hive/trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb * /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/PartitionExpressionProxy.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java filtering via partition name should be done inside metastore server (implementation) Key: HIVE-4914 URL: https://issues.apache.org/jira/browse/HIVE-4914 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.13.0 Attachments: D12561.5.patch, D12561.6.patch, D12561.7.patch, HIVE-4914.01.patch, HIVE-4914.02.patch, HIVE-4914.03.patch, HIVE-4914.04.patch, HIVE-4914.05.patch, HIVE-4914.06.patch, HIVE-4914.07.patch, HIVE-4914.D12561.1.patch, HIVE-4914.D12561.2.patch, HIVE-4914.D12561.3.patch, HIVE-4914.D12561.4.patch, HIVE-4914.D12645.1.patch, HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, 
HIVE-4914.patch, HIVE-4914.patch Currently, if the filter pushdown is impossible (which it is in most cases), the client gets all partition names from the metastore, filters them, and asks for partitions by name for the filtered set. The metastore server code should do that instead; it should check whether pushdown is possible and do it if so; otherwise it should fall back to name-based filtering. This saves the round trip that ships all partition names from the server to the client, and also removes the need to have pushdown-viability checking on both sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see:
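The server-side flow being proposed can be sketched in Python (illustrative names only, not the actual metastore API):

```python
def get_partitions_by_expr(all_partitions, name_filter, try_pushdown):
    """Server-side sketch of the new flow: attempt expression pushdown
    first; if that is impossible, filter by partition name on the
    server, so only the matching partitions cross the wire to the
    client instead of the full name list."""
    pushed = try_pushdown()
    if pushed is not None:
        return pushed  # storage-level pushdown handled the filtering
    # Name-based fallback, still executed on the server side.
    return [p for p in all_partitions if name_filter(p["name"])]

parts = [{"name": "ds=2013-09-24"}, {"name": "ds=2013-09-25"}]
# Pushdown impossible (returns None) -> server falls back to name filtering.
result = get_partitions_by_expr(parts, lambda n: n.endswith("25"), lambda: None)
print([p["name"] for p in result])  # ['ds=2013-09-25']
```

Because both branches run on the server, the viability check for pushdown lives in exactly one place.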
[jira] [Commented] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
[ https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777398#comment-13777398 ] Hudson commented on HIVE-5301: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-5301 : Add a schema tool for offline metastore schema upgrade (Prasad Mujumdar via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526122) * /hive/trunk/beeline/src/java/org/apache/hive/beeline/BeeLine.java * /hive/trunk/beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java * /hive/trunk/beeline/src/java/org/apache/hive/beeline/Commands.java * /hive/trunk/beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java * /hive/trunk/beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java * /hive/trunk/beeline/src/test/org/apache/hive/beeline/src/test/TestSchemaTool.java * /hive/trunk/bin/ext/schemaTool.sh * /hive/trunk/bin/schematool * /hive/trunk/build.xml * /hive/trunk/metastore/scripts/upgrade/derby/014-HIVE-3764.derby.sql * /hive/trunk/metastore/scripts/upgrade/mysql/014-HIVE-3764.mysql.sql * /hive/trunk/metastore/scripts/upgrade/oracle/014-HIVE-3764.oracle.sql * /hive/trunk/metastore/scripts/upgrade/postgres/014-HIVE-3764.postgres.sql Add a schema tool for offline metastore schema upgrade -- Key: HIVE-5301 URL: https://issues.apache.org/jira/browse/HIVE-5301 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.13.0 Attachments: HIVE-5301.1.patch, HIVE-5301.3.patch, HIVE-5301.3.patch, HIVE-5301-with-HIVE-3764.0.patch HIVE-3764 is addressing metastore version consistency. Besides, it would be helpful to add a tool that can leverage this version information to figure out the required set of upgrade scripts and execute them against the configured metastore. 
Now that Hive includes the Beeline client, it can be used to execute the scripts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
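The core of such a tool is choosing the ordered chain of upgrade scripts between the recorded metastore version and the target version. A Python sketch of that selection logic (the script file names and version table are illustrative assumptions, not the tool's actual data):

```python
def scripts_to_run(current, target, upgrade_steps):
    """Walk a from-version -> (to-version, script) table and collect
    the ordered chain of scripts needed to reach the target version."""
    chain = []
    version = current
    while version != target:
        if version not in upgrade_steps:
            raise ValueError("no upgrade path from " + version)
        version, script = upgrade_steps[version]
        chain.append(script)
    return chain

# Hypothetical upgrade table for illustration.
steps = {"0.11.0": ("0.12.0", "upgrade-0.11.0-to-0.12.0.sql"),
         "0.12.0": ("0.13.0", "upgrade-0.12.0-to-0.13.0.sql")}
print(scripts_to_run("0.11.0", "0.13.0", steps))
```

Each selected script would then be executed in order against the configured metastore, e.g. through the Beeline client as the issue suggests.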
[jira] [Commented] (HIVE-5181) RetryingRawStore should not retry on logical failures (e.g. from commit)
[ https://issues.apache.org/jira/browse/HIVE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777396#comment-13777396 ] Hudson commented on HIVE-5181: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #181 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/181/]) HIVE-5181 : RetryingRawStore should not retry on logical failures (e.g. from commit) (Prasad Mujumdar via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526107) * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestRawStoreTxn.java RetryingRawStore should not retry on logical failures (e.g. from commit) Key: HIVE-5181 URL: https://issues.apache.org/jira/browse/HIVE-5181 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Prasad Mujumdar Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5181.1.patch, HIVE-5181.3.patch RetryingRawStore retries calls. Some methods (e.g. drop_table_core in HiveMetaStore) explicitly call openTransaction and commitTransaction on RawStore. When the commit call fails due to some real issue, it is retried, and instead of the real cause of the failure one gets a bogus exception about the transaction open count. It doesn't make sense to retry logical errors, especially not from commitTransaction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
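The distinction being made, retry transient failures but surface logical ones immediately, can be sketched in Python (a generic illustration of the pattern, not the RetryingRawStore code; the exception class names are assumptions):

```python
class TransientError(Exception):
    """A failure worth retrying, e.g. a dropped connection."""

class LogicalError(Exception):
    """A real failure, e.g. a commit that legitimately failed."""

def with_retries(call, attempts=3):
    # Retry only transient failures; logical failures propagate
    # immediately, so the original cause is never masked by a
    # bogus error from the retry bookkeeping.
    for i in range(attempts):
        try:
            return call()
        except TransientError:
            if i == attempts - 1:
                raise
        # LogicalError (and anything else) propagates untouched.

outcomes = iter([TransientError(), "ok"])
def flaky():
    item = next(outcomes)
    if isinstance(item, Exception):
        raise item
    return item

print(with_retries(flaky))  # ok: one transient failure, then success
```

A `LogicalError` raised inside `call()` reaches the caller on the first attempt, with no extra wrapping.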
[jira] [Commented] (HIVE-5181) RetryingRawStore should not retry on logical failures (e.g. from commit)
[ https://issues.apache.org/jira/browse/HIVE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777412#comment-13777412 ] Hudson commented on HIVE-5181: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #115 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/115/]) HIVE-5181 : RetryingRawStore should not retry on logical failures (e.g. from commit) (Prasad Mujumdar via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526107) * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestRawStoreTxn.java RetryingRawStore should not retry on logical failures (e.g. from commit) Key: HIVE-5181 URL: https://issues.apache.org/jira/browse/HIVE-5181 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Prasad Mujumdar Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5181.1.patch, HIVE-5181.3.patch RetryingRawStore retries calls. Some methods (e.g. drop_table_core in HiveMetaStore) explicitly call openTransaction and commitTransaction on RawStore. When the commit call fails due to some real issue, it is retried, and instead of the real cause of the failure one gets a bogus exception about the transaction open count. It doesn't make sense to retry logical errors, especially not from commitTransaction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4914) filtering via partition name should be done inside metastore server (implementation)
[ https://issues.apache.org/jira/browse/HIVE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777410#comment-13777410 ] Hudson commented on HIVE-4914: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #115 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/115/]) HIVE-4914 : filtering via partition name should be done inside metastore server (implementation) (Sergey Shelukhin via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526106) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/metastore/if/hive_metastore.thrift * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h * /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp * /hive/trunk/metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PartitionsByExprRequest.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PartitionsByExprResult.java * /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java * /hive/trunk/metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php * /hive/trunk/metastore/src/gen/thrift/gen-php/metastore/Types.php * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py * /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py * /hive/trunk/metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb * /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/PartitionExpressionProxy.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java * /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java filtering via partition name should be done inside metastore server (implementation) Key: HIVE-4914 URL: https://issues.apache.org/jira/browse/HIVE-4914 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.13.0 Attachments: D12561.5.patch, D12561.6.patch, D12561.7.patch, HIVE-4914.01.patch, HIVE-4914.02.patch, HIVE-4914.03.patch, HIVE-4914.04.patch, HIVE-4914.05.patch, HIVE-4914.06.patch, HIVE-4914.07.patch, HIVE-4914.D12561.1.patch, HIVE-4914.D12561.2.patch, HIVE-4914.D12561.3.patch, HIVE-4914.D12561.4.patch, HIVE-4914.D12645.1.patch, HIVE-4914-only-no-gen.patch, HIVE-4914-only.patch, HIVE-4914.patch, 
HIVE-4914.patch, HIVE-4914.patch Currently, if the filter pushdown is impossible (which it is in most cases), the client gets all partition names from the metastore, filters them, and asks for partitions by name for the filtered set. The metastore server code should do that instead; it should check whether pushdown is possible and do it if so; otherwise it should fall back to name-based filtering. This saves the round trip that ships all partition names from the server to the client, and also removes the need to have pushdown-viability checking on both sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see:
[jira] [Commented] (HIVE-5329) Date and timestamp type converts invalid strings to '1970-01-01'
[ https://issues.apache.org/jira/browse/HIVE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777413#comment-13777413 ] Hudson commented on HIVE-5329: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #115 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/115/]) HIVE-5329 : Date and timestamp type converts invalid strings to 1970-01-01 (Jason Dere via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526102) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToDate.java * /hive/trunk/ql/src/test/queries/clientpositive/partition_date.q * /hive/trunk/ql/src/test/queries/clientpositive/type_conversions_1.q * /hive/trunk/ql/src/test/results/clientpositive/partition_date.q.out * /hive/trunk/ql/src/test/results/clientpositive/type_conversions_1.q.out * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaDateObjectInspector.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/JavaTimestampObjectInspector.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableDateObjectInspector.java * /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/WritableTimestampObjectInspector.java * /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/primitive/TestPrimitiveObjectInspectorUtils.java Date and timestamp type converts invalid strings to '1970-01-01' Key: HIVE-5329 URL: https://issues.apache.org/jira/browse/HIVE-5329 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.12.0 Reporter: Vikram Dixit K Assignee: Jason Dere Priority: Blocker Fix For: 0.13.0 Attachments: HIVE-5329.1.patch, HIVE-5329.2.patch, HIVE-5329.3.patch {noformat} select cast('abcd' as date), 
cast('abcd' as timestamp) from src limit 1; {noformat} returns '1970-01-01' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5301) Add a schema tool for offline metastore schema upgrade
[ https://issues.apache.org/jira/browse/HIVE-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777414#comment-13777414 ] Hudson commented on HIVE-5301: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #115 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/115/]) HIVE-5301 : Add a schema tool for offline metastore schema upgrade (Prasad Mujumdar via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526122) * /hive/trunk/beeline/src/java/org/apache/hive/beeline/BeeLine.java * /hive/trunk/beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java * /hive/trunk/beeline/src/java/org/apache/hive/beeline/Commands.java * /hive/trunk/beeline/src/java/org/apache/hive/beeline/HiveSchemaHelper.java * /hive/trunk/beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java * /hive/trunk/beeline/src/test/org/apache/hive/beeline/src/test/TestSchemaTool.java * /hive/trunk/bin/ext/schemaTool.sh * /hive/trunk/bin/schematool * /hive/trunk/build.xml * /hive/trunk/metastore/scripts/upgrade/derby/014-HIVE-3764.derby.sql * /hive/trunk/metastore/scripts/upgrade/mysql/014-HIVE-3764.mysql.sql * /hive/trunk/metastore/scripts/upgrade/oracle/014-HIVE-3764.oracle.sql * /hive/trunk/metastore/scripts/upgrade/postgres/014-HIVE-3764.postgres.sql Add a schema tool for offline metastore schema upgrade -- Key: HIVE-5301 URL: https://issues.apache.org/jira/browse/HIVE-5301 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.13.0 Attachments: HIVE-5301.1.patch, HIVE-5301.3.patch, HIVE-5301.3.patch, HIVE-5301-with-HIVE-3764.0.patch HIVE-3764 is addressing metastore version consistency. Besides, it would be helpful to add a tool that can leverage this version information to figure out the required set of upgrade scripts and execute them against the configured metastore. 
Now that Hive includes the Beeline client, it can be used to execute the scripts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5223) explain doesn't show serde used for table
[ https://issues.apache.org/jira/browse/HIVE-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777415#comment-13777415 ] Hudson commented on HIVE-5223: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #115 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/115/]) HIVE-5223 : explain doesn't show serde used for table (Ashutosh Chauhan via Thejas Nair) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526116) * /hive/trunk/contrib/src/test/results/clientpositive/dboutput.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes2.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes3.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes4.q.out * /hive/trunk/contrib/src/test/results/clientpositive/serde_typedbytes5.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_avg.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_group_concat.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_max.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_max_n.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_min.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udaf_example_min_n.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udf_example_add.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udf_example_arraymapstruct.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udf_example_format.q.out * /hive/trunk/contrib/src/test/results/clientpositive/udf_row_sequence.q.out * /hive/trunk/hbase-handler/src/test/results/positive/external_table_ppd.q.out * /hive/trunk/hbase-handler/src/test/results/positive/hbase_ppd_key_range.q.out * /hive/trunk/hbase-handler/src/test/results/positive/hbase_pushdown.q.out * 
/hive/trunk/hbase-handler/src/test/results/positive/hbase_queries.q.out * /hive/trunk/hbase-handler/src/test/results/positive/ppd_key_ranges.q.out * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java * /hive/trunk/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MetadataOnlyOptimizer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/serde2/TestSerDe.java * /hive/trunk/ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out * /hive/trunk/ql/src/test/results/clientnegative/script_error.q.out * /hive/trunk/ql/src/test/results/clientnegative/sortmerge_mapjoin_mismatch_1.q.out * /hive/trunk/ql/src/test/results/clientnegative/udf_assert_true.q.out * /hive/trunk/ql/src/test/results/clientnegative/udf_assert_true2.q.out * /hive/trunk/ql/src/test/results/clientpositive/alias_casted_column.q.out * 
/hive/trunk/ql/src/test/results/clientpositive/allcolref_in_udf.q.out * /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out * /hive/trunk/ql/src/test/results/clientpositive/ambiguous_col.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join0.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join10.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join11.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join12.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join13.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join15.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join16.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join18.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join18_multi_distinct.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join20.q.out * /hive/trunk/ql/src/test/results/clientpositive/auto_join21.q.out *
[jira] [Commented] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
[ https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777411#comment-13777411 ] Hudson commented on HIVE-5279: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #115 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/115/]) HIVE-5279 : Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc (Navis via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526117) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectList.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMkCollectionEvaluator.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSumList.java * /hive/trunk/ql/src/test/queries/clientpositive/udaf_sum_list.q * /hive/trunk/ql/src/test/results/clientpositive/udaf_sum_list.q.out * /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml * /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml * /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml * /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc --- Key: HIVE-5279 URL: https://issues.apache.org/jira/browse/HIVE-5279 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Critical Fix For: 0.13.0 Attachments: 5279.patch, D12963.1.patch, D12963.2.patch, D12963.3.patch, D12963.4.patch, D12963.5.patch We never forced GenericUDAFEvaluator to be Serializable. I don't know how the previous serialization mechanism handled this, but Kryo complains that it's not Serializable and fails the query. 
The log below is the example, {noformat} java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector Serialization trace: inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval) genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc) aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc) conf (org.apache.hadoop.hive.ql.exec.GroupByOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261) at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256) at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383) at org.apache.h {noformat} If this cannot be fixed in somehow, some UDAFs should be modified to be run on hive-0.13.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
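The "Class cannot be created (missing no-arg constructor)" failure in the trace above is a general JVM constraint that Kryo's default instantiation strategy runs into: without a zero-argument constructor, plain reflection has nothing to call. A stdlib-only sketch of the same failure mode (class names here are illustrative, not Hive's or Kryo's actual code):

```java
public class NoArgCtorDemo {
    // Mirrors StandardListObjectInspector from the trace: only a
    // parameterized constructor, so reflective instantiation fails.
    static class NeedsArgs {
        final int field;
        NeedsArgs(int field) { this.field = field; }
    }

    static boolean canInstantiateReflectively(Class<?> clazz) {
        try {
            clazz.getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // Kryo reports this case as:
            // "Class cannot be created (missing no-arg constructor)"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiateReflectively(NeedsArgs.class)); // false
        System.out.println(canInstantiateReflectively(String.class));    // true
    }
}
```

Kryo can be configured with a fallback instantiator (e.g. Objenesis-based strategies), which is one reason the issue is fixable without touching every UDAF.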
[jira] [Commented] (HIVE-5345) Operator::close() leaks Operator::out, holding reference to buffers
[ https://issues.apache.org/jira/browse/HIVE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777409#comment-13777409 ] Hudson commented on HIVE-5345: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #115 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/115/]) HIVE-5345 : Operator::close() leaks Operator::out, holding reference to buffers (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526100) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java Operator::close() leaks Operator::out, holding reference to buffers --- Key: HIVE-5345 URL: https://issues.apache.org/jira/browse/HIVE-5345 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Environment: Ubuntu, LXC, jdk6-x86_64 Reporter: Gopal V Assignee: Gopal V Labels: memory-leak Fix For: 0.13.0 Attachments: HIVE-5345.01.patch, out-leak.png When processing multiple splits on the same operator pipeline, the output collector in Operator has a held reference, which causes issues. Operator::close() does not de-reference the OutputCollector object Operator::out held by the object. This means that trying to allocate space for a new OutputCollector causes an OOM because the old one is still reachable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
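The fix pattern being described is small: close() must drop the collector reference so the buffers it transitively holds become garbage-collectible before the next split allocates new ones. A hedged sketch of that pattern (field and class names are illustrative, not the actual Hive Operator API):

```java
public class OperatorLeakSketch {
    // Stand-in for the Operator's OutputCollector field.
    static class FakeOperator {
        Object out = new byte[1024];  // pretend this holds large output buffers

        void close() {
            // ... flush state, close child operators, etc. ...
            out = null;  // the one-line de-reference that makes old buffers GC-eligible
        }
    }

    public static void main(String[] args) {
        FakeOperator op = new FakeOperator();
        op.close();
        System.out.println(op.out == null);  // true: buffers no longer reachable
    }
}
```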
[jira] [Commented] (HIVE-5207) Support data encryption for Hive tables
[ https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777416#comment-13777416 ] Larry McCay commented on HIVE-5207: --- Hi Jerry - I have taken a high-level look through the patch. Lots of good stuff there - good work! A couple of things that I would like to see more javadocs on, and perhaps a document that describes the use cases: 1. TwoTieredKey - exactly the purpose, how it's used, what the tiers are, etc. 2. External KeyManagement integration - where and what is the expected contract for this integration 3. A specific use case description for exporting keys into an external keystore, and who has the authority to initiate the export and where the password comes from 4. An explanation as to why we should ever store the key with the data, which seems like a bad idea. I understand that it is encrypted with the master secret - which takes me to the next question. :) 5. Where is the master secret established and stored, and how is it protected? There is a minor typo/spelling error that you probably want to fix now rather than later: +public interface HiveKeyResolver { + void init(Configuration conf) throws CryptoException; + + /** + * Resolve the key meta information of a table + * @param tableDesc The table descriptor + */ + KeyMeta resovleKey(TableDesc tableDesc); +} change resovleKey to resolveKey here and in the interface implementation and consumer of the method - I think there were 3 instances. Again, nice work here! Let's get some higher-level descriptions in code javadocs and/or separate documents. Thanks!
Support data encryption for Hive tables --- Key: HIVE-5207 URL: https://issues.apache.org/jira/browse/HIVE-5207 Project: Hive Issue Type: New Feature Affects Versions: 0.12.0 Reporter: Jerry Chen Labels: Rhino Attachments: HIVE-5207.patch Original Estimate: 504h Remaining Estimate: 504h For sensitive and legally protected data such as personal information, it is a common practice that the data is stored encrypted in the file system. Enabling Hive to store and query encrypted data is crucial for Hive data analysis in the enterprise. When creating a table, the user can specify whether it is an encrypted table by setting a property in TBLPROPERTIES. Once an encrypted table is created, querying it is transparent as long as the corresponding key management facilities are set in the running environment of the query. We can use the hadoop crypto provided by HADOOP-9331 for the underlying data encryption and decryption. As to key management, we would support several common key management use cases. First, the table key (data key) can be stored in the Hive metastore, associated with the table in properties. The table key can be explicitly specified or auto generated, and will be encrypted with a master key. There are cases where the data being processed is generated by other applications, so we need to support externally managed or imported table keys. Also, the data generated by Hive may be consumed by other applications in the system, so we need a tool or command for exporting the table key to a Java keystore for external use. To handle versions of Hadoop that do not have crypto support, we can avoid compilation problems by segregating crypto API usage into separate files (shims) to be included only if a flag is defined on the Ant command line (something like -Dcrypto=true).
[jira] [Updated] (HIVE-4822) implement vectorized math functions
[ https://issues.apache.org/jira/browse/HIVE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4822: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch. Thanks, Eric! implement vectorized math functions --- Key: HIVE-4822 URL: https://issues.apache.org/jira/browse/HIVE-4822 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson Fix For: vectorization-branch Attachments: HIVE-4822.1.patch, HIVE-4822.4.patch, HIVE-4822.5-vectorization.patch, HIVE-4822.6.patch.txt, HIVE-4822.7-vectorization.patch Implement vectorized support for the all the built-in math functions. This includes implementing the vectorized operation, and tying it all together in VectorizationContext so it runs end-to-end. These functions include: round(Col) Round(Col, N) Floor(Col) Ceil(Col) Rand(), Rand(seed) Exp(Col) Ln(Col) Log10(Col) Log2(Col) Log(base, Col) Pow(col, p), Power(col, p) Sqrt(Col) Bin(Col) Hex(Col) Unhex(Col) Conv(Col, from_base, to_base) Abs(Col) Pmod(arg1, arg2) Sin(Col) Asin(Col) Cos(Col) ACos(Col) Atan(Col) Degrees(Col) Radians(Col) Positive(Col) Negative(Col) Sign(Col) E() Pi() To reduce the total code volume, do an implicit type cast from non-double input types to double. Also, POSITITVE and NEGATIVE are syntactic sugar for unary + and unary -, so reuse code for those as appropriate. Try to call the function directly in the inner loop and avoid new() or expensive operations, as appropriate. Templatize the code where appropriate, e.g. all the unary function of form DOUBLE func(DOUBLE) can probably be done with a template. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
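The "templatize the unary DOUBLE func(DOUBLE) functions" idea above boils down to one tight loop shared by Sqrt, Ln, Sin, and the rest. A sketch of that pattern (the array-based column and null mask are illustrative stand-ins for Hive's actual DoubleColumnVector):

```java
import java.util.function.DoubleUnaryOperator;

public class VectorizedUnaryDemo {
    // One loop serves every DOUBLE -> DOUBLE builtin: apply fn in place,
    // skip NULL rows, and avoid per-row allocation, as the task describes.
    static void evaluate(double[] vector, boolean[] isNull, int size,
                         DoubleUnaryOperator fn) {
        for (int i = 0; i < size; i++) {
            if (!isNull[i]) {
                vector[i] = fn.applyAsDouble(vector[i]);  // in place, no new()
            }
        }
    }

    public static void main(String[] args) {
        double[] col = {4.0, 9.0, -1.0};
        boolean[] nulls = {false, false, true};  // third row is NULL, left untouched
        evaluate(col, nulls, col.length, Math::sqrt);
        System.out.println(col[0] + " " + col[1]);  // 2.0 3.0
    }
}
```

Swapping `Math::sqrt` for `Math::log` or `Math::sin` reuses the same loop, which is the code-volume saving the task is after.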
[jira] [Commented] (HIVE-5253) Create component to compile and jar dynamic code
[ https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777606#comment-13777606 ] Edward Capriolo commented on HIVE-5253: --- [~brocknoland] Did this not patch? Create component to compile and jar dynamic code Key: HIVE-5253 URL: https://issues.apache.org/jira/browse/HIVE-5253 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5253.1.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.patch.txt
[jira] [Commented] (HIVE-5318) Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10
[ https://issues.apache.org/jira/browse/HIVE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777636#comment-13777636 ] Xuefu Zhang commented on HIVE-5318: --- [~ashutoshc] Would you like to take a look at the patch again? Thanks. Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10 Key: HIVE-5318 URL: https://issues.apache.org/jira/browse/HIVE-5318 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 0.9.0, 0.10.0 Reporter: Brad Ruderman Assignee: Xuefu Zhang Priority: Critical Fix For: 0.13.0 Attachments: HIVE-5318.1.patch, HIVE-5318.patch When Exporting hive tables using the hive command in Hive 0.9 EXPORT table TO 'hdfs_path' then importing to another hive 0.10 instance using IMPORT FROM 'hdfs_path', hive throws this error: 13/09/18 13:14:02 ERROR ql.Driver: FAILED: SemanticException Exception while processing org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing at org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) Caused by: java.lang.NullPointerException at java.util.ArrayList.init(ArrayList.java:131) at org.apache.hadoop.hive.ql.plan.CreateTableDesc.init(CreateTableDesc.java:128) at org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:99) ... 16 more 13/09/18 13:14:02 INFO ql.Driver: /PERFLOG method=compile start=1379535241411 end=1379535242332 duration=921 13/09/18 13:14:02 INFO ql.Driver: PERFLOG method=releaseLocks 13/09/18 13:14:02 INFO ql.Driver: /PERFLOG method=releaseLocks start=1379535242332 end=1379535242332 duration=0 13/09/18 13:14:02 INFO ql.Driver: PERFLOG method=releaseLocks 13/09/18 13:14:02 INFO ql.Driver: /PERFLOG method=releaseLocks start=1379535242333 end=1379535242333 duration=0 This is probably a critical blocker for people who are trying to test Hive 0.10 in their staging environments prior to the upgrade from 0.9 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
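The root cause visible in the trace is `java.util.ArrayList`'s copy constructor (the garbled `ArrayList.init` frame is `ArrayList.<init>`) being handed a null collection inside `CreateTableDesc`; that constructor throws NullPointerException on null input. A defensive-copy helper of the kind such a fix typically uses (illustrative, not the actual patch):

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeCopyDemo {
    // Copying a null list must not NPE; hand back an empty list instead.
    static <T> List<T> copyOrEmpty(List<T> src) {
        return (src == null) ? new ArrayList<T>() : new ArrayList<T>(src);
    }

    public static void main(String[] args) {
        List<String> cols = null;  // what the 0.9-format export effectively supplies
        // new ArrayList<>(cols) would throw NullPointerException here
        System.out.println(copyOrEmpty(cols).size());  // 0
    }
}
```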
[jira] [Commented] (HIVE-5253) Create component to compile and jar dynamic code
[ https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777667#comment-13777667 ] Brock Noland commented on HIVE-5253: One of the huge downsides of the current build is that we cannot detect the difference between something that fails to compile and a check style violation. Since your patch does not introduce them, it looks like there are violations on trunk at present. See below. {noformat} checkstyle: [echo] hcatalog [checkstyle] Running Checkstyle 5.5 on 613 files [checkstyle] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:453: method def child at indentation level 6 not at correct indentation, 4 [checkstyle] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:454: method def child at indentation level 6 not at correct indentation, 4 [checkstyle] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:455: method def child at indentation level 6 not at correct indentation, 4 [checkstyle] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:456: method call child at indentation level 6 not at correct indentation, 8 [checkstyle] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:486: method def child at indentation level 3 not at correct indentation, 4 [checkstyle] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:487: method call child at indentation level 3 not at correct indentation, 4 [checkstyle] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:487: method def child at indentation level 3 not at correct 
indentation, 4 [checkstyle] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:488: method def child at indentation level 3 not at correct indentation, 4 [for] hcatalog: The following error occurred while executing this line: [for] /data/hive-ptest/working/apache-svn-trunk-source/build.xml:355: The following error occurred while executing this line: [for] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/build.xml:127: The following error occurred while executing this line: [for] /data/hive-ptest/working/apache-svn-trunk-source/hcatalog/build-support/ant/checkstyle.xml:32: Got 8 errors and 0 warnings. {noformat} Create component to compile and jar dynamic code Key: HIVE-5253 URL: https://issues.apache.org/jira/browse/HIVE-5253 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-5253.1.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5358) ReduceSinkDeDuplication should ignore column orders when check overlapping part of keys between parent and child
[ https://issues.apache.org/jira/browse/HIVE-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated HIVE-5358: Attachment: HIVE-5358.patch ReduceSinkDeDuplication should ignore column orders when check overlapping part of keys between parent and child Key: HIVE-5358 URL: https://issues.apache.org/jira/browse/HIVE-5358 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Chun Chen Assignee: Chun Chen Attachments: HIVE-5358.patch {code} select key, value from (select key, value from src group by key, value) t group by key, value; {code} This can be optimized by ReduceSinkDeDuplication {code} select key, value from (select key, value from src group by key, value) t group by value, key; {code} However the sql above can't be optimized by ReduceSinkDeDuplication currently due to different column orders of parent and child operator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
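The improvement asked for amounts to comparing the overlapping key columns as sets rather than as ordered lists, so that `group by key, value` and `group by value, key` are recognized as covering the same columns. An order-insensitive membership check might look like this (hypothetical helper, not the actual optimizer code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class KeyOverlapDemo {
    // True when both operators group on the same set of columns,
    // regardless of the order they were written in.
    static boolean sameKeysIgnoringOrder(List<String> parentKeys,
                                         List<String> childKeys) {
        return new HashSet<>(parentKeys).equals(new HashSet<>(childKeys));
    }

    public static void main(String[] args) {
        List<String> parent = Arrays.asList("key", "value");
        List<String> child = Arrays.asList("value", "key");
        System.out.println(parent.equals(child));                 // false: positional compare rejects
        System.out.println(sameKeysIgnoringOrder(parent, child)); // true: set compare accepts
    }
}
```

A real implementation would also have to reconcile the sort/partition order of the two ReduceSinks; this only illustrates the order-insensitive part of the check.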
[jira] [Commented] (HIVE-5318) Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10
[ https://issues.apache.org/jira/browse/HIVE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777687#comment-13777687 ] Ashutosh Chauhan commented on HIVE-5318: +1 Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10 Key: HIVE-5318 URL: https://issues.apache.org/jira/browse/HIVE-5318 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 0.9.0, 0.10.0 Reporter: Brad Ruderman Assignee: Xuefu Zhang Priority: Critical Fix For: 0.13.0 Attachments: HIVE-5318.1.patch, HIVE-5318.patch When Exporting hive tables using the hive command in Hive 0.9 EXPORT table TO 'hdfs_path' then importing to another hive 0.10 instance using IMPORT FROM 'hdfs_path', hive throws this error: 13/09/18 13:14:02 ERROR ql.Driver: FAILED: SemanticException Exception while processing org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing at org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) Caused by: java.lang.NullPointerException at java.util.ArrayList.init(ArrayList.java:131) at org.apache.hadoop.hive.ql.plan.CreateTableDesc.init(CreateTableDesc.java:128) at org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:99) ... 16 more 13/09/18 13:14:02 INFO ql.Driver: /PERFLOG method=compile start=1379535241411 end=1379535242332 duration=921 13/09/18 13:14:02 INFO ql.Driver: PERFLOG method=releaseLocks 13/09/18 13:14:02 INFO ql.Driver: /PERFLOG method=releaseLocks start=1379535242332 end=1379535242332 duration=0 13/09/18 13:14:02 INFO ql.Driver: PERFLOG method=releaseLocks 13/09/18 13:14:02 INFO ql.Driver: /PERFLOG method=releaseLocks start=1379535242333 end=1379535242333 duration=0 This is probably a critical blocker for people who are trying to test Hive 0.10 in their staging environments prior to the upgrade from 0.9 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up
[ https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777692#comment-13777692 ] Brock Noland commented on HIVE-5274: [~sushanth] Did this patch cause some checkstyle violations seen here https://issues.apache.org/jira/browse/HIVE-5253?focusedCommentId=13777667page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13777667 HCatalog package renaming backward compatibility follow-up -- Key: HIVE-5274 URL: https://issues.apache.org/jira/browse/HIVE-5274 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 0.12.0 Attachments: HIVE-5274.2.patch, HIVE-5274.3.patch, HIVE-5274.4.patch As part of HIVE-4869, the hbase storage handler in hcat was moved to org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it was intended to be deprecated as well. However, it imports and uses several org.apache.hive.hcatalog classes. This needs to be changed to use org.apache.hcatalog classes. == Note : The above is a complete description of this issue in and of by itself, the following is more details on the backward-compatibility goal I have(not saying that each of these things are violated) : a) People using org.apache.hcatalog packages should continue being able to use that package, and see no difference at compile time or runtime. All code here is considered deprecated, and will be gone by the time hive 0.14 rolls around. Additionally, org.apache.hcatalog should behave as if it were 0.11 for all compatibility purposes. b) People using org.apache.hive.hcatalog packages should never have an org.apache.hcatalog dependency injected in. 
Thus, It is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages internally (say HCatUtil, for example), as long as any interfaces only expose org.apache.hcatalog.\* For tests that test org.apache.hcatalog.\*, we must be capable of testing it from a pure org.apache.hcatalog.\* world. It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, even in tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4898) make vectorized math functions work end-to-end (update VectorizationContext.java)
[ https://issues.apache.org/jira/browse/HIVE-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-4898: -- Assignee: Eric Hanson make vectorized math functions work end-to-end (update VectorizationContext.java) - Key: HIVE-4898 URL: https://issues.apache.org/jira/browse/HIVE-4898 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson The vectorized math function VectorExpression classes were added in HIVE-4822. This JIRA is to allow those to actually be used in a SQL query end-to-end. This requires updating VectorizationContext to use the new classes in vectorized expression creation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (HIVE-4898) make vectorized math functions work end-to-end (update VectorizationContext.java)
[ https://issues.apache.org/jira/browse/HIVE-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-4898 started by Eric Hanson. make vectorized math functions work end-to-end (update VectorizationContext.java) - Key: HIVE-4898 URL: https://issues.apache.org/jira/browse/HIVE-4898 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Assignee: Eric Hanson The vectorized math function VectorExpression classes were added in HIVE-4822. This JIRA is to allow those to actually be used in a SQL query end-to-end. This requires updating VectorizationContext to use the new classes in vectorized expression creation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5235) Infinite loop with ORC file and Hive 0.11
[ https://issues.apache.org/jira/browse/HIVE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1307#comment-1307 ] Pere Ferrera Bertran commented on HIVE-5235: Hi guys, Here are the OS details (uname -a): Linux 3.6.11-gentoo-xxx #1 SMP Wed Jan 23 12:25:47 EST 2013 x86_64 Intel(R) Xeon(R) CPU L5520 @ 2.27GHz GenuineIntel GNU/Linux The Java version at the moment of the crashes was Java SE 7 Update 25. We are currently not using the ORC file format in this cluster, and we can't distribute data from it; however, we will try to launch a query with fake data to try to reproduce it. Infinite loop with ORC file and Hive 0.11 - Key: HIVE-5235 URL: https://issues.apache.org/jira/browse/HIVE-5235 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Environment: Gentoo linux with Hortonworks Hadoop hadoop-1.1.2.23.tar.gz and Apache Hive 0.11 Reporter: Iván de Prado Priority: Blocker We are using Hive 0.11 with the ORC file format and we get some tasks blocked in some kind of infinite loop. They keep working indefinitely when we set a huge task expiry timeout. If we set the expiry time to 600 seconds, the tasks fail because of not reporting progress, and finally the job fails. That is not consistent, and sometimes between job executions the behavior changes. It happens for different queries. We are using Hive 0.11 with Hadoop hadoop-1.1.2.23 from Hortonworks. The task that is blocked keeps consuming 100% of CPU usage, and the stack trace is always the same consistently. Everything points to some kind of infinite loop. My guessing is that it has some relation to the ORC file. Maybe some pointer is not written correctly, generating some kind of infinite loop when reading. Or maybe there is a bug in the reading stage. More information below.
The stack trace: {noformat} main prio=10 tid=0x7f20a000a800 nid=0x1ed2 runnable [0x7f20a8136000] java.lang.Thread.State: RUNNABLE at java.util.zip.Inflater.inflateBytes(Native Method) at java.util.zip.Inflater.inflate(Inflater.java:256) - locked 0xf42a6ca0 (a java.util.zip.ZStreamRef) at org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:64) at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:128) at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:143) at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVulong(SerializationUtils.java:54) at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readVslong(SerializationUtils.java:65) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.readValues(RunLengthIntegerReader.java:66) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReader.next(RunLengthIntegerReader.java:81) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:332) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:802) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1214) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:71) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:46) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:300) at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:218) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236) - eliminated 0xe1459700 (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216) - locked 0xe1459700 (a org.apache.hadoop.mapred.MapTask$TrackedRecordReader) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at
[jira] [Updated] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5264: -- Attachment: D12993.2.patch sershe updated the revision HIVE-5264 [jira] SQL generated by MetaStoreDirectSql.java not compliant with Postgres.. small rebase (conflict with other patch) Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D12993 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12993?vs=40107&id=40443#toc AFFECTED FILES metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java To: JIRA, ashutoshc, sershe SQL generated by MetaStoreDirectSql.java not compliant with Postgres. - Key: HIVE-5264 URL: https://issues.apache.org/jira/browse/HIVE-5264 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Environment: Ubuntu 12.04 PostgreSQL 9.1.8 Reporter: Alexander Behm Assignee: Sergey Shelukhin Attachments: D12993.1.patch, D12993.2.patch, HIVE-5264.01.patch, HIVE-5264.patch Some operations against the Hive Metastore seem broken against Postgres. For example, when using HiveMetastoreClient.listPartitions() the Postgres logs show queries such as: 2013-09-09 19:10:01 PDT STATEMENT: select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID where TBLS.TBL_NAME = $1 and DBS.NAME = $2 order by PART_NAME asc with a somewhat cryptic (but correct) error: ERROR: relation "partitions" does not exist at character 32 Postgres identifiers are somewhat unusual. Unquoted identifiers are interpreted as lower case (there is no Postgres option to change this). Since the Metastore table schema uses upper case table names, the correct SQL requires escaped identifiers to those tables, i.e., select "PARTITIONS"."PART_ID" from "PARTITIONS"...
Hive sets metastore.try.direct.sql=true by default, so the above SQL is generated by hive/metastore/MetaStoreDirectSql.java, i.e., this is not a Datanucleus problem. When I set metastore.try.direct.sql=false, then the Metastore backed by Postgres works.
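The quoting the reporter calls for can be captured in a one-line helper: wrap each identifier in double quotes so Postgres preserves its case, doubling any embedded quote. This is a hypothetical sketch of the technique, not MetaStoreDirectSql's actual code:

```java
public class PgQuoteDemo {
    // Quote an SQL identifier for Postgres: "PARTITIONS" stays upper case,
    // any embedded double quote is doubled per the SQL standard.
    static String quoteId(String identifier) {
        return '"' + identifier.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        System.out.println("select " + quoteId("PARTITIONS") + "." + quoteId("PART_ID")
                + " from " + quoteId("PARTITIONS"));
        // select "PARTITIONS"."PART_ID" from "PARTITIONS"
    }
}
```

Note that quoted identifiers are also valid on the other databases the metastore supports via direct SQL, which is why quoting (rather than per-database casing) is a plausible fix.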
[jira] [Commented] (HIVE-4910) Hadoop 2 archives broken
[ https://issues.apache.org/jira/browse/HIVE-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1337#comment-1337 ] Jason Dere commented on HIVE-4910: -- FYI it looks like an issue has been created for this on the Hadoop end at HADOOP-9776 Hadoop 2 archives broken Key: HIVE-4910 URL: https://issues.apache.org/jira/browse/HIVE-4910 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Minor Fix For: 0.13.0 Attachments: HIVE-4910.patch, HIVE-4910.patch Hadoop 2 archive tests are broken. The issue stems from the fact that the har URI construction does not have a port in the URI when unit tests are run. This means that an invalid URI is constructed, resulting in failures.
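The port-less URI failure mode described above can be reproduced in isolation. This is an assumed sketch, not Hive's actual archive code: `java.net.URI` reports -1 for a missing port, and naively embedding host:port into a har:// authority then yields a malformed URI.

```java
import java.net.URI;

// Assumed illustration (hypothetical helper, not Hive's code): building a
// har:// URI from a filesystem URI that may lack a port. When the port is
// absent, URI.getPort() returns -1 and the result is not a valid authority.
public class HarUriDemo {
    static String buildHarUri(String fsUri, String archivePath) {
        URI fs = URI.create(fsUri);
        // Naive construction: breaks when fs.getPort() is -1.
        return "har://" + fs.getScheme() + "-" + fs.getHost() + ":" + fs.getPort() + archivePath;
    }

    public static void main(String[] args) {
        // With an explicit port the authority is well-formed.
        System.out.println(buildHarUri("hdfs://namenode:8020", "/data.har"));
        // Without one, ":-1" is embedded, producing an invalid URI.
        System.out.println(buildHarUri("hdfs://namenode", "/data.har"));
    }
}
```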
[jira] [Resolved] (HIVE-5273) Subsequent use of Mapper yields 0 results
[ https://issues.apache.org/jira/browse/HIVE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved HIVE-5273. - Resolution: Cannot Reproduce I am unable to reproduce this issue with hive trunk and branch 0.12. Please let me know if I am not following the right steps here. By local task tracker, I assume you meant local mode jobtracker. To run in local mode, I used - echo $HIVE_OPTS -hiveconf mapred.job.tracker=local -hiveconf fs.default.name=file:///tmp -hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse -hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true This is what I tried - //create table {code} hive> create table ts(s string); OK Time taken: 0.02 seconds hive> select s from ts limit 5; {code} //adding data to table {code} $ perl -e 'for (my $i=0; $i1; $i++){ print "asdfasdfasdfasdfasdfasdfasdfasdfasd\n";}' > /tmp/warehouse/ts/input $ du -hs /tmp/warehouse/ts/input 3.4G /tmp/warehouse/ts/input {code} //running the test {code} hive> select s from ts limit 5; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Execution log at: /tmp/thejas/.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2013-09-25 09:47:25,276 null map = 0%, reduce = 0% 2013-09-25 09:47:28,278 null map = 100%, reduce = 0% Ended Job = job_local_0001 Execution completed successfully Mapred Local Task Succeeded . 
Convert the Join into MapJoin OK asdfasdfasdfasdfasdfasdfasdfasdfasd asdfasdfasdfasdfasdfasdfasdfasdfasd asdfasdfasdfasdfasdfasdfasdfasdfasd asdfasdfasdfasdfasdfasdfasdfasdfasd asdfasdfasdfasdfasdfasdfasdfasdfasd Time taken: 14.622 seconds, Fetched: 5 row(s) hive> select s from ts limit 5; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Execution log at: /tmp/thejas/.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2013-09-25 09:58:00,492 null map = 0%, reduce = 0% 2013-09-25 09:58:03,493 null map = 100%, reduce = 0% Ended Job = job_local_0001 Execution completed successfully Mapred Local Task Succeeded . Convert the Join into MapJoin OK asdfasdfasdfasdfasdfasdfasdfasdfasd asdfasdfasdfasdfasdfasdfasdfasdfasd asdfasdfasdfasdfasdfasdfasdfasdfasd asdfasdfasdfasdfasdfasdfasdfasdfasd asdfasdfasdfasdfasdfasdfasdfasdfasd Time taken: 11.825 seconds, Fetched: 5 row(s) {code} Subsequent use of Mapper yields 0 results - Key: HIVE-5273 URL: https://issues.apache.org/jira/browse/HIVE-5273 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0 Reporter: Mike Lewis Priority: Blocker First noticed this when using local task tracker (and is easiest to reproduce with it). Created a table with one column (uuid). Ran {code} SELECT uuid FROM test_foo LIMIT 5; {code} Results are as expected: {code} ace7265d-49bf-4c11-af67-0cd0a33c690e ace7265d-49bf-4c11-af67-0cd0a33c690e ace7265d-49bf-4c11-af67-0cd0a33c690e ace7265d-49bf-4c11-af67-0cd0a33c690e ace7265d-49bf-4c11-af67-0cd0a33c690e Time taken: 40.172 seconds, Fetched: 5 row(s) {code} Then I run it again. 
The results are not as expected: {code} Time taken: 55.498 seconds {code} The table I am querying is {code} hive> describe extended test_foo; OK uuid string None Detailed Table Information Table(tableName:test_foo, dbName:default, owner:lewis, createTime:1378934838, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:uuid, type:string, comment:null)], location:hdfs://gun1.sjc1c.square:8020/user/hive/warehouse/test_foo, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numPartitions=0, numFiles=37, transient_lastDdlTime=1378934838, numRows=0, totalSize=44600654909, rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) {code} With a non-local tasktracker subsequent queries work, but when doing a {{count(*)}} over a large data set, 0.12.0 returns only a subset of the results that 0.10.0 returns.
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1351#comment-1351 ] Sergey Shelukhin commented on HIVE-5264: I don't understand why hiveqa is not picking up this jira... tests passed here before the recent rebase, I just kicked them off again, but they will take a while.
[jira] [Updated] (HIVE-5273) Subsequent use of Mapper yields 0 results
[ https://issues.apache.org/jira/browse/HIVE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5273: Assignee: Thejas M Nair
[jira] [Updated] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5264: --- Attachment: HIVE-5264.02.patch
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1357#comment-1357 ] Brock Noland commented on HIVE-5264: Phabricator is giving the filename D12993.2.patch, which is different than what it used to be, so hiveqa doesn't test the file; see https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing. Do you know if this change in filename is caused by Phabricator or something you are inadvertently doing?
[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up
[ https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1359#comment-1359 ] Thejas M Nair commented on HIVE-5274: - Looks like it's not this patch; it's HIVE-5223 that caused the checkstyle violations. [~ashutoshc] Can you please take a look? HCatalog package renaming backward compatibility follow-up -- Key: HIVE-5274 URL: https://issues.apache.org/jira/browse/HIVE-5274 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 0.12.0 Attachments: HIVE-5274.2.patch, HIVE-5274.3.patch, HIVE-5274.4.patch As part of HIVE-4869, the hbase storage handler in hcat was moved to org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it was intended to be deprecated as well. However, it imports and uses several org.apache.hive.hcatalog classes. This needs to be changed to use org.apache.hcatalog classes. == Note: The above is a complete description of this issue in and of itself; the following gives more detail on the backward-compatibility goals I have (not saying that each of these things is violated): a) People using org.apache.hcatalog packages should continue being able to use that package, and see no difference at compile time or runtime. All code here is considered deprecated, and will be gone by the time hive 0.14 rolls around. Additionally, org.apache.hcatalog should behave as if it were 0.11 for all compatibility purposes. b) People using org.apache.hive.hcatalog packages should never have an org.apache.hcatalog dependency injected in. Thus, it is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages internally (say HCatUtil, for example), as long as any interfaces only expose org.apache.hcatalog.\* For tests that test org.apache.hcatalog.\*, we must be capable of testing it from a pure org.apache.hcatalog.\* world. 
It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, even in tests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
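The compatibility contract above (the deprecated package may delegate to the new package internally, but never the reverse) can be sketched with hypothetical stand-in classes; these are not real HCatalog classes, just an illustration of the dependency direction:

```java
// Hypothetical stand-ins sketching the backward-compatibility rule:
// NewHCatUtil represents an org.apache.hive.hcatalog class, OldHCatUtil the
// deprecated org.apache.hcatalog facade. Delegation flows old -> new only.
public class CompatSketch {
    // Stands in for the new-package implementation.
    static class NewHCatUtil {
        static String configKey() { return "hcat.key"; }
    }

    // Stands in for the deprecated old-package class.
    @Deprecated
    static class OldHCatUtil {
        // Internal delegation to the new package is allowed...
        static String configKey() { return NewHCatUtil.configKey(); }
        // ...but nothing in NewHCatUtil may reference OldHCatUtil, even in tests.
    }

    public static void main(String[] args) {
        // Old-package callers keep compiling and see unchanged behavior.
        System.out.println(OldHCatUtil.configKey());
    }
}
```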
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1362#comment-1362 ] Sergey Shelukhin commented on HIVE-5264: Yeah, but I keep uploading the properly-named file after :)
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1364#comment-1364 ] Sergey Shelukhin commented on HIVE-5264: As for Ph patches, I am updating manually because I had some issues with local branches for this issue. Maybe the review is attached to the jira wrong. But anyway, I am hoping it would pick up my patches.
[jira] [Updated] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.
[ https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HIVE-5296: - Affects Version/s: 0.13.0 Memory leak: OOM Error after multiple open/closed JDBC connections. Key: HIVE-5296 URL: https://issues.apache.org/jira/browse/HIVE-5296 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0, 0.13.0 Environment: Hive 0.12.0, Hadoop 1.1.2, Debian. Reporter: Douglas Labels: hiveserver Fix For: 0.12.0 Attachments: HIVE-5296.patch, HIVE-5296.patch, HIVE-5296.patch Original Estimate: 168h Remaining Estimate: 168h This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481 However, on inspection of the related patch and my built version of Hive (patch carried forward to 0.12.0), I am still seeing the described behaviour. Multiple connections to HiveServer2, all of which are closed and disposed of properly, show the Java heap size to grow extremely quickly. This issue can be recreated using the following code:
{code}
import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;
import org.apache.hive.service.cli.HiveSQLException;
import org.apache.log4j.Logger;

/*
 * Class which encapsulates the lifecycle of a query or statement.
 * Provides functionality which allows you to create a connection
 */
public class HiveClient {

    Connection con;
    Logger logger;
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private String db;

    public HiveClient(String db) {
        logger = Logger.getLogger(HiveClient.class);
        this.db = db;
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            logger.info("Can't find Hive driver");
        }
        String hiveHost = GlimmerServer.config.getString("hive/host");
        String hivePort = GlimmerServer.config.getString("hive/port");
        String connectionString = "jdbc:hive2://" + hiveHost + ":" + hivePort + "/default";
        logger.info(String.format("Attempting to connect to %s", connectionString));
        try {
            con = DriverManager.getConnection(connectionString, "", "");
        } catch (Exception e) {
            logger.error("Problem instantiating the connection " + e.getMessage());
        }
    }

    public int update(String query) {
        Integer res = 0;
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            String switchdb = "USE " + db;
            logger.info(switchdb);
            stmt.executeUpdate(switchdb);
            logger.info(query);
            res = stmt.executeUpdate(query);
            logger.info("Query passed to server");
            stmt.close();
        } catch (HiveSQLException e) {
            logger.info(String.format("HiveSQLException thrown, this can be valid, "
                + "but check the error: %s from the query %s", query, e.toString()));
        } catch (SQLException e) {
            logger.error(String.format("Unable to execute query SQLException %s. Error: %s", query, e));
        } catch (Exception e) {
            logger.error(String.format("Unable to execute query %s. Error: %s", query, e));
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("Cannot close the statement, potential memory leak " + e);
            }
        }
        return res;
    }

    public void close() {
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                logger.info("Problem closing connection " + e);
            }
        }
    }
}
{code} And by creating and closing many HiveClient objects. The heap space used by the hiveserver2 runjar process is seen to increase
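The reported leak is on the HiveServer2 side, but the client code above also has a fragile close path: the Statement is released manually on both the success and exception branches. That pattern is exactly what try-with-resources rules out. A minimal self-contained sketch with a stand-in resource (no running HiveServer2 needed, so a fake AutoCloseable substitutes for a real JDBC Statement):

```java
// Demonstrates that try-with-resources closes a resource on every exit path,
// including the exception path the reporter's update() had to guard manually.
// FakeStatement is a stand-in for a JDBC Statement; `closed` counts closes.
public class CloseDemo {
    static int closed = 0;

    static class FakeStatement implements AutoCloseable {
        void execute(boolean fail) {
            if (fail) throw new RuntimeException("query failed");
        }
        @Override public void close() { closed++; }
    }

    static void runQuery(boolean fail) {
        // close() runs when the try block exits, whether it threw or not.
        try (FakeStatement stmt = new FakeStatement()) {
            stmt.execute(fail);
        } catch (RuntimeException e) {
            // logged in real code; the statement is already closed here
        }
    }

    public static void main(String[] args) {
        runQuery(false);
        runQuery(true);
        System.out.println("resources closed: " + closed);
    }
}
```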
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1382#comment-1382 ] Brock Noland commented on HIVE-5264: It's currently running: https://builds.apache.org/user/brock/my-views/view/hive/job/PreCommit-HIVE-Build/896/console but I think it will fail due to: https://issues.apache.org/jira/browse/HIVE-5274?focusedCommentId=1359&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1359
[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up
[ https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777801#comment-13777801 ] Sushanth Sowmyan commented on HIVE-5274: Hmm, I checked trunk with this patch for checkstyle violations before going ahead with it. I can verify (and also check the other one Thejas mentioned).
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777802#comment-13777802 ] Sergey Shelukhin commented on HIVE-5264: Hmm, it seems to have run with the phabricator patch and discarded it. Let me try to cancel and resubmit.
[jira] [Updated] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5264: --- Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777804#comment-13777804 ] Brock Noland commented on HIVE-5264: No, it failed because trunk checkstyle fails at present. See https://issues.apache.org/jira/browse/HIVE-5274?focusedCommentId=1359page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1359 SQL generated by MetaStoreDirectSql.java not compliant with Postgres. - Key: HIVE-5264 URL: https://issues.apache.org/jira/browse/HIVE-5264 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Environment: Ubuntu 12.04 PostgreSQL 9.1.8 Reporter: Alexander Behm Assignee: Sergey Shelukhin Attachments: D12993.1.patch, D12993.2.patch, HIVE-5264.01.patch, HIVE-5264.02.patch, HIVE-5264.patch Some operations against the Hive Metastore seem broken against Postgres. For example, when using HiveMetastoreClient.listPartitions() the Postgres logs show queries such as: 2013-09-09 19:10:01 PDT STATEMENT: select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID where TBLS.TBL_NAME = $1 and DBS.NAME = $2 order by PART_NAME asc with a somewhat cryptic (but correct) error: ERROR: relation partitions does not exist at character 32 Postgres identifiers are somewhat unusual. Unquoted identifiers are interpreted as lower case (there is no Postgres option to change this). Since the Metastore table schema uses upper case table names, the correct SQL requires escaped identifiers to those tables, i.e., select PARTITIONS.PART_ID from PARTITIONS... Hive sets metastore.try.direct.sql=true by default, so the above SQL is generated by hive/metastore/MetaStoreDirectSql.java, i.e., this is not a Datanucleus problem. When I set metastore.try.direct.sql=false, then the Metastore backed by Postgres works. -- This message is automatically generated by JIRA. 
[jira] [Updated] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5264: --- Status: Patch Available (was: Open) SQL generated by MetaStoreDirectSql.java not compliant with Postgres. - Key: HIVE-5264 URL: https://issues.apache.org/jira/browse/HIVE-5264 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Environment: Ubuntu 12.04 PostgreSQL 9.1.8 Reporter: Alexander Behm Assignee: Sergey Shelukhin Attachments: D12993.1.patch, D12993.2.patch, HIVE-5264.01.patch, HIVE-5264.02.patch, HIVE-5264.03.patch, HIVE-5264.patch Some operations against the Hive Metastore seem broken against Postgres. For example, when using HiveMetastoreClient.listPartitions() the Postgres logs show queries such as: 2013-09-09 19:10:01 PDT STATEMENT: select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID where TBLS.TBL_NAME = $1 and DBS.NAME = $2 order by PART_NAME asc with a somewhat cryptic (but correct) error: ERROR: relation partitions does not exist at character 32 Postgres identifiers are somewhat unusual. Unquoted identifiers are interpreted as lower case (there is no Postgres option to change this). Since the Metastore table schema uses upper case table names, the correct SQL requires escaped identifiers to those tables, i.e., select PARTITIONS.PART_ID from PARTITIONS... Hive sets metastore.try.direct.sql=true by default, so the above SQL is generated by hive/metastore/MetaStoreDirectSql.java, i.e., this is not a Datanucleus problem. When I set metastore.try.direct.sql=false, then the Metastore backed by Postgres works. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5352) cast('1.0' as int) returns null
[ https://issues.apache.org/jira/browse/HIVE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5352: - Attachment: HIVE-5352.3.patch A better approach (.3) per review comments. cast('1.0' as int) returns null --- Key: HIVE-5352 URL: https://issues.apache.org/jira/browse/HIVE-5352 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-5352.1.patch, HIVE-5352.2.patch, HIVE-5352.3.patch Casting strings to int/smallint/bigint/tinyint yields null if the string isn't a 'pure' integer. '1.0', '2.4', '1e5' all return null. I think for those cases the cast should return the truncated int (i.e.: if c is string, cast(c as int) should be the same as cast(cast(c as float) as int). This is in line with the standard and is the same behavior as mysql and oracle. (postgres and sql server throw error, see first answer here: http://social.msdn.microsoft.com/Forums/sqlserver/en-US/af3eff9c-737b-42fe-9016-05da9203a667/oracle-does-understand-cast10-as-int-why-sql-server-does-not) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
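[Editorial note] The semantics proposed in the description (cast the string via a floating-point parse, then truncate) can be sketched as below. This is only an illustration of the proposal, not the committed patch or Hive's actual UDF code.

```java
// Sketch of the proposed cast behavior: parse the string as a double and
// truncate toward zero, returning null (SQL NULL) only when the string is
// not numeric at all. '1.0' -> 1, '2.4' -> 2, '1e5' -> 100000.
public class StringToIntCast {
    static Integer castToInt(String s) {
        try {
            return (int) Double.parseDouble(s); // narrowing truncates toward zero
        } catch (NumberFormatException e) {
            return null; // non-numeric strings still yield NULL
        }
    }
}
```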
[jira] [Updated] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5264: --- Attachment: HIVE-5264.03.patch Exactly the same patch. Some random shamanic dancing to maybe appease HiveQA SQL generated by MetaStoreDirectSql.java not compliant with Postgres. - Key: HIVE-5264 URL: https://issues.apache.org/jira/browse/HIVE-5264 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Environment: Ubuntu 12.04 PostgreSQL 9.1.8 Reporter: Alexander Behm Assignee: Sergey Shelukhin Attachments: D12993.1.patch, D12993.2.patch, HIVE-5264.01.patch, HIVE-5264.02.patch, HIVE-5264.03.patch, HIVE-5264.patch Some operations against the Hive Metastore seem broken against Postgres. For example, when using HiveMetastoreClient.listPartitions() the Postgres logs show queries such as: 2013-09-09 19:10:01 PDT STATEMENT: select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID where TBLS.TBL_NAME = $1 and DBS.NAME = $2 order by PART_NAME asc with a somewhat cryptic (but correct) error: ERROR: relation partitions does not exist at character 32 Postgres identifiers are somewhat unusual. Unquoted identifiers are interpreted as lower case (there is no Postgres option to change this). Since the Metastore table schema uses upper case table names, the correct SQL requires escaped identifiers to those tables, i.e., select PARTITIONS.PART_ID from PARTITIONS... Hive sets metastore.try.direct.sql=true by default, so the above SQL is generated by hive/metastore/MetaStoreDirectSql.java, i.e., this is not a Datanucleus problem. When I set metastore.try.direct.sql=false, then the Metastore backed by Postgres works. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777809#comment-13777809 ] Brock Noland commented on HIVE-5264: That is going to fail with the same error message! See my earlier comments :) SQL generated by MetaStoreDirectSql.java not compliant with Postgres. - Key: HIVE-5264 URL: https://issues.apache.org/jira/browse/HIVE-5264 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Environment: Ubuntu 12.04 PostgreSQL 9.1.8 Reporter: Alexander Behm Assignee: Sergey Shelukhin Attachments: D12993.1.patch, D12993.2.patch, HIVE-5264.01.patch, HIVE-5264.02.patch, HIVE-5264.03.patch, HIVE-5264.patch Some operations against the Hive Metastore seem broken against Postgres. For example, when using HiveMetastoreClient.listPartitions() the Postgres logs show queries such as: 2013-09-09 19:10:01 PDT STATEMENT: select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID where TBLS.TBL_NAME = $1 and DBS.NAME = $2 order by PART_NAME asc with a somewhat cryptic (but correct) error: ERROR: relation partitions does not exist at character 32 Postgres identifiers are somewhat unusual. Unquoted identifiers are interpreted as lower case (there is no Postgres option to change this). Since the Metastore table schema uses upper case table names, the correct SQL requires escaped identifiers to those tables, i.e., select PARTITIONS.PART_ID from PARTITIONS... Hive sets metastore.try.direct.sql=true by default, so the above SQL is generated by hive/metastore/MetaStoreDirectSql.java, i.e., this is not a Datanucleus problem. When I set metastore.try.direct.sql=false, then the Metastore backed by Postgres works. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777816#comment-13777816 ] Konstantin Boudnik commented on HIVE-4501: -- The provided patch doesn't solve the problem, though, to my understanding. It makes it less pronounced for sure, but it doesn't go away. HS2 memory leak - FileSystem objects in FileSystem.CACHE Key: HIVE-4501 URL: https://issues.apache.org/jira/browse/HIVE-4501 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4501.1.patch org.apache.hadoop.fs.FileSystem objects are accumulating in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true. Users should not have to bother with this extra configuration.
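[Editorial note] A toy model of the accumulation described above (this is not Hadoop's actual code; the class and keying are simplified stand-ins): FileSystem.get() caches instances keyed by scheme, authority, and the requesting user, so when HS2 resolves a distinct user per unsecure session, every session adds a cache entry that is never evicted.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for FileSystem.CACHE: one cached instance per distinct
// (scheme, user) key, never evicted. Each new per-session user grows the map.
public class FsCacheLeakModel {
    static final Map<String, Object> CACHE = new HashMap<>();

    // Mimics FileSystem.get(): create on first request, reuse thereafter.
    static Object get(String scheme, String user) {
        return CACHE.computeIfAbsent(scheme + "://" + user, k -> new Object());
    }
}
```

The workaround mentioned in the issue disables this caching entirely (via the fs.*.impl.disable.cache settings), at the cost of callers having to close each FileSystem instance themselves.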
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777847#comment-13777847 ] Sergey Shelukhin commented on HIVE-5264: Which message? You mean checkstyle? Wouldn't we at least see the tests result? The previous HiveQA for this JIRA picked phabricator patch and said not a patch SQL generated by MetaStoreDirectSql.java not compliant with Postgres. - Key: HIVE-5264 URL: https://issues.apache.org/jira/browse/HIVE-5264 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Environment: Ubuntu 12.04 PostgreSQL 9.1.8 Reporter: Alexander Behm Assignee: Sergey Shelukhin Attachments: D12993.1.patch, D12993.2.patch, HIVE-5264.01.patch, HIVE-5264.02.patch, HIVE-5264.03.patch, HIVE-5264.patch Some operations against the Hive Metastore seem broken against Postgres. For example, when using HiveMetastoreClient.listPartitions() the Postgres logs show queries such as: 2013-09-09 19:10:01 PDT STATEMENT: select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID where TBLS.TBL_NAME = $1 and DBS.NAME = $2 order by PART_NAME asc with a somewhat cryptic (but correct) error: ERROR: relation partitions does not exist at character 32 Postgres identifiers are somewhat unusual. Unquoted identifiers are interpreted as lower case (there is no Postgres option to change this). Since the Metastore table schema uses upper case table names, the correct SQL requires escaped identifiers to those tables, i.e., select PARTITIONS.PART_ID from PARTITIONS... Hive sets metastore.try.direct.sql=true by default, so the above SQL is generated by hive/metastore/MetaStoreDirectSql.java, i.e., this is not a Datanucleus problem. When I set metastore.try.direct.sql=false, then the Metastore backed by Postgres works. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HIVE-5264) SQL generated by MetaStoreDirectSql.java not compliant with Postgres.
[ https://issues.apache.org/jira/browse/HIVE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777859#comment-13777859 ] Brock Noland commented on HIVE-5264: If you look all the way at the bottom of this: https://issues.apache.org/jira/browse/HIVE-5264?focusedCommentId=1388page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1388 you will see what I am referring to. The issue at present is that HiveQA cannot tell the difference between a checkstyle failure and a compile failure, since ant exits with 1 in both cases. This means that when trunk fails with checkstyle errors, no tests will be run by Hive QA. SQL generated by MetaStoreDirectSql.java not compliant with Postgres. - Key: HIVE-5264 URL: https://issues.apache.org/jira/browse/HIVE-5264 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Environment: Ubuntu 12.04 PostgreSQL 9.1.8 Reporter: Alexander Behm Assignee: Sergey Shelukhin Attachments: D12993.1.patch, D12993.2.patch, HIVE-5264.01.patch, HIVE-5264.02.patch, HIVE-5264.03.patch, HIVE-5264.patch Some operations against the Hive Metastore seem broken against Postgres. For example, when using HiveMetastoreClient.listPartitions() the Postgres logs show queries such as: 2013-09-09 19:10:01 PDT STATEMENT: select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID inner join DBS on TBLS.DB_ID = DBS.DB_ID where TBLS.TBL_NAME = $1 and DBS.NAME = $2 order by PART_NAME asc with a somewhat cryptic (but correct) error: ERROR: relation partitions does not exist at character 32 Postgres identifiers are somewhat unusual. Unquoted identifiers are interpreted as lower case (there is no Postgres option to change this). Since the Metastore table schema uses upper case table names, the correct SQL requires escaped identifiers to those tables, i.e., select PARTITIONS.PART_ID from PARTITIONS...
Hive sets metastore.try.direct.sql=true by default, so the above SQL is generated by hive/metastore/MetaStoreDirectSql.java, i.e., this is not a Datanucleus problem. When I set metastore.try.direct.sql=false, then the Metastore backed by Postgres works.
[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up
[ https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777867#comment-13777867 ] Hudson commented on HIVE-5274: -- FAILURE: Integrated in Hive-trunk-h0.21 #2357 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2357/]) HIVE-5274 : HCatalog package renaming backward compatibility follow-up (Sushanth Sowmyan) (khorgath: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526094) * /hive/trunk/hcatalog/build-support/ant/checkstyle.xml * /hive/trunk/hcatalog/core/src/main/java/org/apache/hcatalog/mapreduce/HCatStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseConstants.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseHCatStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseInputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseRevisionManagerUtil.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HbaseSnapshotRecordReader.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/ResultConverter.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseBulkOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseDirectOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseHCatStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHBaseInputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHCatHBaseInputFormat.java * 
/hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHiveHBaseStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestHiveHBaseTableOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestPigHBaseStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/TestSnapshots.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hcatalog/hbase/snapshot/TestZNodeSetUp.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/ManyMiniCluster.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/SkeletonHBaseTest.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/TestHBaseInputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/TestHiveHBaseStorageHandler.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/TestHiveHBaseTableOutputFormat.java * /hive/trunk/hcatalog/storage-handlers/hbase/src/test/org/apache/hive/hcatalog/hbase/TestPigHBaseStorageHandler.java HCatalog package renaming backward compatibility follow-up -- Key: HIVE-5274 URL: https://issues.apache.org/jira/browse/HIVE-5274 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 0.12.0 Attachments: HIVE-5274.2.patch, HIVE-5274.3.patch, HIVE-5274.4.patch As part of HIVE-4869, the hbase storage handler in hcat was moved to org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it was intended to be deprecated as well. 
However, it imports and uses several org.apache.hive.hcatalog classes. This needs to be changed to use org.apache.hcatalog classes. == Note : The above is a complete description of this issue in and of itself; the following is more detail on the backward-compatibility goal I have (not saying that each of these things is violated): a) People using org.apache.hcatalog packages should continue being able to use that package, and see no difference at compile time or runtime. All code here is considered deprecated, and will be gone by the time hive 0.14 rolls around. Additionally, org.apache.hcatalog should behave as if it were 0.11 for all compatibility purposes. b) People using org.apache.hive.hcatalog packages should never have an org.apache.hcatalog dependency injected in. Thus, it is okay for org.apache.hcatalog to use
[jira] [Commented] (HIVE-5345) Operator::close() leaks Operator::out, holding reference to buffers
[ https://issues.apache.org/jira/browse/HIVE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777866#comment-13777866 ] Hudson commented on HIVE-5345: -- FAILURE: Integrated in Hive-trunk-h0.21 #2357 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2357/]) HIVE-5345 : Operator::close() leaks Operator::out, holding reference to buffers (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1526100) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java Operator::close() leaks Operator::out, holding reference to buffers --- Key: HIVE-5345 URL: https://issues.apache.org/jira/browse/HIVE-5345 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Environment: Ubuntu, LXC, jdk6-x86_64 Reporter: Gopal V Assignee: Gopal V Labels: memory-leak Fix For: 0.13.0 Attachments: HIVE-5345.01.patch, out-leak.png When processing multiple splits on the same operator pipeline, the output collector in Operator has a held reference, which causes issues. Operator::close() does not de-reference the OutputCollector object Operator::out held by the object. This means that trying to allocate space for a new OutputCollector causes an OOM because the old one is still reachable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
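[Editorial note] The de-referencing pattern the fix describes can be sketched as below. This is an illustration of the leak pattern, with hypothetical names, not the actual Operator.java change.

```java
// Stand-in for Operator: close() must drop the reference to the output
// collector so the buffers it holds become garbage-collectible between
// splits; otherwise the old collector stays reachable while a new one is
// allocated, which is the OOM described in the report.
public class CollectorHolder {
    private Object out; // stands in for Operator::out (an OutputCollector)

    void setCollector(Object collector) {
        this.out = collector;
    }

    void close() {
        this.out = null; // de-reference on close, per the fix
    }

    boolean holdsCollector() {
        return out != null;
    }
}
```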
[jira] [Created] (HIVE-5359) HiveAuthFactory does not honor the hive configuration passed while creating HiveServer2
Vaibhav Gumashta created HIVE-5359: -- Summary: HiveAuthFactory does not honor the hive configuration passed while creating HiveServer2 Key: HIVE-5359 URL: https://issues.apache.org/jira/browse/HIVE-5359 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta When HS2 is brought up, the server can be inited with the given hive config: HiveServer2#init(HiveConf hiveConf). That configuration should be applied through the entire setup process for all services. However, while starting ThriftBinaryCLIService, it creates a new HiveAuthFactory object, whose constructor creates a new HiveConf object and ends up using it rather than using the HiveConf passed during HS2 bootstrap. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Wang updated HIVE-4501: - Priority: Critical (was: Major) HS2 memory leak - FileSystem objects in FileSystem.CACHE Key: HIVE-4501 URL: https://issues.apache.org/jira/browse/HIVE-4501 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Attachments: HIVE-4501.1.patch org.apache.hadoop.fs.FileSystem objects are accumulating in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true. Users should not have to bother with this extra configuration.
[jira] [Updated] (HIVE-5359) HiveAuthFactory does not honor the hive configuration passed while creating HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5359: --- Description: When HS2 is brought up, the server can be inited with the given hive config: HiveServer2#init(HiveConf hiveConf). That configuration should be applied through the entire setup process for all services. However, while starting ThriftCLIService, it creates a new HiveAuthFactory object, whose constructor creates a new HiveConf object and ends up using it rather than using the HiveConf passed during HS2 bootstrap. (was: When HS2 is brought up, the server can be inited with the given hive config: HiveServer2#init(HiveConf hiveConf). That configuration should be applied through the entire setup process for all services. However, while starting ThriftBinaryCLIService, it creates a new HiveAuthFactory object, whose constructor creates a new HiveConf object and ends up using it rather than using the HiveConf passed during HS2 bootstrap.) HiveAuthFactory does not honor the hive configuration passed while creating HiveServer2 --- Key: HIVE-5359 URL: https://issues.apache.org/jira/browse/HIVE-5359 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta When HS2 is brought up, the server can be inited with the given hive config: HiveServer2#init(HiveConf hiveConf). That configuration should be applied through the entire setup process for all services. However, while starting ThriftCLIService, it creates a new HiveAuthFactory object, whose constructor creates a new HiveConf object and ends up using it rather than using the HiveConf passed during HS2 bootstrap. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
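[Editorial note] The configuration-threading bug can be sketched as below, with hypothetical names (this is not the actual HiveAuthFactory code): the no-arg path builds a fresh config and silently drops whatever the server was initialized with, and the fix is to thread the caller's conf through.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for HiveAuthFactory, using a plain Map in place of HiveConf.
public class AuthFactorySketch {
    final Map<String, String> conf;

    // Buggy pattern: the equivalent of building a new HiveConf() inside the
    // constructor, ignoring the HiveConf passed to HiveServer2#init.
    AuthFactorySketch() {
        this.conf = new HashMap<>();
    }

    // Fixed pattern: accept and keep the configuration the server passed in.
    AuthFactorySketch(Map<String, String> conf) {
        this.conf = conf;
    }
}
```

With the fixed constructor, any setting applied to the config at HS2 bootstrap is visible to the auth layer instead of being reset to defaults.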
[jira] [Created] (HIVE-5360) fix hcatalog checkstyle issue introduced in HIVE-5223
Thejas M Nair created HIVE-5360: --- Summary: fix hcatalog checkstyle issue introduced in HIVE-5223 Key: HIVE-5360 URL: https://issues.apache.org/jira/browse/HIVE-5360 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair The trunk and 0.12 branch have checkstyle failures right now. {code} [checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:453: method def child at indentation level 6 not at correct indentation, 4 [checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:454: method def child at indentation level 6 not at correct indentation, 4 [checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:455: method def child at indentation level 6 not at correct indentation, 4 [checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:456: method call child at indentation level 6 not at correct indentation, 8 [checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:486: method def child at indentation level 3 not at correct indentation, 4 [checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:487: method call child at indentation level 3 not at correct indentation, 4 [checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:487: method def child at indentation level 3 not at correct indentation, 4 [checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:488: method def child at indentation level 3 not at correct indentation, 4 {code} -- This message is automatically generated by JIRA. 
[jira] [Commented] (HIVE-5274) HCatalog package renaming backward compatibility follow-up
[ https://issues.apache.org/jira/browse/HIVE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13777879#comment-13777879 ] Thejas M Nair commented on HIVE-5274: - I am fixing it, will upload a patch soon to HIVE-5360 HCatalog package renaming backward compatibility follow-up -- Key: HIVE-5274 URL: https://issues.apache.org/jira/browse/HIVE-5274 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 0.12.0 Attachments: HIVE-5274.2.patch, HIVE-5274.3.patch, HIVE-5274.4.patch As part of HIVE-4869, the hbase storage handler in hcat was moved to org.apache.hive.hcatalog, and then put back to org.apache.hcatalog since it was intended to be deprecated as well. However, it imports and uses several org.apache.hive.hcatalog classes. This needs to be changed to use org.apache.hcatalog classes. == Note : The above is a complete description of this issue in and of itself; the following is more detail on the backward-compatibility goal I have (not saying that each of these things is violated): a) People using org.apache.hcatalog packages should continue being able to use that package, and see no difference at compile time or runtime. All code here is considered deprecated, and will be gone by the time hive 0.14 rolls around. Additionally, org.apache.hcatalog should behave as if it were 0.11 for all compatibility purposes. b) People using org.apache.hive.hcatalog packages should never have an org.apache.hcatalog dependency injected in. Thus, it is okay for org.apache.hcatalog to use org.apache.hive.hcatalog packages internally (say HCatUtil, for example), as long as any interfaces only expose org.apache.hcatalog.\*. For tests that test org.apache.hcatalog.\*, we must be capable of testing it from a pure org.apache.hcatalog.\* world. It is never okay for org.apache.hive.hcatalog to use org.apache.hcatalog, even in tests.
[jira] [Updated] (HIVE-5360) fix hcatalog checkstyle issue introduced in HIVE-5223
[ https://issues.apache.org/jira/browse/HIVE-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5360: Status: Patch Available (was: Open) fix hcatalog checkstyle issue introduced in HIVE-5223 -- Key: HIVE-5360 URL: https://issues.apache.org/jira/browse/HIVE-5360 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5360.1.patch The trunk and 0.12 branch have checkstyle failures right now.
{code}
[checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:453: method def child at indentation level 6 not at correct indentation, 4
[checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:454: method def child at indentation level 6 not at correct indentation, 4
[checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:455: method def child at indentation level 6 not at correct indentation, 4
[checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hcatalog/common/HCatUtil.java:456: method call child at indentation level 6 not at correct indentation, 8
[checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:486: method def child at indentation level 3 not at correct indentation, 4
[checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:487: method call child at indentation level 3 not at correct indentation, 4
[checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:487: method def child at indentation level 3 not at correct indentation, 4
[checkstyle] /home/hortonth/hive_apache/hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java:488: method def child at indentation level 3 not at correct indentation, 4
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5360) fix hcatalog checkstyle issue introduced in HIVE-5223
[ https://issues.apache.org/jira/browse/HIVE-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5360: Attachment: HIVE-5360.1.patch
HIVE-5360.1.patch - fixes the checkstyle issue. I think we can forgo the 24hr embargo for this build fix. I have verified that this fixes the checkstyle issue:
{code}
checkstyle-init: [mkdir] Created dir: ...
checkstyle: [echo] hcatalog
[checkstyle] Running Checkstyle 5.5 on 613 files
BUILD SUCCESSFUL
{code}
fix hcatalog checkstyle issue introduced in HIVE-5223 -- Key: HIVE-5360 URL: https://issues.apache.org/jira/browse/HIVE-5360 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5360.1.patch
[jira] [Commented] (HIVE-5360) fix hcatalog checkstyle issue introduced in HIVE-5223
[ https://issues.apache.org/jira/browse/HIVE-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777890#comment-13777890 ] Thejas M Nair commented on HIVE-5360: cc [~ashutoshc] [~brocknoland] [~sushanth]
fix hcatalog checkstyle issue introduced in HIVE-5223 -- Key: HIVE-5360 URL: https://issues.apache.org/jira/browse/HIVE-5360 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5360.1.patch
[jira] [Commented] (HIVE-5360) fix hcatalog checkstyle issue introduced in HIVE-5223
[ https://issues.apache.org/jira/browse/HIVE-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777896#comment-13777896 ] Brock Noland commented on HIVE-5360: +1
fix hcatalog checkstyle issue introduced in HIVE-5223 -- Key: HIVE-5360 URL: https://issues.apache.org/jira/browse/HIVE-5360 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5360.1.patch
[jira] [Updated] (HIVE-5360) fix hcatalog checkstyle issue introduced in HIVE-5223
[ https://issues.apache.org/jira/browse/HIVE-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5360: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Thank you very much! I have committed this to trunk.
fix hcatalog checkstyle issue introduced in HIVE-5223 -- Key: HIVE-5360 URL: https://issues.apache.org/jira/browse/HIVE-5360 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-5360.1.patch
[jira] [Created] (HIVE-5361) PTest2 should allow a different JVM for compilation versus execution
Brock Noland created HIVE-5361: -- Summary: PTest2 should allow a different JVM for compilation versus execution Key: HIVE-5361 URL: https://issues.apache.org/jira/browse/HIVE-5361 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Priority: Minor
[jira] [Commented] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777903#comment-13777903 ] Thejas M Nair commented on HIVE-4501: [~cos] Can you please elaborate? The problem is a memory leak in HS2 caused by FileSystem objects being cached in FileSystem.CACHE. Why won't disabling the cache stop the leak?
HS2 memory leak - FileSystem objects in FileSystem.CACHE Key: HIVE-4501 URL: https://issues.apache.org/jira/browse/HIVE-4501 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Attachments: HIVE-4501.1.patch org.apache.hadoop.fs.FileSystem objects are getting accumulated in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true. Users should not have to bother with this extra configuration.
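For reference, the workaround the description mentions is a pair of Hadoop configuration switches; since they are "disable" flags, setting them to true is what turns the FileSystem cache off. A sketch of the corresponding configuration fragment (whether it belongs in core-site.xml or hive-site.xml is a deployment choice the report does not specify):

```xml
<!-- Workaround sketch: disable FileSystem caching so HS2 sessions
     do not accumulate entries in FileSystem.CACHE. -->
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
<property>
  <name>fs.file.impl.disable.cache</name>
  <value>true</value>
</property>
```

As the comment notes, the real fix should make this configuration unnecessary for users.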
[jira] [Commented] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.
[ https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777901#comment-13777901 ] Kousuke Saruta commented on HIVE-5296: The error above may be caused by HIVE-5360. So, I've re-submitted a patch.
Memory leak: OOM Error after multiple open/closed JDBC connections. Key: HIVE-5296 URL: https://issues.apache.org/jira/browse/HIVE-5296 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0, 0.13.0 Environment: Hive 0.12.0, Hadoop 1.1.2, Debian. Reporter: Douglas Labels: hiveserver Fix For: 0.12.0 Attachments: HIVE-5296.1.patch, HIVE-5296.patch, HIVE-5296.patch, HIVE-5296.patch Original Estimate: 168h Remaining Estimate: 168h This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481 However, on inspection of the related patch and my built version of Hive (patch carried forward to 0.12.0), I am still seeing the described behaviour. Multiple connections to Hiveserver2, all of which are closed and disposed of properly, show the Java heap size growing extremely quickly. This issue can be recreated using the following code
{code}
import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

import org.apache.hive.service.cli.HiveSQLException;
import org.apache.log4j.Logger;

/*
 * Class which encapsulates the lifecycle of a query or statement.
 * Provides functionality which allows you to create a connection
 */
public class HiveClient {

    Connection con;
    Logger logger;
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private String db;

    public HiveClient(String db) {
        logger = Logger.getLogger(HiveClient.class);
        this.db = db;
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            logger.info("Can't find Hive driver");
        }
        String hiveHost = GlimmerServer.config.getString("hive/host");
        String hivePort = GlimmerServer.config.getString("hive/port");
        String connectionString = "jdbc:hive2://" + hiveHost + ":" + hivePort + "/default";
        logger.info(String.format("Attempting to connect to %s", connectionString));
        try {
            con = DriverManager.getConnection(connectionString, "", "");
        } catch (Exception e) {
            logger.error("Problem instantiating the connection " + e.getMessage());
        }
    }

    public int update(String query) {
        Integer res = 0;
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            String switchdb = "USE " + db;
            logger.info(switchdb);
            stmt.executeUpdate(switchdb);
            logger.info(query);
            res = stmt.executeUpdate(query);
            logger.info("Query passed to server");
            stmt.close();
        } catch (HiveSQLException e) {
            logger.info(String.format("HiveSQLException thrown, this can be valid, "
                    + "but check the error: %s from the query %s", e.toString(), query));
        } catch (SQLException e) {
            logger.error(String.format("Unable to execute query SQLException %s. Error: %s", query, e));
        } catch (Exception e) {
            logger.error(String.format("Unable to execute query %s. Error: %s", query, e));
        }
        if (stmt != null) {
            try {
                stmt.close();
            } catch (SQLException e) {
                logger.error("Cannot close the statement, potentially a memory leak " + e);
            }
        }
        return res;
    }

    public void close() {
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                logger.info("Problem closing connection " + e);
            }
        }
    }
}
{code}
And by creating and closing many HiveClient objects.
[jira] [Updated] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.
[ https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HIVE-5296: Attachment: HIVE-5296.1.patch
Memory leak: OOM Error after multiple open/closed JDBC connections. Key: HIVE-5296 URL: https://issues.apache.org/jira/browse/HIVE-5296 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0, 0.13.0 Environment: Hive 0.12.0, Hadoop 1.1.2, Debian. Reporter: Douglas Labels: hiveserver Fix For: 0.12.0 Attachments: HIVE-5296.1.patch, HIVE-5296.patch, HIVE-5296.patch, HIVE-5296.patch
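The HiveClient code in the report closes its connection and statement only on some paths; Java 7 try-with-resources makes that cleanup unconditional. A minimal sketch of the pattern, using a hypothetical counting stand-in since no live HiveServer2 is assumed here:

```java
// Sketch: try-with-resources closes JDBC-style resources on every path,
// unlike the manual close() calls in the reported client.
// "Resource" is a hypothetical stand-in for Connection/Statement.
public class TryWithResourcesSketch {

    static class Resource implements AutoCloseable {
        static int open = 0;
        Resource() { open++; }
        void use(boolean fail) {
            if (fail) {
                throw new RuntimeException("query failed");
            }
        }
        @Override
        public void close() { open--; }
    }

    static void runQuery(boolean fail) {
        // Both the "connection" and the "statement" are closed
        // automatically, even when use() throws.
        try (Resource con = new Resource(); Resource stmt = new Resource()) {
            stmt.use(fail);
        } catch (RuntimeException e) {
            // Exception handled; resources are already closed by this point.
        }
    }

    public static void main(String[] args) {
        runQuery(false);
        runQuery(true);
        // Prints 0: nothing leaked on either the success or the failure path.
        System.out.println(Resource.open);
    }
}
```

Real java.sql.Connection and Statement implement AutoCloseable, so the same shape applies to the client above.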
[jira] [Updated] (HIVE-5295) HiveConnection#configureConnection tries to execute statement even after it is closed
[ https://issues.apache.org/jira/browse/HIVE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5295: --- Status: Open (was: Patch Available) HiveConnection#configureConnection tries to execute statement even after it is closed - Key: HIVE-5295 URL: https://issues.apache.org/jira/browse/HIVE-5295 Project: Hive Issue Type: Bug Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: D12957.1.patch, D12957.2.patch, D12957.3.patch, HIVE-5295.D12957.3.patch HiveConnection#configureConnection tries to execute a statement even after it is closed. For a remote JDBC client, it tries to set the conf vars using 'set foo=bar' by calling HiveStatement.execute for each conf var pair, but closes the statement after the first iteration through the conf var pairs.
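The fix shape implied by the description is to run every 'set foo=bar' through one statement and close it only after the whole loop. A sketch under that assumption, with a hypothetical counting stand-in rather than a real HiveStatement:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the loop described in HIVE-5295: one statement for all conf
// vars, closed once after the loop. FakeStatement is a hypothetical
// stand-in for HiveStatement, used only to make the sketch runnable.
public class ConfigureConnectionSketch {

    static class FakeStatement {
        boolean closed = false;
        int executed = 0;

        void execute(String sql) {
            if (closed) {
                // This is the reported failure mode: executing against an
                // already-closed statement.
                throw new IllegalStateException("statement closed: " + sql);
            }
            executed++;
        }

        void close() { closed = true; }
    }

    static int applyConfVars(FakeStatement stmt, Map<String, String> confVars) {
        try {
            for (Map.Entry<String, String> e : confVars.entrySet()) {
                stmt.execute("set " + e.getKey() + "=" + e.getValue());
            }
        } finally {
            stmt.close(); // closed once, after all conf vars are applied
        }
        return stmt.executed;
    }

    public static void main(String[] args) {
        Map<String, String> confVars = new LinkedHashMap<>();
        confVars.put("foo", "bar");
        confVars.put("hive.exec.parallel", "true");
        // Prints 2: both conf vars executed before the statement was closed.
        System.out.println(applyConfVars(new FakeStatement(), confVars));
    }
}
```

Closing inside the loop body, as the bug describes, would make the second execute() throw instead.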
[jira] [Updated] (HIVE-4763) add support for thrift over http transport in HS2
[ https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-4763: --- Status: Patch Available (was: Open) add support for thrift over http transport in HS2 - Key: HIVE-4763 URL: https://issues.apache.org/jira/browse/HIVE-4763 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, HIVE-4763.D12855.1.patch, HIVE-4763.D12951.1.patch, HIVE-4763.D12951.2.patch Subtask for adding support for http transport mode for thrift api in hive server2. Support for the different authentication modes will be part of another subtask.
[jira] [Updated] (HIVE-4763) add support for thrift over http transport in HS2
[ https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-4763: --- Status: Open (was: Patch Available) add support for thrift over http transport in HS2 - Key: HIVE-4763 URL: https://issues.apache.org/jira/browse/HIVE-4763 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, HIVE-4763.D12855.1.patch, HIVE-4763.D12951.1.patch, HIVE-4763.D12951.2.patch
[jira] [Updated] (HIVE-5295) HiveConnection#configureConnection tries to execute statement even after it is closed
[ https://issues.apache.org/jira/browse/HIVE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-5295: --- Status: Patch Available (was: Open) HiveConnection#configureConnection tries to execute statement even after it is closed - Key: HIVE-5295 URL: https://issues.apache.org/jira/browse/HIVE-5295 Project: Hive Issue Type: Bug Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.12.0 Attachments: D12957.1.patch, D12957.2.patch, D12957.3.patch, HIVE-5295.D12957.3.patch
[jira] [Commented] (HIVE-5360) fix hcatalog checkstyle issue introduced in HIVE-5223
[ https://issues.apache.org/jira/browse/HIVE-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777912#comment-13777912 ] Ashutosh Chauhan commented on HIVE-5360: Thanks [~thejas] for the quick fix. Apologies to others for breaking the build.
fix hcatalog checkstyle issue introduced in HIVE-5223 -- Key: HIVE-5360 URL: https://issues.apache.org/jira/browse/HIVE-5360 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-5360.1.patch
[jira] [Updated] (HIVE-5360) fix hcatalog checkstyle issue introduced in HIVE-5223
[ https://issues.apache.org/jira/browse/HIVE-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5360: Description: The trunk has checkstyle failures right now.
was: The trunk and 0.12 branch have checkstyle failures right now.
fix hcatalog checkstyle issue introduced in HIVE-5223 -- Key: HIVE-5360 URL: https://issues.apache.org/jira/browse/HIVE-5360 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-5360.1.patch