date:20150406


[ 
https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481518#comment-14481518
 ] 

Hive QA commented on HIVE-10134:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12709386/HIVE-10134.3-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8710 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/820/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/820/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-820/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12709386 - PreCommit-HIVE-SPARK-Build

 Fix test failures after HIVE-10130 [Spark Branch]
 -

 Key: HIVE-10134
 URL: https://issues.apache.org/jira/browse/HIVE-10134
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Xuefu Zhang
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-10134.1-spark.patch, HIVE-10134.2-spark.patch, 
 HIVE-10134.3-spark.patch


 Complete test run: 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink
 *Failed tests:*
 {noformat}
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-10149) Shuffle Hive data before storing in Parquet

2015-04-06 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/HIVE-10149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-10149:
---
Attachment: hive.log

Here's the log file of the error. It contains different log messages as it was 
run with the latest trunk version.

 Shuffle Hive data before storing in Parquet
 ---

 Key: HIVE-10149
 URL: https://issues.apache.org/jira/browse/HIVE-10149
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Sergio Peña
Assignee: Aihua Xu
 Attachments: data.txt, hive.log


 Hive can run into OOM (Out Of Memory) exceptions when writing many dynamic 
 partitions to Parquet because it keeps too many files open at once and 
 Parquet buffers an entire row group of data in memory for each open file. To 
 avoid this in ORC, HIVE-6455 shuffles data for each partition so only one 
 file is open at a time. We need to extend this support to Parquet and 
 possibly the MR and Spark planners.
 Steps to reproduce:
 1. Create a table and load some data that contains many partitions (file 
 {{data.txt}} attached to this ticket).
 {code}
 hive> create table t1_stage(id bigint, rdate string) row format delimited 
 fields terminated by ' ';
 hive> load data local inpath 'data.txt' into table t1_stage;
 {code}
 2. Create a Parquet table, and insert partitioned data from the t1_stage 
 table.
 {noformat}
 hive> set hive.exec.dynamic.partition.mode=nonstrict;
 hive> create table t1_part(id bigint) partitioned by (rdate string) stored as 
 parquet;
 hive> insert overwrite table t1_part partition(rdate) select * from t1_stage;
 Query ID = sergio_20150330163713_db3afe74-d1c7-4f0d-a8f1-f2137ddb64a4
 Total jobs = 3
 Launching Job 1 out of 3
 Number of reduce tasks is set to 0 since there's no reduce operator
 Starting Job = job_1427748520315_0006, Tracking URL = 
 http://victory:8088/proxy/application_1427748520315_0006/
 Kill Command = /opt/local/hadoop/bin/hadoop job  -kill job_1427748520315_0006
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers:  0
 2015-03-30 16:37:19,065 Stage-1 map = 0%,  reduce = 0%
 2015-03-30 16:37:43,947 Stage-1 map = 100%,  reduce = 0%
 Ended Job = job_1427748520315_0006 with errors
 Error during job, obtaining debugging information...
 Examining task ID: task_1427748520315_0006_m_00 (and more) from job 
 job_1427748520315_0006
 Task with the most failures(4): 
 -
 Task ID:
   task_1427748520315_0006_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1427748520315_0006tipid=task_1427748520315_0006_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask
 MapReduce Jobs Launched: 
 Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
 Total MapReduce CPU Time Spent: 0 msec
 {noformat}
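
 To illustrate why shuffling by partition helps, here is a minimal Python 
 sketch. It is not Hive or Parquet code. It assumes a hypothetical writer that 
 keeps one file open per partition seen so far, and that can close a file as 
 soon as the partition key changes when the input arrives grouped by 
 partition. The function name and the sample rows are made up for 
 illustration.

```python
from typing import Iterable, Tuple


def peak_open_files(rows: Iterable[Tuple[int, str]], grouped: bool) -> int:
    """Return the peak number of simultaneously open partition files.

    Hypothetical model: every open file buffers a full row group in memory,
    so the peak count is a rough proxy for peak memory use.
    """
    open_partitions = set()
    peak = 0
    previous = None
    for _id, rdate in rows:
        if grouped and previous is not None and rdate != previous:
            # Input is grouped by partition, so the previous file is complete
            # and can be closed before the next one is opened.
            open_partitions.discard(previous)
        open_partitions.add(rdate)
        peak = max(peak, len(open_partitions))
        previous = rdate
    return peak


# Hypothetical sample data: 12 rows cycling through 4 partition values.
rows = [(i, "2015-03-%02d" % (i % 4 + 1)) for i in range(12)]

unsorted_peak = peak_open_files(rows, grouped=False)
shuffled_peak = peak_open_files(sorted(rows, key=lambda r: r[1]), grouped=True)
print(unsorted_peak, shuffled_peak)
```

 In this model the unsorted input needs one open file per distinct partition, 
 while the grouped input never needs more than one.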




[jira] [Updated] (HIVE-10226) Column stats for Date columns not supported

2015-04-06 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-10226:
--
Attachment: HIVE-10226.1.patch

Re-use the long stats for Date column stats, using the days since epoch value 
as the long value.
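
As a rough illustration of the idea (not the code in the patch), the Python 
sketch below maps each date to its days-since-epoch value and then computes 
ordinary long-style stats over those integers. The function names and the 
dictionary keys are hypothetical, and the distinct-value count here is exact, 
whereas a real engine may use an estimator.

```python
from datetime import date
from typing import Optional, Sequence

EPOCH = date(1970, 1, 1)


def days_since_epoch(d: date) -> int:
    """Number of days between 1970-01-01 and d (negative before the epoch)."""
    return (d - EPOCH).days


def long_stats_for_dates(values: Sequence[Optional[date]]) -> dict:
    """Compute long-style column stats over a date column with possible NULLs."""
    days = [days_since_epoch(v) for v in values if v is not None]
    return {
        "low": min(days) if days else None,
        "high": max(days) if days else None,
        "num_nulls": sum(1 for v in values if v is None),
        "num_dvs": len(set(days)),  # exact count, for illustration only
    }


stats = long_stats_for_dates(
    [date(1970, 1, 1), date(1970, 1, 11), None, date(1970, 1, 11)]
)
print(stats)
```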

 Column stats for Date columns not supported
 ---

 Key: HIVE-10226
 URL: https://issues.apache.org/jira/browse/HIVE-10226
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10226.1.patch


 {noformat}
 hive> explain analyze table revenues compute statistics for columns;
 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver 
 (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only 
 integer/long/timestamp/float/double/string/binary/boolean/decimal type 
 argument is accepted but date is passed.
 {noformat}




[jira] [Commented] (HIVE-9073) NPE when using custom windowing UDAFs

2015-04-06 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481894#comment-14481894
 ] 

Ashutosh Chauhan commented on HIVE-9073:


+1 unless above test failures are because of this patch.

A natural extension of this is to move the {{Noop}} TableFunctionEvaluator to 
the contrib/ module and then run create function for it in the tests where it 
is used. Currently, {{Noop}} lives in the main src tree because of this bug, 
but it is used heavily in tests. Since the contrib/ jar is not on the 
classpath of tests by default, having this test-only class live in contrib/ 
will help in both regards: it keeps the src tree free of test classes and it 
provides a true test case for this functionality, since the current test uses 
a function which is available on the test classpath anyway.
Since the above may require some refactoring, I am OK with doing that in a 
follow-up jira.

 NPE when using custom windowing UDAFs
 -

 Key: HIVE-9073
 URL: https://issues.apache.org/jira/browse/HIVE-9073
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-9073.1.patch, HIVE-9073.2.patch, HIVE-9073.2.patch, 
 HIVE-9073.3.patch


 From the hive-user email group:
 {noformat}
 While executing a simple select query using a custom windowing UDAF I created 
 I am constantly running into this error.
  
 Error: java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
 at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
 ... 9 more
 Caused by: java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
 ... 14 more
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:647)
 at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.getWindowFunctionInfo(FunctionRegistry.java:1875)
 at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.streamingPossible(WindowingTableFunction.java:150)
 at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.setCanAcceptInputAsStream(WindowingTableFunction.java:221)
 at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.initializeStreaming(WindowingTableFunction.java:266)
 at 
 org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.initializeStreaming(PTFOperator.java:292)
 at 
 org.apache.hadoop.hive.ql.exec.PTFOperator.initializeOp(PTFOperator.java:86)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
 at 
 org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
 ... 14 more
  
 Just wanted to check if any of you have faced this earlier. Also when I try 
 to run the Custom UDAF on another server it works fine. The only difference I 
 can see is that the hive version I am using on my local machine is 0.13.1 
 where it is working and on the other machine it is 0.13.0 where I see the 
 above mentioned error. I am not sure if this was a bug which was fixed in the 
 later release but I just wanted to confirm the same.
 {noformat}




[jira] [Commented] (HIVE-9310) CLI JLine does not flush history back to ~/.hivehistory


[ 
https://issues.apache.org/jira/browse/HIVE-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481906#comment-14481906
 ] 

Thejas M Nair commented on HIVE-9310:
-

Created HIVE-10225 for the other cases where the history file is still not 
created.


 CLI JLine does not flush history back to ~/.hivehistory
 ---

 Key: HIVE-9310
 URL: https://issues.apache.org/jira/browse/HIVE-9310
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.15.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 1.1.0

 Attachments: HIVE-9310.1.patch


 Hive CLI does not seem to be saving history anymore.
 In JLine with the PersistentHistory class, to keep history across sessions, 
 you need to do {{reader.getHistory().flush()}}.




[jira] [Commented] (HIVE-10220) Concurrent read access within HybridHashTableContainer

2015-04-06 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481911#comment-14481911
 ] 

Matt McCline commented on HIVE-10220:
-

+1

(I was concerned that MapJoinOperator was going to reload a hash table and 
HybridHashTableContainer wouldn't use the right thread safe object.  But it 
appears it reloads into a MapJoinContainer because it is dealing with 1 
reloaded hash table at a time.)

 Concurrent read access within HybridHashTableContainer 
 ---

 Key: HIVE-10220
 URL: https://issues.apache.org/jira/browse/HIVE-10220
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-10220.1.patch


 HybridHashTableContainer can end up being cached if it does not spill - that 
 needs to follow HIVE-10128 thread safety patterns for the partitioned hash 
 maps.




[jira] [Commented] (HIVE-10217) LLAP: Support caching of uncompressed ORC data


[ 
https://issues.apache.org/jira/browse/HIVE-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481605#comment-14481605
 ] 

Sergey Shelukhin commented on HIVE-10217:
-

Why is the jira titled this way? Uncompressed blocks inside a compressed file 
are already supported. Is this about a fully uncompressed file?

 LLAP: Support caching of uncompressed ORC data
 --

 Key: HIVE-10217
 URL: https://issues.apache.org/jira/browse/HIVE-10217
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap


 {code}
 Caused by: java.io.IOException: ORC compression buffer size (0) is smaller 
 than LLAP low-level cache minimum allocation size (131072). Decrease the 
 value for hive.llap.io.cache.orc.alloc.min
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:137)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 {code}




[jira] [Updated] (HIVE-10223) Consolidate several redundant FileSystem API calls.

2015-04-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10223:
-
Assignee: Chris Nauroth

 Consolidate several redundant FileSystem API calls.
 ---

 Key: HIVE-10223
 URL: https://issues.apache.org/jira/browse/HIVE-10223
 Project: Hive
  Issue Type: Improvement
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Attachments: HIVE-10223.1.patch


 This issue proposes to consolidate several Hive calls to the Hadoop Common 
 {{FileSystem}} API into fewer calls that still accomplish the 
 equivalent work.  {{FileSystem}} API calls typically translate into RPCs to 
 other services like the HDFS NameNode or alternative file system 
 implementations.  Consolidating RPCs will lower latency a bit for Hive code 
 and reduce some load on these external services.
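
 The general idea can be illustrated with a small Python analogy against the 
 local file system. This is only an analogy and does not use the Hadoop 
 {{FileSystem}} API: the first function issues three separate queries for 
 information that the second function obtains with a single {{os.stat}} 
 call. The function names are made up for this sketch.

```python
import os
import stat
from typing import Optional, Tuple


def describe_three_calls(path: str) -> Optional[Tuple[bool, int]]:
    """Return (is_directory, size) using three separate file system queries."""
    if not os.path.exists(path):          # query 1
        return None
    is_dir = os.path.isdir(path)          # query 2
    size = os.path.getsize(path)          # query 3
    return (is_dir, size)


def describe_one_call(path: str) -> Optional[Tuple[bool, int]]:
    """Return the same information from a single stat call."""
    try:
        st = os.stat(path)                # the only query
    except FileNotFoundError:
        return None
    return (stat.S_ISDIR(st.st_mode), st.st_size)
```

 When each query is a remote call rather than a local system call, the 
 second form saves two round trips per path.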




[jira] [Updated] (HIVE-8164) Adding in a ReplicationTask that converts a Notification Event to actionable tasks

2015-04-06 Thread Sushanth Sowmyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-8164:
---
Attachment: HIVE-8164.2.patch

Updated patch attached.

 Adding in a ReplicationTask that converts a Notification Event to actionable 
 tasks
 --

 Key: HIVE-8164
 URL: https://issues.apache.org/jira/browse/HIVE-8164
 Project: Hive
  Issue Type: Sub-task
  Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-8164.2.patch, HIVE-8164.patch







[jira] [Resolved] (HIVE-10224) LLAP: Clean up encoded ORC tree readers after trunk merge

2015-04-06 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-10224.
--
Resolution: Fixed

committed to llap branch

 LLAP: Clean up encoded ORC tree readers after trunk merge
 -

 Key: HIVE-10224
 URL: https://issues.apache.org/jira/browse/HIVE-10224
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10224.1.patch


 Cleanup encoded tree readers after HIVE-10042




[jira] [Updated] (HIVE-10224) LLAP: Clean up encoded ORC tree readers after trunk merge

2015-04-06 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10224:
-
Attachment: HIVE-10224.1.patch

 LLAP: Clean up encoded ORC tree readers after trunk merge
 -

 Key: HIVE-10224
 URL: https://issues.apache.org/jira/browse/HIVE-10224
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10224.1.patch


 Cleanup encoded tree readers after HIVE-10042




[jira] [Commented] (HIVE-10214) log metastore call timing information aggregated at query level

2015-04-06 Thread Vaibhav Gumashta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482150#comment-14482150
 ] 

Vaibhav Gumashta commented on HIVE-10214:
-

Minor comments on rb. Otherwise +1.

 log metastore call timing information aggregated at query level
 ---

 Key: HIVE-10214
 URL: https://issues.apache.org/jira/browse/HIVE-10214
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10214.1.patch


 For troubleshooting issues, it would be useful to log timing information for 
 metastore api calls, aggregated at a query level.
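
 A minimal Python sketch of what per-query aggregation could look like is 
 shown below. It is not the patch itself. The class name, the {{timed}} 
 helper and the API labels used in the example are all hypothetical; the 
 sketch only assumes that each metastore call can be wrapped by a timer and 
 that one aggregator object exists per query.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class QueryCallTimings:
    """Accumulates call counts and total elapsed time per API name."""

    def __init__(self):
        self.total_ms = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def timed(self, api_name: str):
        """Time the enclosed block and add it to the totals for api_name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.total_ms[api_name] += elapsed_ms
            self.calls[api_name] += 1

    def summary(self) -> dict:
        """Map each API name to (number of calls, total milliseconds)."""
        return {name: (self.calls[name], self.total_ms[name])
                for name in sorted(self.calls)}


# One aggregator per query; the labels below are illustrative only.
timings = QueryCallTimings()
for _ in range(2):
    with timings.timed("get_table"):
        pass  # the real metastore call would go here
with timings.timed("get_partitions"):
    pass

for name, (count, total) in timings.summary().items():
    print("%s: %d call(s), %.3f ms total" % (name, count, total))
```

 Logging the summary once at the end of the query keeps the output to one 
 line per API rather than one line per call.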




[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session


 [ 
https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10119:
-
Attachment: (was: HIVE-10119.3.patch)

 Allow Log verbosity to be set in hiveserver2 session
 

 Key: HIVE-10119
 URL: https://issues.apache.org/jira/browse/HIVE-10119
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch


 We need to be able to set the logging level per HS2 session.
 Clients often use the map-reduce completion matrix (Execution) that shows 
 up in Beeline to debug performance. Users might not want the verbose log view 
 all the time since it obscures the Execution information. Hence the client 
 should be able to change the verbosity level.
 Also, HS2 logging currently has 2 levels of verbosity, not 3. Users 
 might want Execution + Performance counters only, so that level needs to be 
 added.
 So for logs, the user should be able to set 3 levels of verbosity in the 
 session, which will override the default verbosity specified in the 
 hive-site.xml file.
 0. None - IGNORE
 1. Execution - Just shows the map-reduce tasks completing 
 2. Performance - Execution + Performance counters dumped at the end
 3. Verbose - All logs
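
 The levels above and the override rule can be sketched in Python as 
 follows. This is only an illustration of the proposal, not HS2 code. The 
 enum, the function names and the idea of tagging each log message with a 
 level are assumptions made for this sketch.

```python
from enum import IntEnum
from typing import Optional


class Verbosity(IntEnum):
    NONE = 0         # ignore all logs
    EXECUTION = 1    # map-reduce task completion only
    PERFORMANCE = 2  # EXECUTION plus performance counters at the end
    VERBOSE = 3      # all logs


def effective_level(site_default: Verbosity,
                    session_override: Optional[Verbosity] = None) -> Verbosity:
    """A session-level setting, when present, overrides the site default."""
    return site_default if session_override is None else session_override


def is_shown(message_level: Verbosity, session_level: Verbosity) -> bool:
    """A message is shown when the session level is at least its level.

    Nothing is shown when the session level is NONE.
    """
    return Verbosity.NONE < message_level <= session_level


level = effective_level(Verbosity.VERBOSE, Verbosity.PERFORMANCE)
print(level.name, is_shown(Verbosity.EXECUTION, level),
      is_shown(Verbosity.VERBOSE, level))
```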




[jira] [Commented] (HIVE-10216) log hive cli classpath at debug level

2015-04-06 Thread Vaibhav Gumashta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481932#comment-14481932
 ] 

Vaibhav Gumashta commented on HIVE-10216:
-

+1

 log hive cli classpath at debug level
 -

 Key: HIVE-10216
 URL: https://issues.apache.org/jira/browse/HIVE-10216
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10216.1.patch


 For troubleshooting, it is useful to print the classpath used by hive-cli at 
 DEBUG level in the log.




[jira] [Updated] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-04-06 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9647:
--
Attachment: HIVE-9647.02.patch

address the metastore configuration flag

 Discrepancy in cardinality estimates between partitioned and un-partitioned 
 tables 
 ---

 Key: HIVE-9647
 URL: https://issues.apache.org/jira/browse/HIVE-9647
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-9647.01.patch, HIVE-9647.02.patch


 High-level summary
 HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column 
 number of distinct values (NDV) to estimate join selectivity.
 The way statistics are aggregated for partitioned tables results in a 
 discrepancy in the number of distinct values, which results in different plans 
 between partitioned and un-partitioned schemas.
 The table below summarizes the NDVs in computeInnerJoinSelectivity which are 
 used to estimate the selectivity of joins.
 ||Column||Partitioned count distincts||Un-Partitioned count distincts||
 |sr_customer_sk|71,245|1,415,625|
 |sr_item_sk|38,846|62,562|
 |sr_ticket_number|71,245|34,931,085|
 |ss_customer_sk|88,476|1,415,625|
 |ss_item_sk|38,846|62,562|
 |ss_ticket_number|100,756|56,256,175|
   
 The discrepancy arises because the NDV calculation for a partitioned table 
 assumes that the NDV range is contained within each partition and is 
 calculated as "select max(NUM_DISTINCTS) from PART_COL_STATS".
 This is problematic for columns like ticket number, which naturally 
 increase with the partitioned date column ss_sold_date_sk.
 Suggestions
 Use HyperLogLog as suggested by Gopal; there is an HLL implementation for 
 HBase coprocessors which we can use as a reference here.
 Using the global stats from TAB_COL_STATS and the per-partition stats from 
 PART_COL_STATS, extrapolate the NDV for the qualified partitions as in:
 Max( (NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / 
 (Number of Partitions), max(NUM_DISTINCTS) from PART_COL_STATS )
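
 The extrapolation formula above can be written out as a short Python 
 sketch. The function names and the partition counts in the example are 
 hypothetical; only the two NDV figures for ss_ticket_number (56,256,175 
 un-partitioned and 100,756 partitioned) are taken from the table in this 
 issue.

```python
from typing import Sequence


def ndv_current(partition_ndvs: Sequence[int]) -> int:
    """Current behaviour: the largest per-partition NDV (PART_COL_STATS)."""
    return max(partition_ndvs)


def ndv_extrapolated(table_ndv: int,
                     qualified_partitions: int,
                     total_partitions: int,
                     partition_ndvs: Sequence[int]) -> float:
    """Proposed estimate for the qualified partitions.

    Scales the table-level NDV (TAB_COL_STATS) by the fraction of
    partitions that qualify, and never returns less than the largest
    per-partition NDV.
    """
    scaled = table_ndv * qualified_partitions / total_partitions
    return max(scaled, max(partition_ndvs))


# Hypothetical scenario: 100 of 1000 partitions qualify.
per_partition = [90000, 100756]
print(ndv_current(per_partition))
print(ndv_extrapolated(56256175, 100, 1000, per_partition))
```

 For a column whose values grow with the partition key, the scaled 
 table-level figure is far larger than any single partition's NDV, which is 
 the gap the suggestion is meant to close.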
 More details
 While doing TPC-DS Partitioned vs. Un-Partitioned runs I noticed that many of 
 the plans are different. I then dumped the CBO logical plan and found that 
 the join estimates are drastically different.
 Unpartitioned schema :
 {code}
 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer 
 (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering:
 HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], 
 store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], 
 as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], 
 as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): 
 rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2956
   HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], 
 agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], 
 agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = 
 {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954
 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, 
 cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952
   HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], 
 ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], 
 sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], 
 sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2982
 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2980
   HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], 
 joinType=[inner]): rowcount = 28880.460910696, cumulative cost = 
 {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964
 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
 ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 
 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920
   HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): 
 rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822
 HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], 
 sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, 
 cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923
   HiveTableScanRel(table=[[tpcds_bin_orc_200.store_returns]]): 
 rowcount = 5.5578005E7, cumulative cost =

[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date


[ 
https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482158#comment-14482158
 ] 

Chaoyu Tang commented on HIVE-10231:


[~gopalv] HIVE-10226 is for computing the stats for table columns of date type. 
It is different from this one, which is for computing (any) column stats for a 
partition when it has a date type in its partition column.

 Compute partition column stats fails if partition col type is date
 --

 Key: HIVE-10231
 URL: https://issues.apache.org/jira/browse/HIVE-10231
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 1.2.0

 Attachments: HIVE-10231.patch


 Currently the command "analyze table .. partition .. compute statistics for 
 columns" may only work when the partition column type is string or a numeric 
 type, but not for others like date. See the following case using date as the 
 partition column type:
 {code}
 create table colstatspartdate (key int, value string) partitioned by (ds 
 date, hr int);
 insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select 
 key, value from src limit 20;
 analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute 
 statistics for columns;
 {code}
 You will get a RuntimeException:
 {code}
 FAILED: RuntimeException Cannot convert to Date from: int
 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to 
 Date from: int
 java.lang.RuntimeException: Cannot convert to Date from: int
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
 
 {code}




[jira] [Updated] (HIVE-8204) Dynamic partition pruning fails with IndexOutOfBoundsException


 [ 
https://issues.apache.org/jira/browse/HIVE-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8204:
-
Fix Version/s: 1.0.0

 Dynamic partition pruning fails with IndexOutOfBoundsException
 --

 Key: HIVE-8204
 URL: https://issues.apache.org/jira/browse/HIVE-8204
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth Jayachandran
Assignee: Gunther Hagleitner
 Fix For: 1.0.0

 Attachments: HIVE-8204.1.patch, HIVE-8204.2.patch


 Dynamic partition pruning fails with IndexOutOfBounds exception when 
 dimension table is partitioned and fact table is not.
 Steps to reproduce:
 1) Partition date_dim table from tpcds on d_date_sk
 2) Fact table is store_sales which is not partitioned
 3) Run the following
 {code}
 set hive.stats.fetch.column.stats=true;
 set hive.tez.dynamic.partition.pruning=true;
 explain select d_date 
 from store_sales, date_dim 
 where 
 store_sales.ss_sold_date_sk = date_dim.d_date_sk and 
 date_dim.d_year = 1998;
 {code}
 The stack trace is:
 {code}
 2014-09-19 19:06:16,254 ERROR ql.Driver (SessionState.java:printError(825)) - 
 FAILED: IndexOutOfBoundsException Index: 0, Size: 0
 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.optimizer.RemoveDynamicPruningBySize.process(RemoveDynamicPruningBySize.java:61)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
   at 
 org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:61)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
   at 
 org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:277)
   at 
 org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:120)
   at 
 org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:97)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9781)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
   at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:407)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1060)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1130)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:997)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:987)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
 {code}




[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported

2015-04-06 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482225#comment-14482225
 ] 

Ashutosh Chauhan commented on HIVE-10226:
-

Using a long column in the metastore table to store the max and min date is fine, but 
reusing {{LongColumnStatsData}} in the thrift interface to return the result, not so 
much. I think we should add a new thrift structure, {{DateColumnStatsData}}, 
something like:
{code}
struct DateColumnStatsData {
1: optional Date minDate,
2: optional Date maxDate,
3: required i64 numNulls,
4: required i64 numDVs
}
{code}

and then the metastore, after retrieving the # of days from the backend, should compute 
the Date, populate this struct, and return the results to the client.
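
The conversion step described above could look roughly like the following. This is only a sketch: the class and method names are hypothetical, java.time.LocalDate stands in for the proposed thrift Date type, and it assumes the backend stores the value as the number of days since 1970-01-01.

```java
import java.time.LocalDate;

// Hypothetical helper; not the metastore's actual code.
final class DateStatsSketch {
    // Assumption: the backend long is a count of days since the epoch (1970-01-01).
    static LocalDate fromEpochDays(long days) {
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        // Round-trip a known date through its day count.
        long days = LocalDate.of(2015, 4, 2).toEpochDay();
        System.out.println(fromEpochDays(days));
    }
}
```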

 Column stats for Date columns not supported
 ---

 Key: HIVE-10226
 URL: https://issues.apache.org/jira/browse/HIVE-10226
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10226.1.patch


 {noformat}
 hive explain analyze table revenues compute statistics for columns;
 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver 
 (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only 
 integer/long/timestamp/float/double/string/binary/boolean/decimal type 
 argument is accepted but date is passed.
 {noformat}




[jira] [Updated] (HIVE-10040) CBO (Calcite Return Path): Pluggable cost modules [CBO branch]

2015-04-06 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10040:
---
Attachment: HIVE-10040.01.cbo.patch

[~jpullokkaran], I updated the patch addressing your comments (in RB too). 
Could you take a look? Thanks

 CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
 --

 Key: HIVE-10040
 URL: https://issues.apache.org/jira/browse/HIVE-10040
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: cbo-branch
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: cbo-branch

 Attachments: HIVE-10040.01.cbo.patch, HIVE-10040.cbo.patch


 We should be able to deal with cost models in a modular way. Thus, the cost 
 model should be integrated within a Calcite MD provider that is pluggable.




[jira] [Updated] (HIVE-10231) Compute partition column stats fails if partition col type is date


 [ 
https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-10231:
---
Attachment: HIVE-10231.patch

There is a bug in rewriting the analyze query in ColumnStatsSemanticAnalyzer. 
The date value should be quoted in the rewritten query, but was not. 
In this patch, for simplicity, I quote the partition value in the rewritten 
query regardless of its data type. 
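
To illustrate the idea of quoting every partition value, here is a minimal sketch. It is not the code from the patch: the class, the method name, and the predicate shape are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration; not the code in ColumnStatsSemanticAnalyzer.
final class PartitionPredicateSketch {
    // Builds "col1='v1' and col2='v2'", quoting every value regardless of its type.
    static String buildPredicate(Map<String, String> partSpec) {
        StringJoiner joiner = new StringJoiner(" and ");
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            // Escape embedded single quotes so the literal stays well-formed.
            String escaped = e.getValue().replace("'", "\\'");
            joiner.add(e.getKey() + "='" + escaped + "'");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("ds", "2015-04-02");
        spec.put("hr", "2");
        System.out.println(buildPredicate(spec));
    }
}
```

Quoting the integer partition value as well relies on the engine converting the string literal back to the column type, which is the simplification the comment describes.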

 Compute partition column stats fails if partition col type is date
 --

 Key: HIVE-10231
 URL: https://issues.apache.org/jira/browse/HIVE-10231
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 1.2.0

 Attachments: HIVE-10231.patch


 Currently the command analyze table .. partition .. compute statistics for 
 columns may only work for partition column type of string, numeric types, 
 but not others like date. See following case using date as partition coltype:
 {code}
 create table colstatspartdate (key int, value string) partitioned by (ds 
 date, hr int);
 insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select 
 key, value from src limit 20;
 analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute 
 statistics for columns;
 {code}
 you will get RuntimeException:
 {code}
 FAILED: RuntimeException Cannot convert to Date from: int
 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to 
 Date from: int
 java.lang.RuntimeException: Cannot convert to Date from: int
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
 
 {code}




[jira] [Commented] (HIVE-10225) CLI JLine does not flush history on quit/Ctrl-C


[ 
https://issues.apache.org/jira/browse/HIVE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482143#comment-14482143
 ] 

Thejas M Nair commented on HIVE-10225:
--

Verified that history is getting written in case of quit, exit, CTRL-D, CTRL-C


 CLI JLine does not flush history on quit/Ctrl-C
 ---

 Key: HIVE-10225
 URL: https://issues.apache.org/jira/browse/HIVE-10225
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10225.1.patch


 Hive CLI is not saving history, if hive cli is terminated using a Ctrl-C or 
 quit;.
 HIVE-9310 fixed it for the case where one exits with Ctrl-D (EOF), but not 
 for the above ways of exiting.




[jira] [Assigned] (HIVE-10118) CBO (Calcite Return Path): Internal error: Cannot find common type for join keys

2015-04-06 Thread Laljo John Pullokkaran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran reassigned HIVE-10118:
-

Assignee: Laljo John Pullokkaran  (was: Jesus Camacho Rodriguez)

 CBO (Calcite Return Path): Internal error: Cannot find common type for join 
 keys 
 -

 Key: HIVE-10118
 URL: https://issues.apache.org/jira/browse/HIVE-10118
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran
 Fix For: cbo-branch


 Query 
 {code}
 explain
   select  ss_items.item_id
,ss_item_rev
,ss_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ss_dev
,cs_item_rev
,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev
,ws_item_rev
,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev
,ws_item_rev
,ws_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ws_dev
,(ss_item_rev+cs_item_rev+ws_item_rev)/3 average
 FROM
 ( select i_item_id item_id ,sum(ss_ext_sales_price) as ss_item_rev
  from store_sales
  JOIN item ON store_sales.ss_item_sk = item.i_item_sk
  JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
  JOIN (select d1.d_date
  from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = 
 d2.d_week_seq
  where d2.d_date = '1998-08-04') sub ON date_dim.d_date = 
 sub.d_date
  group by i_item_id ) ss_items
 JOIN
 ( select i_item_id item_id ,sum(cs_ext_sales_price) as cs_item_rev
  from catalog_sales
  JOIN item ON catalog_sales.cs_item_sk = item.i_item_sk
  JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
  JOIN (select d1.d_date
  from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = 
 d2.d_week_seq
  where d2.d_date = '1998-08-04') sub ON date_dim.d_date = 
 sub.d_date
  group by i_item_id ) cs_items
 ON ss_items.item_id=cs_items.item_id
 JOIN
 ( select i_item_id item_id ,sum(ws_ext_sales_price) as ws_item_rev
  from web_sales
  JOIN item ON web_sales.ws_item_sk = item.i_item_sk
  JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
  JOIN (select d1.d_date
  from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = 
 d2.d_week_seq
  where d2.d_date = '1998-08-04') sub ON date_dim.d_date = 
 sub.d_date
  group by i_item_id ) ws_items
 ON ss_items.item_id=ws_items.item_id
  where
ss_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev
and ss_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev
and cs_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev
and cs_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev
and ws_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev
and ws_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev
  order by item_id ,ss_item_rev
  limit 100
 {code}
 Exception 
 {code}
  limit 100
 15/03/27 12:38:32 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping 
 CBO.
 java.lang.RuntimeException: java.lang.AssertionError: Internal error: Cannot 
 find common type for join keys $1 (type INTEGER) and $1 (type 
 VARCHAR(2147483647))
   at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:677)
   at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:586)
   at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:238)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9998)
   at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:201)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
   at 
 org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1114)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1162)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1041)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
   at

[jira] [Commented] (HIVE-10206) Improve Alter Table to not initialize Serde unnecessarily


[ 
https://issues.apache.org/jira/browse/HIVE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482187#comment-14482187
 ] 

Hive QA commented on HIVE-10206:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12709363/HIVE-10206.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8701 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_table_wrong_regex
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3298/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3298/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3298/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12709363 - PreCommit-HIVE-TRUNK-Build

 Improve Alter Table to not initialize Serde unnecessarily
 -

 Key: HIVE-10206
 URL: https://issues.apache.org/jira/browse/HIVE-10206
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 1.1.0
Reporter: Szehon Ho
Assignee: Szehon Ho
Priority: Minor
 Attachments: HIVE-10206.patch


 Create an avro table with an external avsc file like:
 {noformat}
 CREATE  TABLE test(...)
 ROW FORMAT SERDE 
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
 STORED AS INPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
 OUTPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
   'avro.schema.url'='file:///Users/szehon/Temp/test.avsc', 
   'kite.compression.type'='snappy', 
   'transient_lastDdlTime'='1427996456')
 {noformat}
 Delete test.avsc file.
 Try to modify the table properties:
 {noformat}
 alter table test set tblproperties 
 ('avro.schema.url'='file:///Users/szehon/Temp/test2.avsc');
 {noformat}
 Will throw an exception like AvroSerdeException:
 {noformat}
   at 
 org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:119)
 at 
 org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:163)
 at 
 org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:101)
 at 
 org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:78)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:520)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:377)
 at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:274)
 at 
 org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:256)
 at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:595)
 at 
 org.apache.hadoop.hive.ql.exec.DDLTask.alterTableOrSinglePartition(DDLTask.java:3383)
 at 
 org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3340)
 at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:332)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
 at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

[jira] [Updated] (HIVE-10160) Give a warning when grouping or ordering by a constant column


 [ 
https://issues.apache.org/jira/browse/HIVE-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-10160:

Attachment: HIVE-10160.4.patch

 Give a warning when grouping or ordering by a constant column
 -

 Key: HIVE-10160
 URL: https://issues.apache.org/jira/browse/HIVE-10160
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Lefty Leverenz
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-10160.1.patch, HIVE-10160.3.patch, 
 HIVE-10160.4.patch


 To avoid confusion, a warning should be issued when users specify column 
 positions instead of names in a GROUP BY or ORDER BY clause (unless 
 hive.groupby.orderby.position.alias is set to true in Hive 0.11.0 or later).
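
 A rough sketch of the check described above follows. It is not the attached patch: the class, method, and message text are hypothetical, and it assumes that a bare integer in the clause is interpreted as a constant when the position alias setting is off.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the proposed warning; not Hive's implementation.
final class PositionAliasWarningSketch {
    // Returns one warning per bare integer expression when position aliases are disabled.
    static List<String> warningsFor(String clause, List<String> exprs, boolean positionAliasEnabled) {
        List<String> warnings = new ArrayList<>();
        if (positionAliasEnabled) {
            return warnings; // integers are column positions here, so nothing to warn about
        }
        for (String expr : exprs) {
            String trimmed = expr.trim();
            if (trimmed.matches("\\d+")) {
                warnings.add("Warning: " + clause + " " + trimmed
                    + " is interpreted as a constant, not a column position");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        System.out.println(warningsFor("GROUP BY", Arrays.asList("1", "key"), false));
    }
}
```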




[jira] [Commented] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session

[
https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482239#comment-14482239
]

Thejas M Nair commented on HIVE-10119:
--

The patch looks great. Can you also make one more change? Change the enum
loggingLevel to LoggingLevel (start with a capital letter). +1 pending that
change and test run.

Allow Log verbosity to be set in hiveserver2 session

Key: HIVE-10119
URL: https://issues.apache.org/jira/browse/HIVE-10119
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch,
HIVE-10119.3.patch

We need to be able to set logging per HS2 session.
The client often uses the map-reduce completion matrix (Execution) that shows
up in Beeline to debug performance. User might not want the verbose log view
all the time since it obfuscates the Execution information. Hence the client
should be able to change the verbosity level.
Also, there are 2 levels of verbosity at HS2 logging and not 3. The users
might want Execution + Performance counters only - so that level needs to be
added.
So for logs, the user should be able to set 3 levels of verbosity in the
session, that will override the default verbosity specified in the
hive-site.xml file.
0. None - IGNORE
1. Execution - Just shows the map-reduce tasks completing
2. Performance - Execution + Performance counters dumped at the end
3. Verbose - All logs
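
The levels listed above could be modelled as an enum along these lines. This is a sketch under assumptions: the wrapper class, the parse helper, and the includes check are hypothetical and only illustrate how a session setting might override a default.

```java
import java.util.Locale;

// Hypothetical sketch of the verbosity levels from the description.
final class LoggingLevelSketch {
    enum LoggingLevel {
        NONE, EXECUTION, PERFORMANCE, VERBOSE;

        // Parses a session-level setting; falls back to the supplied default.
        static LoggingLevel parse(String value, LoggingLevel defaultLevel) {
            if (value == null) {
                return defaultLevel;
            }
            try {
                return valueOf(value.trim().toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                return defaultLevel;
            }
        }

        // True if output belonging to 'other' should be shown at this verbosity.
        boolean includes(LoggingLevel other) {
            return other != NONE && other.ordinal() <= this.ordinal();
        }
    }

    public static void main(String[] args) {
        LoggingLevel level = LoggingLevel.parse("performance", LoggingLevel.VERBOSE);
        System.out.println(level + " includes EXECUTION: " + level.includes(LoggingLevel.EXECUTION));
    }
}
```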


[jira] [Assigned] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics

2015-04-06 Thread Sushanth Sowmyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan reassigned HIVE-10228:
---

Assignee: Sushanth Sowmyan

 Changes to Hive Export/Import/DropTable/DropPartition to support replication 
 semantics
 --

 Key: HIVE-10228
 URL: https://issues.apache.org/jira/browse/HIVE-10228
 Project: Hive
  Issue Type: Sub-task
  Components: Import/Export
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






[jira] [Commented] (HIVE-10131) LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs

2015-04-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481951#comment-14481951
 ] 

Sergey Shelukhin commented on HIVE-10131:
-

Left some comments on RB (https://reviews.apache.org/r/32851/)

 LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs
 

 Key: HIVE-10131
 URL: https://issues.apache.org/jira/browse/HIVE-10131
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Matt McCline
 Attachments: HIVE-10131.01.patch, HIVE-10131.02.patch, 
 HIVE-10131.03.patch, HIVE-10131.04.patch, HIVE-10131.05.patch


 Refs are always allocated and cleared. Should be reused.
 Iterator is another option




[jira] [Updated] (HIVE-9709) Hive should support replaying cookie from JDBC driver for beeline


 [ 
https://issues.apache.org/jira/browse/HIVE-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-9709:

Attachment: HIVE-9709.3.patch

Making some minor changes from the previous patch. Looking for a clean run.

Thanks
Hari

 Hive should support replaying cookie from JDBC driver for beeline
 -

 Key: HIVE-9709
 URL: https://issues.apache.org/jira/browse/HIVE-9709
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-9709.1.patch, HIVE-9709.2.patch, HIVE-9709.3.patch


 Consider the following scenario:
 Beeline -> Knox -> HS2.
 Where Knox is going to LDAP for authentication. To avoid re-authentication, 
 Knox supports using a cookie to identify a request. However, the Beeline JDBC 
 client does not send back the cookie Knox sent, which leads to Knox having 
 to re-create the LDAP authentication request on every connection.
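
 The cookie replay described above can be sketched as follows. This is not the Hive JDBC driver's code: the class is hypothetical, the cookie name in the example is made up, and only the standard java.net.HttpCookie parser is used.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical client-side cookie store; not the Hive JDBC driver's code.
final class CookieReplaySketch {
    private final List<HttpCookie> cookies = new ArrayList<>();

    // Remember the cookies carried by a Set-Cookie response header value.
    void remember(String setCookieHeader) {
        cookies.addAll(HttpCookie.parse(setCookieHeader));
    }

    // Build the Cookie request header for the next request, or null if nothing is stored.
    String cookieHeader() {
        StringJoiner joiner = new StringJoiner("; ");
        for (HttpCookie c : cookies) {
            if (!c.hasExpired()) {
                joiner.add(c.getName() + "=" + c.getValue());
            }
        }
        return joiner.length() == 0 ? null : joiner.toString();
    }

    public static void main(String[] args) {
        CookieReplaySketch store = new CookieReplaySketch();
        store.remember("JSESSIONID=abc123; Path=/gateway; HttpOnly");
        System.out.println(store.cookieHeader());
    }
}
```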




[jira] [Updated] (HIVE-10230) Vectorization: Look at performance of VectorExpressionWriterFactory and its use of DateWritable.daysToMillis


 [ 
https://issues.apache.org/jira/browse/HIVE-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10230:
---
Priority: Minor  (was: Critical)

 Vectorization: Look at performance of VectorExpressionWriterFactory and its 
 use of DateWritable.daysToMillis
 

 Key: HIVE-10230
 URL: https://issues.apache.org/jira/browse/HIVE-10230
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Minor
 Attachments: datewritable-serialization.png


 [~gopalv] found that the DateWritable code in VectorExpressionWriterFactory is 
 showing up in hot code paths from vectorization.  The slowness is probably where it 
 calls ZoneInfo::getOffset.
 !datewritable-serialization.png!




[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date


[ 
https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482113#comment-14482113
 ] 

Gopal V commented on HIVE-10231:


Also, with {{set hive.optimize.constant.propagation=false;}}

 Compute partition column stats fails if partition col type is date
 --

 Key: HIVE-10231
 URL: https://issues.apache.org/jira/browse/HIVE-10231
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 1.2.0


 Currently the command analyze table .. partition .. compute statistics for 
 columns may only work for partition column type of string, numeric types, 
 but not others like date. See following case using date as partition coltype:
 {code}
 create table colstatspartdate (key int, value string) partitioned by (ds 
 date, hr int);
 insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select 
 key, value from src limit 20;
 analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute 
 statistics for columns;
 {code}
 you will get RuntimeException:
 {code}
 FAILED: RuntimeException Cannot convert to Date from: int
 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to 
 Date from: int
 java.lang.RuntimeException: Cannot convert to Date from: int
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
 
 {code}




[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date


[ 
https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482178#comment-14482178
 ] 

Chaoyu Tang commented on HIVE-10231:


I have requested a patch review on RB (https://reviews.apache.org/r/32908/). 
Thanks in advance.

 Compute partition column stats fails if partition col type is date
 --

 Key: HIVE-10231
 URL: https://issues.apache.org/jira/browse/HIVE-10231
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 1.2.0

 Attachments: HIVE-10231.patch


 Currently the command analyze table .. partition .. compute statistics for 
 columns may only work for partition column type of string, numeric types, 
 but not others like date. See following case using date as partition coltype:
 {code}
 create table colstatspartdate (key int, value string) partitioned by (ds 
 date, hr int);
 insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select 
 key, value from src limit 20;
 analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute 
 statistics for columns;
 {code}
 you will get RuntimeException:
 {code}
 FAILED: RuntimeException Cannot convert to Date from: int
 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to 
 Date from: int
 java.lang.RuntimeException: Cannot convert to Date from: int
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
 
 {code}




[jira] [Commented] (HIVE-10160) Give a warning when grouping or ordering by a constant column


[ 
https://issues.apache.org/jira/browse/HIVE-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482190#comment-14482190
 ] 

Yongzhi Chen commented on HIVE-10160:
-

Printed the warning to the shell and added one test; waiting for the test result. 

 Give a warning when grouping or ordering by a constant column
 -

 Key: HIVE-10160
 URL: https://issues.apache.org/jira/browse/HIVE-10160
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Lefty Leverenz
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-10160.1.patch, HIVE-10160.3.patch, 
 HIVE-10160.4.patch


 To avoid confusion, a warning should be issued when users specify column 
 positions instead of names in a GROUP BY or ORDER BY clause (unless 
 hive.groupby.orderby.position.alias is set to true in Hive 0.11.0 or later).




[jira] [Comment Edited] (HIVE-10131) LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs


[ 
https://issues.apache.org/jira/browse/HIVE-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481951#comment-14481951
 ] 

Sergey Shelukhin edited comment on HIVE-10131 at 4/6/15 9:11 PM:
-

Left some comments on RB 


was (Author: sershe):
Left some comments on RB (https://reviews.apache.org/r/32851/)

 LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs
 

 Key: HIVE-10131
 URL: https://issues.apache.org/jira/browse/HIVE-10131
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Matt McCline
 Attachments: HIVE-10131.01.patch, HIVE-10131.02.patch, 
 HIVE-10131.03.patch, HIVE-10131.04.patch, HIVE-10131.05.patch


 Refs are always allocated and cleared. Should be reused.
 Iterator is another option




[jira] [Commented] (HIVE-8164) Adding in a ReplicationTask that converts a Notification Event to actionable tasks

2015-04-06 Thread Sushanth Sowmyan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481952#comment-14481952
 ] 

Sushanth Sowmyan commented on HIVE-8164:


Since the last patch, a few things have changed.

a) Instead of a top-level repl/ folder, the replication lib now sits inside 
webhcat-java-client, which is the home of HCatClient, since that's where this 
interface is now being exposed.
b) Iterator/Function semantics instead of List/Map.
c) ReplicationTask is now abstract, and a ReplicationTaskFactory is responsible 
for concrete Task initialization. By default, with this patch, there is only 
NoopReplicationTask. A new replication task factory, EXIMReplicationTaskFactory, 
based on hive export-import, will bring in concrete implementations with 
HIVE-10227.
d) Concrete replication support with hive ddl will be added with HIVE-10228 
(HIVE-10227 will depend on HIVE-10228).
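
The shape described in (b) and (c) might look roughly like this. The type names ReplicationTask, ReplicationTaskFactory, and NoopReplicationTask come from the comment above; every signature, the Event class, and the tasks helper are assumptions made for this sketch and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch; signatures are assumed, not copied from the patch.
final class ReplicationSketch {
    // Stand-in for a notification event.
    static final class Event {
        final String type;
        Event(String type) { this.type = type; }
    }

    // Abstract task; concrete behavior is supplied by factory-created subclasses.
    static abstract class ReplicationTask {
        protected final Event event;
        protected ReplicationTask(Event event) { this.event = event; }
        abstract boolean isActionable();
    }

    // Default task that does nothing.
    static final class NoopReplicationTask extends ReplicationTask {
        NoopReplicationTask(Event event) { super(event); }
        @Override boolean isActionable() { return false; }
    }

    interface ReplicationTaskFactory {
        ReplicationTask create(Event event);
    }

    // Iterator semantics: events are converted to tasks lazily, one at a time.
    static Iterator<ReplicationTask> tasks(final Iterator<Event> events,
                                           final ReplicationTaskFactory factory) {
        return new Iterator<ReplicationTask>() {
            @Override public boolean hasNext() { return events.hasNext(); }
            @Override public ReplicationTask next() { return factory.create(events.next()); }
        };
    }

    public static void main(String[] args) {
        Iterator<ReplicationTask> it = tasks(
            Arrays.asList(new Event("CREATE_TABLE"), new Event("DROP_TABLE")).iterator(),
            NoopReplicationTask::new);
        while (it.hasNext()) {
            ReplicationTask task = it.next();
            System.out.println(task.event.type + " actionable=" + task.isActionable());
        }
    }
}
```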

 Adding in a ReplicationTask that converts a Notification Event to actionable 
 tasks
 --

 Key: HIVE-8164
 URL: https://issues.apache.org/jira/browse/HIVE-8164
 Project: Hive
  Issue Type: Sub-task
  Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-8164.2.patch, HIVE-8164.patch







[jira] [Updated] (HIVE-10131) LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs

2015-04-06 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10131:

Attachment: HIVE-10131.06.patch

 LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs
 

 Key: HIVE-10131
 URL: https://issues.apache.org/jira/browse/HIVE-10131
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Matt McCline
 Attachments: HIVE-10131.01.patch, HIVE-10131.02.patch, 
 HIVE-10131.03.patch, HIVE-10131.04.patch, HIVE-10131.05.patch, 
 HIVE-10131.06.patch


 Refs are always allocated and cleared. Should be reused.
 Iterator is another option




[jira] [Updated] (HIVE-10229) Set conf and processor context in the constructor instead of init


 [ 
https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-10229:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: HIVE-7926)

 Set conf and processor context in the constructor instead of init
 -

 Key: HIVE-10229
 URL: https://issues.apache.org/jira/browse/HIVE-10229
 Project: Hive
  Issue Type: Bug
 Environment: 
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth

 Hit this on ctas13 query.
 {noformat}
 Error: Failure while running task:java.lang.NullPointerException
     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98)
     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134)
     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)
 {noformat}
 The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY;




[jira] [Updated] (HIVE-10225) CLI JLine does not flush history on quit/Ctrl-C


 [ 
https://issues.apache.org/jira/browse/HIVE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10225:
-
Attachment: HIVE-10225.1.patch

Adds a shutdown hook that calls flush on history.
This is similar to what we do in beeline. I didn't re-use code from there, 
since there has been work going on to make beeline usable as a separate 
module.
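
A minimal sketch of the shutdown-hook idea described above, not the code in the attached patch. It assumes the history object exposes a flush() method, modeled here with java.io.Flushable; the class and method names (HistoryFlushHook, newFlushThread, register) are hypothetical.

```java
import java.io.Flushable;
import java.io.IOException;

final class HistoryFlushHook {
    /** Builds (but does not register) a thread that flushes the given history. */
    static Thread newFlushThread(Flushable history) {
        return new Thread(() -> {
            try {
                history.flush();
            } catch (IOException e) {
                // Best effort: the JVM is exiting, so only report the failure.
                System.err.println("Failed to flush history: " + e.getMessage());
            }
        }, "history-flush-hook");
    }

    /** Registers the flush thread as a JVM shutdown hook and returns it. */
    static Thread register(Flushable history) {
        Thread hook = newFlushThread(history);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Shutdown hooks run when the JVM exits normally and when it is interrupted with Ctrl-C, which is why a hook can cover both exit paths named in the issue.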


 CLI JLine does not flush history on quit/Ctrl-C
 ---

 Key: HIVE-10225
 URL: https://issues.apache.org/jira/browse/HIVE-10225
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10225.1.patch


 Hive CLI does not save history if it is terminated using Ctrl-C or 
 quit;.
 HIVE-9310 fixed it for the case where one exits with Ctrl-D (EOF), but not 
 for the above ways of exiting.




[jira] [Commented] (HIVE-10225) CLI JLine does not flush history on quit/Ctrl-C


[ 
https://issues.apache.org/jira/browse/HIVE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482141#comment-14482141
 ] 

Thejas M Nair commented on HIVE-10225:
--

[~prasanth_j] Can you please review this? Let me know if you would like a review 
board link for it.


 CLI JLine does not flush history on quit/Ctrl-C
 ---

 Key: HIVE-10225
 URL: https://issues.apache.org/jira/browse/HIVE-10225
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10225.1.patch


 Hive CLI does not save history if it is terminated using Ctrl-C or 
 quit;.
 HIVE-9310 fixed it for the case where one exits with Ctrl-D (EOF), but not 
 for the above ways of exiting.




[jira] [Updated] (HIVE-10229) Set conf and processor context in the constructor instead of init


 [ 
https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-10229:
--
Attachment: HIVE-10229.1.patch

Fairly simple patch to set jconf and context during construction.

 Set conf and processor context in the constructor instead of init
 -

 Key: HIVE-10229
 URL: https://issues.apache.org/jira/browse/HIVE-10229
 Project: Hive
  Issue Type: Bug
 Environment: 
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth
 Fix For: 1.2.0

 Attachments: HIVE-10229.1.patch


 Hit this on ctas13 query.
 {noformat}
 Error: Failure while running task:java.lang.NullPointerException
     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98)
     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134)
     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)
 {noformat}
 The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY;




[jira] [Commented] (HIVE-9937) LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join

2015-04-06 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481958#comment-14481958
 ] 

Matt McCline commented on HIVE-9937:


Rebased onto recent check-ins and submitted.

 LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new 
 Vectorized Map Join
 --

 Key: HIVE-9937
 URL: https://issues.apache.org/jira/browse/HIVE-9937
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-9937.01.patch, HIVE-9937.02.patch, 
 HIVE-9937.03.patch, HIVE-9937.04.patch, HIVE-9937.05.patch, 
 HIVE-9937.06.patch, HIVE-9937.07.patch, HIVE-9937.08.patch, 
 HIVE-9937.09.patch, HIVE-9937.91.patch







[jira] [Updated] (HIVE-10005) remove some unnecessary branches from the inner loop


 [ 
https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-10005:
--
Attachment: HIVE-10005.4.patch

 remove some unnecessary branches from the inner loop
 

 Key: HIVE-10005
 URL: https://issues.apache.org/jira/browse/HIVE-10005
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch, 
 HIVE-10005.3.patch, HIVE-10005.4.patch


 Operator.forward is doing too much. There's no reason to do the done 
 checking per row and update it inline. It's much more efficient to just do 
 that when the event that completes an operator happens.
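
 One way to picture the change described above is the sketch below. It is an illustration under stated assumptions, not Hive's Operator class: SketchOperator and all of its members are hypothetical. The done state of a parent is updated only when a child signals completion, so the per-row path does not recompute it.

```java
import java.util.ArrayList;
import java.util.List;

class SketchOperator {
    private final List<SketchOperator> children = new ArrayList<>();
    private SketchOperator parent;
    private int doneChildren = 0;
    private boolean done = false;
    long rowsSeen = 0;

    void addChild(SketchOperator child) {
        child.parent = this;
        children.add(child);
    }

    boolean isDone() {
        return done;
    }

    /** Per-row path: reads the flag, never recomputes it from the children. */
    void forward(Object row) {
        if (done) {
            return;
        }
        for (SketchOperator child : children) {
            child.process(row);
        }
    }

    void process(Object row) {
        if (done) {
            return;
        }
        rowsSeen++;
        forward(row);
    }

    /** Called once, when the event that completes this operator happens. */
    void setDone() {
        if (done) {
            return;
        }
        done = true;
        if (parent != null) {
            parent.childDone();
        }
    }

    /** A parent becomes done only after all of its children are done. */
    private void childDone() {
        doneChildren++;
        if (doneChildren == children.size()) {
            setDone();
        }
    }
}
```

 The bookkeeping cost moves from once per row to once per completion event, which is the trade the issue description argues for.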




[jira] [Commented] (HIVE-10229) LLAP: NPE in ReduceRecordProcessor


[ 
https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482118#comment-14482118
 ] 

Siddharth Seth commented on HIVE-10229:
---

Yep. Same issue I saw. ProcessorContext is null.

I'm going to upload a patch for trunk which sets the conf and context in the 
constructor instead of the init method.
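
The approach can be sketched as follows. This is a simplified illustration, not the actual patch: SketchContext, SketchRecordProcessor, the "sketch.query.id" key, and the PLAN_KEY value are all hypothetical stand-ins. The point is that fields assigned in the constructor cannot be null by the time init() builds the cache key.

```java
import java.util.Map;
import java.util.Objects;

final class SketchContext {
    private final String vertexName;

    SketchContext(String vertexName) {
        this.vertexName = vertexName;
    }

    String getTaskVertexName() {
        return vertexName;
    }
}

class SketchRecordProcessor {
    // Hypothetical constant standing in for REDUCE_PLAN_KEY.
    static final String PLAN_KEY = "__REDUCE_PLAN__";

    private final Map<String, String> conf;
    private final SketchContext context;
    private String cacheKey;

    /** Conf and context are fixed at construction, so init() never sees null. */
    SketchRecordProcessor(Map<String, String> conf, SketchContext context) {
        this.conf = Objects.requireNonNull(conf, "conf");
        this.context = Objects.requireNonNull(context, "context");
    }

    void init() {
        String queryId = conf.get("sketch.query.id"); // hypothetical key
        cacheKey = queryId + context.getTaskVertexName() + PLAN_KEY;
    }

    String getCacheKey() {
        return cacheKey;
    }
}
```

With this shape, a missing context fails fast at construction time rather than later inside init().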

 LLAP: NPE in ReduceRecordProcessor
 --

 Key: HIVE-10229
 URL: https://issues.apache.org/jira/browse/HIVE-10229
 Project: Hive
  Issue Type: Sub-task
 Environment: 
Reporter: Sergey Shelukhin
Assignee: Gunther Hagleitner

 Hit this on ctas13 query.
 {noformat}
 Error: Failure while running task:java.lang.NullPointerException
     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98)
     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134)
     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)
 {noformat}
 The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY;




[jira] [Assigned] (HIVE-10229) Set conf and processor context in the constructor instead of init


 [ 
https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned HIVE-10229:
-

Assignee: Siddharth Seth  (was: Gunther Hagleitner)

 Set conf and processor context in the constructor instead of init
 -

 Key: HIVE-10229
 URL: https://issues.apache.org/jira/browse/HIVE-10229
 Project: Hive
  Issue Type: Sub-task
 Environment: 
Reporter: Sergey Shelukhin
Assignee: Siddharth Seth

 Hit this on ctas13 query.
 {noformat}
 Error: Failure while running task:java.lang.NullPointerException
     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98)
     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134)
     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)
 {noformat}
 The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY;




[jira] [Updated] (HIVE-10229) Set conf and processor context in the constructor instead of init

2015-04-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-10229:
--
Summary: Set conf and processor context in the constructor instead of init  
(was: LLAP: NPE in ReduceRecordProcessor)

 Set conf and processor context in the constructor instead of init
 -

 Key: HIVE-10229
 URL: https://issues.apache.org/jira/browse/HIVE-10229
 Project: Hive
  Issue Type: Sub-task
 Environment: 
Reporter: Sergey Shelukhin
Assignee: Gunther Hagleitner

 Hit this on ctas13 query.
 {noformat}
 Error: Failure while running task:java.lang.NullPointerException
     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98)
     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134)
     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     at java.lang.Thread.run(Thread.java:745)
 {noformat}
 The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY;




[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session


 [ 
https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10119:
-
Attachment: HIVE-10119.3.patch

 Allow Log verbosity to be set in hiveserver2 session
 

 Key: HIVE-10119
 URL: https://issues.apache.org/jira/browse/HIVE-10119
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch, 
 HIVE-10119.3.patch


 We need to be able to set logging per HS2 session.
 The client often uses the map-reduce completion matrix (Execution) that shows 
 up in Beeline to debug performance. Users might not want the verbose log view 
 all the time, since it obscures the Execution information. Hence the client 
 should be able to change the verbosity level.
 Also, HS2 logging currently has 2 levels of verbosity, not 3. Users 
 might want Execution + Performance counters only, so that level needs to be 
 added.
 So for logs, the user should be able to set 3 levels of verbosity in the 
 session, which will override the default verbosity specified in the 
 hive-site.xml file.
 0. None - IGNORE
 1. Execution - Just shows the map-reduce tasks completing 
 2. Performance - Execution + Performance counters dumped at the end
 3. Verbose - All logs
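
 The levels listed above could be modeled roughly as below. This is a sketch only, not the enum or parsing logic from the attached patches: SketchLoggingLevel, parse, and includes are hypothetical names, and falling back to the server default on an unknown value is an assumption.

```java
import java.util.Locale;

enum SketchLoggingLevel {
    // Declared in increasing order of verbosity; ordinals match 0..3 above.
    NONE, EXECUTION, PERFORMANCE, VERBOSE;

    /** Parses a session-level setting, falling back to the server default. */
    static SketchLoggingLevel parse(String value, SketchLoggingLevel serverDefault) {
        if (value == null || value.trim().isEmpty()) {
            return serverDefault;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Unknown value: keep the default from hive-site.xml (assumed policy).
            return serverDefault;
        }
    }

    /** True if output intended for {@code other} should appear at this level. */
    boolean includes(SketchLoggingLevel other) {
        return this.ordinal() >= other.ordinal();
    }
}
```

 Ordering the constants by verbosity lets a single comparison decide whether a given kind of output is shown, which matches the cumulative wording of the list (Performance = Execution + counters).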




[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2015-04-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7049:
-
Assignee: Daniel Dai  (was: Mohammad Kamrul Islam)

 Unable to deserialize AVRO data when file schema and record schema are 
 different and nullable
 -

 Key: HIVE-7049
 URL: https://issues.apache.org/jira/browse/HIVE-7049
 Project: Hive
  Issue Type: Bug
Reporter: Mohammad Kamrul Islam
Assignee: Daniel Dai
 Attachments: HIVE-7049.1.patch, HIVE-7049.2.patch, Statistic, 
 Statistics10Min.avsc


 It mainly happens when:
 1) the file schema and the record schema are not the same, and
 2) the record schema is nullable but the file schema is not.
 The likely code location is in the class AvroDeserializer:
 {noformat}
 if (AvroSerdeUtils.isNullableType(recordSchema)) {
   return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
 }
 {noformat}
 In the above snippet, recordSchema is checked for nullability, but the file 
 schema is not.
 I tested with these values:
 {noformat}
 recordSchema = [null,string]
 fileSchema = string
 {noformat}
 I got the following exception (line numbers might not match because of my 
 debugged code version):
 {noformat}
 org.apache.avro.AvroRuntimeException: Not a union: string
     at org.apache.avro.Schema.getTypes(Schema.java:272)
     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
     at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
     at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
 {noformat}
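
 To illustrate the missing check, here is a self-contained sketch that does not use the Avro library and is not the fix from the attached patches. SketchSchema and NullableCheckSketch are hypothetical; a schema is modeled as a list of type names, where more than one entry means a union. The guard takes the union path only when the file schema is also a nullable union.

```java
import java.util.Arrays;
import java.util.List;

final class SketchSchema {
    // One entry models a plain type; several entries model union branches.
    final List<String> types;

    SketchSchema(String... types) {
        this.types = Arrays.asList(types);
    }

    boolean isUnion() {
        return types.size() > 1;
    }

    boolean isNullable() {
        return isUnion() && types.contains("null");
    }
}

final class NullableCheckSketch {
    /** Picks the deserialization path after checking BOTH schemas. */
    static String choosePath(SketchSchema fileSchema, SketchSchema recordSchema) {
        if (recordSchema.isNullable() && fileSchema.isNullable()) {
            return "nullable-union";
        }
        // The file schema is not a union, so the datum was written as a
        // plain value and must not be treated as a union branch.
        return "plain";
    }
}
```

 With the values from the description (recordSchema = [null,string], fileSchema = string), this guard selects the plain path instead of asking a non-union schema for its union branches.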




[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session

[
https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hari Sankar Sivarama Subramaniyan updated HIVE-10119:
-
Attachment: HIVE-10119.2.patch

[~thejas] Thanks for the detailed feedback. Have made the changes in the new
upload.

Thanks
Hari

Allow Log verbosity to be set in hiveserver2 session


[jira] [Resolved] (HIVE-8731) TPC-DS Q49 : Semantic analyzer order by is not honored when used after union all

2015-04-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-8731.
--
Resolution: Cannot Reproduce

Has already been fixed.

 TPC-DS Q49 : Semantic analyzer order by is not honored when used after union 
 all 
 -

 Key: HIVE-8731
 URL: https://issues.apache.org/jira/browse/HIVE-8731
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.0, 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Gunther Hagleitner

 TPC-DS query 49 returns more rows than the limit allows.
 Query 
 {code}
 set hive.cbo.enable=true;
 set hive.stats.fetch.column.stats=true;
 set hive.exec.dynamic.partition.mode=nonstrict;
 set hive.tez.auto.reducer.parallelism=true;
 set hive.auto.convert.join.noconditionaltask.size=128000;
 set hive.exec.reducers.bytes.per.reducer=1;
 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager;
 set hive.support.concurrency=false;
 set hive.tez.exec.print.summary=true;
 explain  
 select  
  'web' as channel
  ,web.item
  ,web.return_ratio
  ,web.return_rank
  ,web.currency_rank
  from (
   select 
item
   ,return_ratio
   ,currency_ratio
   ,rank() over (order by return_ratio) as return_rank
   ,rank() over (order by currency_ratio) as currency_rank
   from
   (   select ws.ws_item_sk as item
   ,(cast(sum(coalesce(wr.wr_return_quantity,0)) as decimal(15,4))/
   cast(sum(coalesce(ws.ws_quantity,0)) as decimal(15,4) )) as 
 return_ratio
   ,(cast(sum(coalesce(wr.wr_return_amt,0)) as decimal(15,4))/
   cast(sum(coalesce(ws.ws_net_paid,0)) as decimal(15,4) )) as 
 currency_ratio
   from 
web_sales ws left outer join web_returns wr 
   on (ws.ws_order_number = wr.wr_order_number and 
   ws.ws_item_sk = wr.wr_item_sk)
  ,date_dim
   where 
   wr.wr_return_amt  1 
   and ws.ws_net_profit  1
  and ws.ws_net_paid  0
  and ws.ws_quantity  0
  and ws.ws_sold_date_sk = date_dim.d_date_sk
  and d_year = 2000
  and d_moy = 12
   group by ws.ws_item_sk
   ) in_web
  ) web
  where 
  (
  web.return_rank = 10
  or
  web.currency_rank = 10
  )
  union all
  select 
  'catalog' as channel
  ,catalog.item
  ,catalog.return_ratio
  ,catalog.return_rank
  ,catalog.currency_rank
  from (
   select 
item
   ,return_ratio
   ,currency_ratio
   ,rank() over (order by return_ratio) as return_rank
   ,rank() over (order by currency_ratio) as currency_rank
   from
   (   select 
   cs.cs_item_sk as item
   ,(cast(sum(coalesce(cr.cr_return_quantity,0)) as decimal(15,4))/
   cast(sum(coalesce(cs.cs_quantity,0)) as decimal(15,4) )) as 
 return_ratio
   ,(cast(sum(coalesce(cr.cr_return_amount,0)) as decimal(15,4))/
   cast(sum(coalesce(cs.cs_net_paid,0)) as decimal(15,4) )) as 
 currency_ratio
   from 
   catalog_sales cs left outer join catalog_returns cr
   on (cs.cs_order_number = cr.cr_order_number and 
   cs.cs_item_sk = cr.cr_item_sk)
 ,date_dim
   where 
   cr.cr_return_amount  1 
   and cs.cs_net_profit  1
  and cs.cs_net_paid  0
  and cs.cs_quantity  0
  and cs_sold_date_sk = d_date_sk
  and d_year = 2000
  and d_moy = 12
  group by cs.cs_item_sk
   ) in_cat
  ) catalog
  where 
  (
  catalog.return_rank = 10
  or
  catalog.currency_rank =10
  )
  union all
  select 
  'store' as channel
  ,store.item
  ,store.return_ratio
  ,store.return_rank
  ,store.currency_rank
  from (
   select 
item
   ,return_ratio
   ,currency_ratio
   ,rank() over (order by return_ratio) as return_rank
   ,rank() over (order by currency_ratio) as currency_rank
   from
   (   select sts.ss_item_sk as item
   ,(cast(sum(coalesce(sr.sr_return_quantity,0)) as 
 decimal(15,4))/cast(sum(coalesce(sts.ss_quantity,0)) as decimal(15,4) )) as 
 return_ratio
   ,(cast(sum(coalesce(sr.sr_return_amt,0)) as 
 decimal(15,4))/cast(sum(coalesce(sts.ss_net_paid,0)) as decimal(15,4) )) as 
 currency_ratio
   from 
   store_sales sts left outer join store_returns sr
   on

[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session

[
https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hari Sankar Sivarama Subramaniyan updated HIVE-10119:
-
Release Note: The description for the newly added parameter,
hive.server2.logging.level should go into beeline wiki under
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.logging.operation.verbose
. Also, hive.server2.logging.operation.verbose will no longer be available,
hence it should be removed from the beeline wiki. (was: The description for
the newly added parameter, hive.server2.logging.level should go into beeline
wiki under
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.logging.operation.verbose
. Also, hive.server2.logging.operation.verbose will be deprecated, hence it
should be removed from the beeline wiki.)

Allow Log verbosity to be set in hiveserver2 session


[jira] [Commented] (HIVE-8164) Adding in a ReplicationTask that converts a Notification Event to actionable tasks


[ 
https://issues.apache.org/jira/browse/HIVE-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482047#comment-14482047
 ] 

Hive QA commented on HIVE-8164:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12723424/HIVE-8164.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8701 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3297/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3297/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3297/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12723424 - PreCommit-HIVE-TRUNK-Build

 Adding in a ReplicationTask that converts a Notification Event to actionable 
 tasks
 --

 Key: HIVE-8164
 URL: https://issues.apache.org/jira/browse/HIVE-8164
 Project: Hive
  Issue Type: Sub-task
  Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-8164.2.patch, HIVE-8164.patch







[jira] [Updated] (HIVE-10230) Vectorization: Look at performance of VectorExpressionWriterFactory and its use of DateWritable.daysToMillis

2015-04-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10230:
---
Attachment: datewritable-serialization.png

 Vectorization: Look at performance of VectorExpressionWriterFactory and its 
 use of DateWritable.daysToMillis
 

 Key: HIVE-10230
 URL: https://issues.apache.org/jira/browse/HIVE-10230
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: datewritable-serialization.png


 [~gopalv] found the DateWritable code in VectorExpressionWriterFactory 
 showing up in hot code paths from vectorization. The slowness is probably 
 where it calls ZoneInfo::getOffset.




[jira] [Commented] (HIVE-10118) CBO (Calcite Return Path): Internal error: Cannot find common type for join keys

2015-04-06 Thread Mostafa Mokhtar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482199#comment-14482199
 ] 

Mostafa Mokhtar commented on HIVE-10118:


[~jpullokkaran]

I ended up rediscovering the same issue using this query, where the NDV of the 
wrong column gets used in the costing, which produces a plan based on wrong 
assumptions.
{code}
select 
ss_sold_date_sk
from
store_sales,
date_dim d1,
store
where
d1.d_date_sk = store_sales.ss_sold_date_sk
and store.s_store_sk = store_sales.ss_store_sk;
{code}

 CBO (Calcite Return Path): Internal error: Cannot find common type for join 
 keys 
 -

 Key: HIVE-10118
 URL: https://issues.apache.org/jira/browse/HIVE-10118
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran
 Fix For: cbo-branch


 Query 
 {code}
 explain
   select  ss_items.item_id
,ss_item_rev
,ss_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ss_dev
,cs_item_rev
,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev
,ws_item_rev
,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev
,ws_item_rev
,ws_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ws_dev
,(ss_item_rev+cs_item_rev+ws_item_rev)/3 average
 FROM
 ( select i_item_id item_id ,sum(ss_ext_sales_price) as ss_item_rev
  from store_sales
  JOIN item ON store_sales.ss_item_sk = item.i_item_sk
  JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
  JOIN (select d1.d_date
  from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = 
 d2.d_week_seq
  where d2.d_date = '1998-08-04') sub ON date_dim.d_date = 
 sub.d_date
  group by i_item_id ) ss_items
 JOIN
 ( select i_item_id item_id ,sum(cs_ext_sales_price) as cs_item_rev
  from catalog_sales
  JOIN item ON catalog_sales.cs_item_sk = item.i_item_sk
  JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
  JOIN (select d1.d_date
  from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = 
 d2.d_week_seq
  where d2.d_date = '1998-08-04') sub ON date_dim.d_date = 
 sub.d_date
  group by i_item_id ) cs_items
 ON ss_items.item_id=cs_items.item_id
 JOIN
 ( select i_item_id item_id ,sum(ws_ext_sales_price) as ws_item_rev
  from web_sales
  JOIN item ON web_sales.ws_item_sk = item.i_item_sk
  JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
  JOIN (select d1.d_date
  from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = 
 d2.d_week_seq
  where d2.d_date = '1998-08-04') sub ON date_dim.d_date = 
 sub.d_date
  group by i_item_id ) ws_items
 ON ss_items.item_id=ws_items.item_id
  where
ss_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev
and ss_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev
and cs_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev
and cs_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev
and ws_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev
and ws_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev
  order by item_id ,ss_item_rev
  limit 100
 {code}
 Exception 
 {code}
  limit 100
 15/03/27 12:38:32 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO.
 java.lang.RuntimeException: java.lang.AssertionError: Internal error: Cannot find common type for join keys $1 (type INTEGER) and $1 (type VARCHAR(2147483647))
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:677)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:586)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:238)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9998)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:201)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1114)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1162)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)

[jira] [Commented] (HIVE-10187) Avro backed tables don't handle cyclical or recursive records

2015-04-06 Thread Mark Wagner (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482195#comment-14482195
 ] 

Mark Wagner commented on HIVE-10187:


The test failure seems to be unrelated.

 Avro backed tables don't handle cyclical or recursive records
 -

 Key: HIVE-10187
 URL: https://issues.apache.org/jira/browse/HIVE-10187
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 1.2.0
Reporter: Mark Wagner
Assignee: Mark Wagner
 Attachments: HIVE-10187.1.patch, HIVE-10187.demo.patch


 [HIVE-7653] changed the Avro SerDe to make it generate TypeInfos even for 
 recursive/cyclical schemas. However, any attempt to serialize data which 
 exploits that ability results in silently dropped fields.




[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported

2015-04-06 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482299#comment-14482299
 ] 

Jason Dere commented on HIVE-10226:
---

[~ashutoshc] I'll see what I can do here.
[~gopalv] Is there a more complete stack trace available here, in case this 
issue still crops up after I make these changes? The qfile test (which uses 
analyze table) seemed to work ok.

 Column stats for Date columns not supported
 ---

 Key: HIVE-10226
 URL: https://issues.apache.org/jira/browse/HIVE-10226
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10226.1.patch


 {noformat}
 hive explain analyze table revenues compute statistics for columns;
 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver 
 (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only 
 integer/long/timestamp/float/double/string/binary/boolean/decimal type 
 argument is accepted but date is passed.
 {noformat}




[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported


[ 
https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482321#comment-14482321
 ] 

Gopal V commented on HIVE-10226:


[~jdere]: I am running the latest build against a hive-1.0 metastore, a setup
that the qtests will never exercise.

{code}
org.apache.thrift.protocol.TProtocolException: Cannot write a TUnion with no 
set value!
at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:240)
at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
at org.apache.thrift.TUnion.write(TUnion.java:152)
at 
org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj$ColumnStatisticsObjStandardScheme.write(ColumnStatisticsObj.java:550)
at 
org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj$ColumnStatisticsObjStandardScheme.write(ColumnStatisticsObj.java:488)
at 
org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj.write(ColumnStatisticsObj.java:414)
at 
org.apache.hadoop.hive.metastore.api.TableStatsResult$TableStatsResultStandardScheme.write(TableStatsResult.java:388)
at 
org.apache.hadoop.hive.metastore.api.TableStatsResult$TableStatsResultStandardScheme.write(TableStatsResult.java:338)
at 
org.apache.hadoop.hive.metastore.api.TableStatsResult.write(TableStatsResult.java:288)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_statistics_req_result$get_table_statistics_req_resultStandardScheme.write(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_statistics_req_result$get_table_statistics_req_resultStandardScheme.write(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_statistics_req_result.write(ThriftHiveMetastore.java)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

The error is probably not relevant except for rolling upgrade scenarios (the
standard one is client upgraded, server yet to be restarted), since the error
is not FATAL.

Adding a new Stats type is probably a non-backwards-compatible change anyway,
so perhaps we can mark this as a wire protocol change.

 Column stats for Date columns not supported
 ---

 Key: HIVE-10226
 URL: https://issues.apache.org/jira/browse/HIVE-10226
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10226.1.patch


 {noformat}
 hive explain analyze table revenues compute statistics for columns;
 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver 
 (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only 
 integer/long/timestamp/float/double/string/binary/boolean/decimal type 
 argument is accepted but date is passed.
 {noformat}




[jira] [Commented] (HIVE-10180) Loop optimization in ColumnArithmeticColumn.txt

2015-04-06 Thread Chengxiang Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482475#comment-14482475
 ] 

Chengxiang Li commented on HIVE-10180:
--

[~gopalv], I'm waiting for [~Ferd]'s work on the micro benchmark tool
(HIVE-10189). It seems that will take some time, so I'll post my own test
results here first. I ran
DoubleColAddDoubleColumn/LongColAddLongColumn 1 times w/ and w/o the patch;
here are the numbers:
||Expression||Not vectorized(sec)||Vectorized(sec)||
|DoubleColAddDoubleColumn|51.23|17.64|
|LongColAddLongColumn|51.17|21.51|
Environment:
java version 1.8.0_40
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
Intel(R) Core(TM) i3-2130 CPU @ 3.40GHz
Linux version 2.6.32-279.el6.x86_64

 Loop optimization in ColumnArithmeticColumn.txt
 ---

 Key: HIVE-10180
 URL: https://issues.apache.org/jira/browse/HIVE-10180
 Project: Hive
  Issue Type: Sub-task
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Priority: Minor
 Attachments: HIVE-10180.1.patch


 The JVM is quite strict about the code patterns it will execute with SIMD
 instructions. Take a loop in DoubleColAddDoubleColumn.java, for example:
 {code:java}
 for (int i = 0; i != n; i++) {
   outputVector[i] = vector1[0] + vector2[i];
 }
 {code}
 The vector1[0] reference prevents the JVM from executing this loop with
 vectorized instructions. We need to assign vector1[0] to a variable outside
 the loop and use that variable inside the loop.
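
 A minimal sketch of the change described above, showing the original loop shape next to the hoisted one. The class and method names are hypothetical and this is not the generated ColumnArithmeticColumn code. It assumes vector1 has at least one element and that outputVector does not alias vector1; under those assumptions both methods produce the same output.

```java
final class ScalarHoistSketch {
  // Original shape: vector1[0] is re-read from the array on every iteration.
  static void addRepeatingNaive(double[] vector1, double[] vector2,
                                double[] outputVector, int n) {
    for (int i = 0; i != n; i++) {
      outputVector[i] = vector1[0] + vector2[i];
    }
  }

  // Hoisted shape: vector1[0] is read once into a local before the loop,
  // leaving a loop body that touches only vector2[i] and outputVector[i].
  static void addRepeatingHoisted(double[] vector1, double[] vector2,
                                  double[] outputVector, int n) {
    final double vector1Value = vector1[0];
    for (int i = 0; i != n; i++) {
      outputVector[i] = vector1Value + vector2[i];
    }
  }

  public static void main(String[] args) {
    double[] repeating = {2.0};
    double[] values = {1.0, 2.0, 3.0};
    double[] out = new double[values.length];
    addRepeatingHoisted(repeating, values, out, values.length);
    System.out.println(java.util.Arrays.toString(out));
  }
}
```

 Whether the JIT actually emits SIMD instructions for the hoisted loop depends on the JVM version and CPU, so the effect is best confirmed with a benchmark.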




[jira] [Commented] (HIVE-10098) HS2 local task for map join fails in KMS encrypted cluster


[ 
https://issues.apache.org/jira/browse/HIVE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482478#comment-14482478
 ] 

Yongzhi Chen commented on HIVE-10098:
-

The other failure, "TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a
TEST-*.xml file", is not related either. I have seen this failure in other
precommit builds.

 HS2 local task for map join fails in KMS encrypted cluster
 --

 Key: HIVE-10098
 URL: https://issues.apache.org/jira/browse/HIVE-10098
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10098.1.patch, HIVE-10098.2.patch


 Env: KMS was enabled after cluster was kerberos secured. 
 Problem: Any Hive query via beeline that performs a MapJoin fails
 with a java.lang.reflect.UndeclaredThrowableException  from 
 KMSClientProvider.addDelegationTokens.
 {code}
 2015-03-18 08:49:17,948 INFO [main]: Configuration.deprecation 
 (Configuration.java:warnOnceIfDeprecated(1022)) - mapred.input.dir is 
 deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 
 2015-03-18 08:49:19,048 WARN [main]: security.UserGroupInformation 
 (UserGroupInformation.java:doAs(1645)) - PriviledgedActionException as:hive 
 (auth:KERBEROS) 
 cause:org.apache.hadoop.security.authentication.client.AuthenticationException:
  GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt) 
 2015-03-18 08:49:19,050 ERROR [main]: mr.MapredLocalTask 
 (MapredLocalTask.java:executeFromChildJVM(314)) - Hive Runtime Error: Map 
 local work failed 
 java.io.IOException: java.io.IOException: 
 java.lang.reflect.UndeclaredThrowableException 
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:634)
  
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:363)
  
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:337)
  
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:303)
 at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:735) 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
 at java.lang.reflect.Method.invoke(Method.java:606) 
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 
 Caused by: java.io.IOException: 
 java.lang.reflect.UndeclaredThrowableException 
 at 
 org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:826)
  
 at 
 org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
  
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017)
  
 at 
 org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
  
 at 
 org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
  
 at 
 org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
  
 at 
 org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205) 
 at 
 org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) 
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:413)
  
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:559)
  
 ... 9 more 
 Caused by: java.lang.reflect.UndeclaredThrowableException 
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655)
  
 at 
 org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:808)
  
 ... 18 more 
 Caused by: 
 org.apache.hadoop.security.authentication.client.AuthenticationException: 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt) 
 at 
 org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:306)
  
 at 
 org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196)
  
 at 
 org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127)
 {code}
 To make sure a map join happens, the test needs to join a small table with a
 large one, for example:
 {code}
 CREATE TABLE if not exists jsmall (code string, des string, t int, s int) ROW 
 FORMAT DELIMITED FIELDS TERMINATED BY '\t';
 CREATE TABLE if not exists jbig1 (code string, des string, t int, s int) ROW 
 FORMAT DELIMITED

[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session


 [ 
https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10119:
-
Issue Type: Improvement  (was: Bug)

 Allow Log verbosity to be set in hiveserver2 session
 

 Key: HIVE-10119
 URL: https://issues.apache.org/jira/browse/HIVE-10119
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch, 
 HIVE-10119.3.patch


 We need to be able to set logging per HS2 session.
 The client often uses the map-reduce completion matrix (Execution) that shows
 up in Beeline to debug performance. Users might not want the verbose log view
 all the time, since it obscures the Execution information. Hence the client
 should be able to change the verbosity level.
 Also, HS2 logging currently has 2 levels of verbosity, not 3. Users might
 want Execution + Performance counters only, so that level needs to be
 added.
 So for logs, the user should be able to set 3 levels of verbosity in the
 session, which will override the default verbosity specified in the
 hive-site.xml file.
 0. None - IGNORE
 1. Execution - Just shows the map-reduce tasks completing 
 2. Performance - Execution + Performance counters dumped at the end
 3. Verbose - All logs




[jira] [Updated] (HIVE-10234) Problems using custom inputformat

2015-04-06 Thread xuhuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuhuan updated HIVE-10234:
--
Description: 
I applied the RCFileProtobufInputFormat from Twitter's elephant-bird project to
Hive to handle Protobuf records that are written in the RCFile file format.
Here is what I did:
1. Compiled the source code of RCFileProtobufInputFormat and built a jar.
2. Added the jar to hive_aux_jar_path and updated the config of server and client.
3. Created a table using the inputformat
com.twitter.elephantbird.mapreduce.input.RCFileProtobufInputFormat.
Creating the table works, but when I execute a query like select * from table
test (there is no data in table test), I get this error: FAILED:
SemanticException 1:14 Input format must implement InputFormat. Error
encountered near token test.
I found some similar cases on Stack Overflow; some people said this is because
the custom inputformat class implements the new API
(org.apache.hadoop.mapreduce), and the RCFileProtobufInputFormat class does
indeed extend the new API. I'm wondering whether a custom inputformat must
extend the old API (org.apache.hadoop.mapred). Could this be the reason for my
problem?


  was:
I apply the RCFileProtobufInputFormat from twitter's elephantbird project into 
hive to handle Protobuf records which are written in a RCFile file format. What 
I have done is belowing:
1. compile the soure code of RCFileProtobufInputFormat and make a jar
2. add the jar to hive_aux_jar_path and update the config of server and client
3. create table using inputformat 
com.twitter.elephantbird.mapreduce.input.RCFileProtobufInputFormat
Creating table is ok.but when I excute a query like select * from table 
test(there is no data in table test),I got these mistake: FAILED: 
SemanticException 1:14 Input format must implement InputFormat. Error 
encountered near token test. 
I found some cases in stackoverflow,someones said that's because the custom 
inputformat class implement the new API(org.apache.hadoop.marpreduce) and 
RCFileProtobufInputFormat class indeed extends the new API. I'm wondering 
wheather a custom inputformat must extend the old 
API(org.apache.hadoop.marpred)? Could this be the reason of my problem?



 Problems using custom inputformat
 -

 Key: HIVE-10234
 URL: https://issues.apache.org/jira/browse/HIVE-10234
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.13.1
 Environment: CentOS release 6.3
 CDH-5.3.1-1.cdh5.3.1.p0.5
 hive-0.13.1-cdh5.3.1
 Inputformat: 
 https://github.com/twitter/elephant-bird/blob/master/rcfile/src/main/java/com/twitter/elephantbird/mapreduce/input/RCFileProtobufInputFormat.java
Reporter: xuhuan
Priority: Critical
  Labels: inputformat

 I applied the RCFileProtobufInputFormat from Twitter's elephant-bird project
 to Hive to handle Protobuf records that are written in the RCFile file
 format. Here is what I did:
 1. Compiled the source code of RCFileProtobufInputFormat and built a jar.
 2. Added the jar to hive_aux_jar_path and updated the config of server and client.
 3. Created a table using the inputformat
 com.twitter.elephantbird.mapreduce.input.RCFileProtobufInputFormat.
 Creating the table works, but when I execute a query like select * from table
 test (there is no data in table test), I get this error: FAILED:
 SemanticException 1:14 Input format must implement InputFormat. Error
 encountered near token test.
 I found some similar cases on Stack Overflow; some people said this is because
 the custom inputformat class implements the new API
 (org.apache.hadoop.mapreduce), and the RCFileProtobufInputFormat class does
 indeed extend the new API. I'm wondering whether a custom inputformat must
 extend the old API (org.apache.hadoop.mapred). Could this be the reason for
 my problem?




[jira] [Commented] (HIVE-10146) Not count session as idle if query is running

2015-04-06 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482546#comment-14482546
 ] 

Lefty Leverenz commented on HIVE-10146:
---

Doc note:  This adds *hive.server2.idle.session.check.operation* to 
HiveConf.java, so it needs to be documented in the HiveServer2 section of 
Configuration Properties for release 1.2.0.  It could go either at the end of 
the section or after *hive.server2.idle.session.timeout* (which could include a 
reference to it).

* [Configuration Properties -- HiveServer2 | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2]
* [Configuration Properties -- HiveServer2 -- hive.server2.idle.session.timeout 
| 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.idle.session.timeout]

 Not count session as idle if query is running
 -

 Key: HIVE-10146
 URL: https://issues.apache.org/jira/browse/HIVE-10146
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10146.1.patch, HIVE-10146.2.patch, 
 HIVE-10146.3.patch, HIVE-10146.4.patch


 Currently, as long as there is no activity, we think the HS2 session is idle. 
 This makes it very hard to set HIVE_SERVER2_IDLE_SESSION_TIMEOUT. If we don't 
 set it long enough, an unattended query could be killed.
 We should provide an option not to count the session as idle if some query
 is still running.
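
 The idle decision described above can be sketched as a single predicate. This is an illustration only, not the HiveServer2 session manager code: the class, method, and parameter names are hypothetical, and the checkOperation flag stands in for the hive.server2.idle.session.check.operation setting mentioned in the doc note.

```java
final class IdleSessionSketch {
  // Returns true when the session should be treated as idle.
  // With checkOperation enabled, a session that still has running
  // operations is never considered idle, however long it has been
  // since the client last touched it.
  static boolean isIdle(long nowMs, long lastAccessMs, long idleTimeoutMs,
                        boolean checkOperation, int runningOperations) {
    if (checkOperation && runningOperations > 0) {
      return false;
    }
    return nowMs - lastAccessMs > idleTimeoutMs;
  }

  public static void main(String[] args) {
    // A long-running, unattended query with the check enabled.
    System.out.println(isIdle(10_000L, 0L, 5_000L, true, 1));
    // The same session with the check disabled.
    System.out.println(isIdle(10_000L, 0L, 5_000L, false, 1));
  }
}
```

 With the check enabled, the timeout only has to cover the gap between client interactions rather than the duration of the longest query.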




[jira] [Updated] (HIVE-10231) Compute partition column stats fails if partition col type is date


 [ 
https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-10231:
---
Attachment: HIVE-10231.1.patch

Thanks [~ashutoshc] for the review. I made the changes and also added more
tests in the new patch. Please take a look, thanks.

 Compute partition column stats fails if partition col type is date
 --

 Key: HIVE-10231
 URL: https://issues.apache.org/jira/browse/HIVE-10231
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 1.2.0

 Attachments: HIVE-10231.1.patch, HIVE-10231.patch


 Currently the command analyze table .. partition .. compute statistics for
 columns only works when the partition column type is string or a numeric type,
 but not for other types such as date. See the following case, which uses date
 as the partition column type:
 {code}
 create table colstatspartdate (key int, value string) partitioned by (ds 
 date, hr int);
 insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select 
 key, value from src limit 20;
 analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute 
 statistics for columns;
 {code}
 you will get RuntimeException:
 {code}
 FAILED: RuntimeException Cannot convert to Date from: int
 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to 
 Date from: int
 java.lang.RuntimeException: Cannot convert to Date from: int
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
 
 {code}




[jira] [Resolved] (HIVE-10049) Hive jobs submitted from WebHCat won't log Hive history into YARN ATS

2015-04-06 Thread Xiaoyong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyong Zhu resolved HIVE-10049.
-
Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/HIVE-10066

 Hive jobs submitted from WebHCat won't log Hive history into YARN ATS
 -

 Key: HIVE-10049
 URL: https://issues.apache.org/jira/browse/HIVE-10049
 Project: Hive
  Issue Type: Improvement
  Components: Hive, WebHCat
Affects Versions: 1.0.0
Reporter: Xiaoyong Zhu
Priority: Critical

 When executed from CLI, the Hive job could be viewed from YARN ATS 
 (/ws/v1/timeline/HIVE_QUERY_ID/). However, if a Hive job is submitted from 
 WebHCat, then no logs will be in YARN ATS.




[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date

2015-04-06 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482271#comment-14482271
 ] 

Ashutosh Chauhan commented on HIVE-10231:
-

[~ctang.ma] Since Hive's type system is loosey-goosey, you might be able to get
away with quoting all the time, but I wouldn't recommend it. My suggestion is
to go to the other extreme instead and give the parser as much type
information as we can. So, generate constants in the query like the following:
{code}
switch (colType):
  case Long:
    colVal+L;
  case SmallInt:
    colVal+S;
  case Tinyint:
    colVal+Y;
  case Decimal:
    colVal+BD;
  case String:
  case Char:
  case VarChar:
    ' +colVal+ ' ;
  case Date:
    date  ' +colVal+ ' ;
  case TimeStamp:
    timestamp  ' +colVal+ ' ;
{code} 
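
The pseudocode above could be written out in Java roughly as follows. This is a sketch under stated assumptions, not the code in any HIVE-10231 patch: the ColType enum, the class, and the method names are hypothetical, the INT case is added only to show the pass-through default, and the quote escaping is a simplification.

```java
final class PartitionLiteralSketch {
  // Hypothetical stand-in for the partition column's primitive type.
  enum ColType { INT, LONG, SMALLINT, TINYINT, DECIMAL, STRING, CHAR, VARCHAR, DATE, TIMESTAMP }

  // Renders a partition value as a typed literal so the parser does not
  // have to guess the type from an unadorned constant.
  static String toLiteral(ColType colType, String colVal) {
    switch (colType) {
      case LONG:
        return colVal + "L";
      case SMALLINT:
        return colVal + "S";
      case TINYINT:
        return colVal + "Y";
      case DECIMAL:
        return colVal + "BD";
      case STRING:
      case CHAR:
      case VARCHAR:
        return quote(colVal);
      case DATE:
        return "date " + quote(colVal);
      case TIMESTAMP:
        return "timestamp " + quote(colVal);
      default:
        // Types not covered by the pseudocode are passed through unchanged.
        return colVal;
    }
  }

  // Simplified escaping (an assumption): backslash-escape backslashes and single quotes.
  static String quote(String value) {
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
  }

  public static void main(String[] args) {
    System.out.println("ds=" + toLiteral(ColType.DATE, "2015-04-02")
        + ", hr=" + toLiteral(ColType.INT, "2"));
  }
}
```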

 Compute partition column stats fails if partition col type is date
 --

 Key: HIVE-10231
 URL: https://issues.apache.org/jira/browse/HIVE-10231
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 1.2.0

 Attachments: HIVE-10231.patch


 Currently the command analyze table .. partition .. compute statistics for
 columns only works when the partition column type is string or a numeric type,
 but not for other types such as date. See the following case, which uses date
 as the partition column type:
 {code}
 create table colstatspartdate (key int, value string) partitioned by (ds 
 date, hr int);
 insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select 
 key, value from src limit 20;
 analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute 
 statistics for columns;
 {code}
 you will get RuntimeException:
 {code}
 FAILED: RuntimeException Cannot convert to Date from: int
 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to 
 Date from: int
 java.lang.RuntimeException: Cannot convert to Date from: int
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
 
 {code}




[jira] [Updated] (HIVE-10214) log metastore call timing information aggregated at query level


 [ 
https://issues.apache.org/jira/browse/HIVE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10214:
-
Issue Type: Improvement  (was: Bug)

 log metastore call timing information aggregated at query level
 ---

 Key: HIVE-10214
 URL: https://issues.apache.org/jira/browse/HIVE-10214
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10214.1.patch, HIVE-10214.2.patch


 For troubleshooting issues, it would be useful to log timing information for 
 metastore api calls, aggregated at a query level.




[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)


[ 
https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482405#comment-14482405
 ] 

Sergey Shelukhin edited comment on HIVE-10161 at 4/7/15 1:57 AM:
-

When multiple RGs include the same partial CB (due to ORC end boundary being an 
estimate), the first one reads the length, determines that this is an 
incomplete CB and bails, the 2nd one tries to blindly read the length but the 
buffer is now offset by 3 bytes from the original read. Boom! Fixed that, also 
fixed some small issue with early unlocking.


was (Author: sershe):
When multiple RGs include the same partial CB (due to ORC end boundary being an 
estimate), the first one reads the length, determines that this is an 
incomplete CB and bails, the 2nd one tries to blindly read the length but the 
compressed block is now offset by 3 bytes from the original read. Boom! Fixed 
that, also fixed some small issue with early unlocking.

 LLAP: ORC file contains compression buffers larger than bufferSize (OR reader 
 has a bug)
 

 Key: HIVE-10161
 URL: https://issues.apache.org/jira/browse/HIVE-10161
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap


 The EncodedReaderImpl will die when reading from the cache, when reading data 
 written by the regular ORC writer 
 {code}
 Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer 
 size too small. size = 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
 ... 22 more
 Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 
 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
 vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
 {code}
 Turning off hive.llap.io.enabled makes the error go away.




[jira] [Resolved] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)


 [ 
https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10161.
-
Resolution: Fixed

When multiple RGs include the same partial CB (due to ORC end boundary being an 
estimate), the first one reads the length, determines that this is an 
incomplete RG and bails, the 2nd one tries to blindly read the length but the 
compressed block is now offset by 3 bytes from the original read. Boom! Fixed 
that, also fixed some small issue with early unlocking.
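
The failure mode described above, where a second reader finds the data 3 bytes past where it expects, is what you get when a length header is consumed with position-advancing reads and the reader then bails out. A minimal sketch of one way to avoid that, not the actual HIVE-10161 fix: peek at the header with absolute reads so that the buffer position is untouched. The 3-byte little-endian header layout (low bit is the "original" flag, remaining bits are the chunk length) follows the ORC format as I understand it; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class ChunkHeaderSketch {
  static final int HEADER_SIZE = 3;

  // Reads the chunk length from the header at the buffer's current
  // position WITHOUT advancing the position. Returns -1 when fewer than
  // HEADER_SIZE bytes remain.
  static int peekChunkLength(ByteBuffer buf) {
    if (buf.remaining() < HEADER_SIZE) {
      return -1;
    }
    int p = buf.position();
    int b0 = buf.get(p) & 0xff;      // absolute get: position is unchanged
    int b1 = buf.get(p + 1) & 0xff;
    int b2 = buf.get(p + 2) & 0xff;
    return (b2 << 15) | (b1 << 7) | (b0 >> 1);
  }

  // True when the header and the whole chunk body are present in the buffer.
  static boolean isCompleteChunk(ByteBuffer buf) {
    int length = peekChunkLength(buf);
    return length >= 0 && buf.remaining() >= HEADER_SIZE + length;
  }

  public static void main(String[] args) {
    // Header announces 5 bytes, but only 2 bytes of body follow.
    ByteBuffer partial = ByteBuffer.wrap(new byte[] {10, 0, 0, 1, 2});
    System.out.println("complete=" + isCompleteChunk(partial)
        + " position=" + partial.position());
  }
}
```

Because the check leaves the position alone, a later reader that starts from the same buffer sees the header at the same offset as the first one did.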

 LLAP: ORC file contains compression buffers larger than bufferSize (OR reader 
 has a bug)
 

 Key: HIVE-10161
 URL: https://issues.apache.org/jira/browse/HIVE-10161
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap


 The EncodedReaderImpl will die when reading from the cache, when reading data 
 written by the regular ORC writer 
 {code}
 Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer 
 size too small. size = 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
 ... 22 more
 Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 
 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
 vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
 {code}
 Turning off hive.llap.io.enabled makes the error go away.




[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)


[ 
https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482405#comment-14482405
 ] 

Sergey Shelukhin edited comment on HIVE-10161 at 4/7/15 1:56 AM:
-

When multiple RGs include the same partial CB (because the ORC end boundary is 
an estimate), the first one reads the length, determines that this is an 
incomplete CB, and bails. The second one then tries to blindly read the length, 
but the compressed block is now offset by 3 bytes from the original read. Boom! 
Fixed that, and also fixed a small issue with early unlocking.


was (Author: sershe):
When multiple RGs include the same partial CB (due to ORC end boundary being an 
estimate), the first one reads the length, determines that this is an 
incomplete RG and bails, the 2nd one tries to blindly read the length but the 
compressed block is now offset by 3 bytes from the original read. Boom! Fixed 
that, also fixed some small issue with early unlocking.
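
The failure mode described in the comment can be sketched as follows. This is only an illustration, not the Hive reader code: it assumes the ORC compression chunk layout of a 3-byte little-endian header holding (length * 2 + isOriginal), and the helper names `encode_header` and `read_chunk` are hypothetical. The point is that a reader that detects a partial chunk should leave the position at the header, so that a later reader does not decode payload bytes as a length.

```python
# Sketch only: helper names are hypothetical, not Hive's reader code.
HEADER_LEN = 3  # assumed ORC compression chunk header size in bytes


def encode_header(length, is_original):
    """Pack (length * 2 + is_original) into 3 little-endian bytes."""
    return ((length << 1) | int(is_original)).to_bytes(HEADER_LEN, "little")


def read_chunk(buf, header_offset):
    """Read one chunk whose header starts at header_offset.

    Returns (payload, is_original, next_offset), or None when the buffer
    ends before the chunk does. Nothing is consumed on the None path, so a
    later caller can retry from the same header_offset.
    """
    body_start = header_offset + HEADER_LEN
    if body_start > len(buf):
        return None
    value = int.from_bytes(buf[header_offset:body_start], "little")
    length, is_original = value >> 1, bool(value & 1)
    if body_start + length > len(buf):
        return None  # partial chunk: the estimated end boundary cut it short
    end = body_start + length
    return bytes(buf[body_start:end]), is_original, end


# One complete chunk, then a chunk truncated after 3 of its 10 bytes.
data = encode_header(5, True) + b"hello" + encode_header(10, False) + b"abc"
first = read_chunk(data, 0)
second = read_chunk(data, first[2])
# A "header" read 3 bytes past the real one decodes payload bytes as a length.
bogus_length = int.from_bytes(data[11:14], "little") >> 1
```

Here `second` is None because the chunk is incomplete, while `bogus_length` is a large meaningless value, which is the kind of number that can surface in a "Buffer size too small" error.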

 LLAP: ORC file contains compression buffers larger than bufferSize (OR reader 
 has a bug)
 

 Key: HIVE-10161
 URL: https://issues.apache.org/jira/browse/HIVE-10161
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap


 The EncodedReaderImpl will die when reading from the cache, when reading data 
 written by the regular ORC writer 
 {code}
 Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer 
 size too small. size = 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
 ... 22 more
 Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 
 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
 vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
 {code}
 Turning off hive.llap.io.enabled makes the error go away.




[jira] [Commented] (HIVE-9937) LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join


[ 
https://issues.apache.org/jira/browse/HIVE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482413#comment-14482413
 ] 

Hive QA commented on HIVE-9937:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12723438/HIVE-9937.91.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8715 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3300/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3300/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3300/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12723438 - PreCommit-HIVE-TRUNK-Build

 LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new 
 Vectorized Map Join
 --

 Key: HIVE-9937
 URL: https://issues.apache.org/jira/browse/HIVE-9937
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-9937.01.patch, HIVE-9937.02.patch, 
 HIVE-9937.03.patch, HIVE-9937.04.patch, HIVE-9937.05.patch, 
 HIVE-9937.06.patch, HIVE-9937.07.patch, HIVE-9937.08.patch, 
 HIVE-9937.09.patch, HIVE-9937.91.patch







[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported


[ 
https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482310#comment-14482310
 ] 

Hive QA commented on HIVE-10226:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12723434/HIVE-10226.1.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8653 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-parallel_orderby.q-reduce_deduplicate.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-infer_bucket_sort_bucketed_table.q-bucket4.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3299/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3299/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3299/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12723434 - PreCommit-HIVE-TRUNK-Build

 Column stats for Date columns not supported
 ---

 Key: HIVE-10226
 URL: https://issues.apache.org/jira/browse/HIVE-10226
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10226.1.patch


 {noformat}
 hive> explain analyze table revenues compute statistics for columns;
 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver 
 (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only 
 integer/long/timestamp/float/double/string/binary/boolean/decimal type 
 argument is accepted but date is passed.
 {noformat}




[jira] [Updated] (HIVE-10214) log metastore call timing information aggregated at query level


 [ 
https://issues.apache.org/jira/browse/HIVE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-10214:
-
Attachment: HIVE-10214.2.patch

addressing review comments in 2.patch


 log metastore call timing information aggregated at query level
 ---

 Key: HIVE-10214
 URL: https://issues.apache.org/jira/browse/HIVE-10214
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10214.1.patch, HIVE-10214.2.patch


 For troubleshooting issues, it would be useful to log timing information for 
 metastore api calls, aggregated at a query level.




[jira] [Commented] (HIVE-10225) CLI JLine does not flush history on quit/Ctrl-C

2015-04-06 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482377#comment-14482377
 ] 

Prasanth Jayachandran commented on HIVE-10225:
--

Verified the patch and it works as expected. LGTM, +1. Pending tests.

 CLI JLine does not flush history on quit/Ctrl-C
 ---

 Key: HIVE-10225
 URL: https://issues.apache.org/jira/browse/HIVE-10225
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10225.1.patch


 The Hive CLI does not save history if it is terminated with Ctrl-C or 
 quit;.
 HIVE-9310 fixed this for the case where one exits with Ctrl-D (EOF), but not 
 for the two ways of exiting above.




[jira] [Assigned] (HIVE-10189) Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization

2015-04-06 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-10189:
---

Assignee: Ferdinand Xu

 Create a micro benchmark tool for vectorization to evaluate the performance 
 gain after SIMD optimization
 

 Key: HIVE-10189
 URL: https://issues.apache.org/jira/browse/HIVE-10189
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: avx-64.docx


 We should show the performance gain from SIMD optimization.




[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date

2015-04-06 Thread Mostafa Mokhtar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482443#comment-14482443
 ] 

Mostafa Mokhtar commented on HIVE-10231:


[~jdere]
FYI.

 Compute partition column stats fails if partition col type is date
 --

 Key: HIVE-10231
 URL: https://issues.apache.org/jira/browse/HIVE-10231
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 1.2.0

 Attachments: HIVE-10231.patch


 Currently the command analyze table .. partition .. compute statistics for 
 columns works only when the partition column type is string or a numeric type, 
 but not for other types such as date. See the following case, which uses date 
 as a partition column type:
 {code}
 create table colstatspartdate (key int, value string) partitioned by (ds 
 date, hr int);
 insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select 
 key, value from src limit 20;
 analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute 
 statistics for columns;
 {code}
 You will get a RuntimeException:
 {code}
 FAILED: RuntimeException Cannot convert to Date from: int
 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to 
 Date from: int
 java.lang.RuntimeException: Cannot convert to Date from: int
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
 
 {code}




[jira] [Commented] (HIVE-9223) HiveServer2 on Tez doesn't support concurrent queries within one session

2015-04-06 Thread Raj Bains (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-9223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482490#comment-14482490
]

Raj Bains commented on HIVE-9223:
-

Queries within the same session should execute serially. Hue should open a new
session when you use a new tab. Each tab can be considered a script where
queries run sequentially. Otherwise how can I guarantee that a query creating a
temporary table and the one following it that reads the temporary table are
processed in order? Are we depending on UI tabs for that? The previous MR
behavior seems to be a bug.

HiveServer2 on Tez doesn't support concurrent queries within one session

Key: HIVE-9223
URL: https://issues.apache.org/jira/browse/HIVE-9223
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Pala M Muthaia

When a user submits multiple queries concurrently in the same HS2 session
(using the thrift interface), the queries go through the same TezSessionState
and end up being submitted to the same Tez AM, and the second query fails with
the error "App master already running a DAG".
Is this by design? I looked into the code, and the comments as well as the
code suggest support only for serial execution of queries within the same
HiveServer2 session (on Tez).
This works for the CLI environment, but in a server it is plausible that a
client sends multiple concurrent queries under the same session (e.g., a web
app that executes queries for users, such as Cloudera Hue). So shouldn't the
HS2 on Tez implementation support concurrent queries?


[jira] [Commented] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables


[ 
https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482507#comment-14482507
 ] 

Hive QA commented on HIVE-9647:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12723440/HIVE-9647.02.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8653 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-parallel_orderby.q-reduce_deduplicate.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-infer_bucket_sort_bucketed_table.q-bucket4.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3301/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3301/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3301/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12723440 - PreCommit-HIVE-TRUNK-Build

 Discrepancy in cardinality estimates between partitioned and un-partitioned 
 tables 
 ---

 Key: HIVE-9647
 URL: https://issues.apache.org/jira/browse/HIVE-9647
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-9647.01.patch, HIVE-9647.02.patch


 High-level summary
 HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column 
 number of distinct values (NDV) to estimate join selectivity.
 The way statistics are aggregated for partitioned tables results in a 
 discrepancy in the number of distinct values, which in turn results in 
 different plans for partitioned and un-partitioned schemas.
 The table below summarizes the NDVs in computeInnerJoinSelectivity that are 
 used to estimate the selectivity of joins.
 ||Column||Partitioned count distincts||Un-Partitioned count distincts||
 |sr_customer_sk|71,245|1,415,625|
 |sr_item_sk|38,846|62,562|
 |sr_ticket_number|71,245|34,931,085|
 |ss_customer_sk|88,476|1,415,625|
 |ss_item_sk|38,846|62,562|
 |ss_ticket_number|100,756|56,256,175|
 
 The discrepancy arises because the NDV calculation for a partitioned table 
 assumes that the NDV range is contained within each partition, and the NDV is 
 calculated as "select max(NUM_DISTINCTS) from PART_COL_STATS".
 This is problematic for columns like ticket number, which naturally 
 increase with the partitioned date column ss_sold_date_sk.
 Suggestions
 Use HyperLogLog as suggested by Gopal; there is an HLL implementation for 
 HBASE
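
A minimal sketch of why taking the maximum per-partition NDV underestimates the table-level NDV when values do not repeat across partitions. The function names and the toy data are assumptions made for illustration; the exact union shown here stands in for what a mergeable sketch such as HyperLogLog would approximate.

```python
# Sketch only: function names and data are illustrative, not Hive code.

def ndv_by_max(partitions):
    """Table NDV estimated as the largest per-partition NDV."""
    return max(len(set(p)) for p in partitions)


def ndv_exact(partitions):
    """Table NDV computed over the union of all partitions."""
    return len(set().union(*partitions))


# Ticket numbers grow with the partition date, so partitions never overlap.
tickets = [range(0, 100), range(100, 200), range(200, 300)]
# Item keys repeat in every partition, so the max estimate happens to be right.
items = [[1, 2, 3], [1, 2, 3], [2, 3]]

ticket_max, ticket_exact = ndv_by_max(tickets), ndv_exact(tickets)
item_max, item_exact = ndv_by_max(items), ndv_exact(items)
```

With this data the max-based estimate for the ticket column is a third of the true value, while the two estimates agree for the item column.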

[jira] [Commented] (HIVE-10160) Give a warning when grouping or ordering by a constant column

2015-04-06 Thread Chao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482566#comment-14482566
 ] 

Chao commented on HIVE-10160:
-

+1 on pending test.

 Give a warning when grouping or ordering by a constant column
 -

 Key: HIVE-10160
 URL: https://issues.apache.org/jira/browse/HIVE-10160
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Lefty Leverenz
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-10160.1.patch, HIVE-10160.3.patch, 
 HIVE-10160.4.patch


 To avoid confusion, a warning should be issued when users specify column 
 positions instead of names in a GROUP BY or ORDER BY clause (unless 
 hive.groupby.orderby.position.alias is set to true in Hive 0.11.0 or later).




[jira] [Commented] (HIVE-7271) Speed up unit tests

2015-04-06 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482556#comment-14482556
 ] 

Lefty Leverenz commented on HIVE-7271:
--

Doc note:  *hive.exec.submit.local.task.via.child* is documented in two places 
in the wiki.

* [Unit Test Parallel Execution -- Configuration | 
https://cwiki.apache.org/confluence/display/Hive/Unit+Test+Parallel+Execution#UnitTestParallelExecution-Configuration]
* [Configuration Properties -- Test Properties -- 
hive.exec.submit.local.task.via.child | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.submit.local.task.via.child]

Please review the doc.  If it's okay, the TODOC14 label can be removed from 
this issue.  If changes are needed, just let me know.

 Speed up unit tests
 ---

 Key: HIVE-7271
 URL: https://issues.apache.org/jira/browse/HIVE-7271
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, 
 HIVE-7271.4.patch, HIVE-7271.5.patch, HIVE-7271.6.patch, HIVE-7271.7.patch


 Did some experiments to see if there's a way to speed up unit tests. 
 TestCliDriver seemed to take a lot of time just spinning up/tearing down 
 JVMs. I was also curious to see if running everything on a RAM disk would 
 help.
 Results (I ran tests up to authorization_2):
 - Current setup: 40 minutes
 - Single JVM (not using child JVM to run all queries): 8 minutes
 - Single JVM + RAM disk: 7 minutes
 So the RAM disk didn't help that much. But running tests in a single JVM seems 
 worthwhile.




[jira] [Commented] (HIVE-10120) Disallow create table with dot/colon in column name

2015-04-06 Thread Swarnim Kulkarni (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482581#comment-14482581
 ] 

Swarnim Kulkarni commented on HIVE-10120:
-

[~pxiong] Looks good for the most part. I left some minor comments on the 
review, though.

 Disallow create table with dot/colon in column name
 ---

 Key: HIVE-10120
 URL: https://issues.apache.org/jira/browse/HIVE-10120
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-10120.01.patch


 Since we don't allow users to query column names with a dot in the middle, 
 such as emp.no, we should not allow users to create tables with such columns, 
 which cannot be queried. Fix the documentation to reflect this fix.
 Here is an example. Consider this table:
 {code}
 CREATE TABLE a (`emp.no` string);
 -- The following query fails with this message:
 -- FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp 
 -- from [0:emp.no]
 select `emp.no` from a;
 {code}
 The hive documentation needs to be fixed:
 {code}
  (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems 
 to indicate that any Unicode character can go between the backticks in the 
 select statement, but Hive does not accept a dot/colon there, and even 
 select * fails when there is a column whose name contains a dot/colon. 
 {code}
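
The proposed check can be sketched as a validation step at table-creation time. This is only an illustration under assumptions: the names `FORBIDDEN_CHARS`, `find_invalid_columns`, and `check_create_table` are hypothetical and do not come from the Hive patch.

```python
# Sketch only: names are hypothetical, not taken from the Hive patch.
FORBIDDEN_CHARS = (".", ":")


def find_invalid_columns(column_names):
    """Return the column names that contain a dot or a colon."""
    return [n for n in column_names if any(c in n for c in FORBIDDEN_CHARS)]


def check_create_table(column_names):
    """Reject a CREATE TABLE whose columns could not be queried later."""
    bad = find_invalid_columns(column_names)
    if bad:
        raise ValueError(
            "column names must not contain '.' or ':': " + ", ".join(bad)
        )


invalid = find_invalid_columns(["emp.no", "name", "a:b"])
```

Rejecting the name when the table is created surfaces the problem at DDL time instead of at the first select.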




[jira] [Commented] (HIVE-10189) Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization

2015-04-06 Thread Ferdinand Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482627#comment-14482627
 ] 

Ferdinand Xu commented on HIVE-10189:
-

Hi [~chengxiang li], can you help me review this patch? Thank you!

 Create a micro benchmark tool for vectorization to evaluate the performance 
 gain after SIMD optimization
 

 Key: HIVE-10189
 URL: https://issues.apache.org/jira/browse/HIVE-10189
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-10189.patch, avx-64.docx


 We should show the performance gain from SIMD optimization.




[jira] [Commented] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session


[ 
https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482573#comment-14482573
 ] 

Hive QA commented on HIVE-10119:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12723492/HIVE-10119.3.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8705 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hive.service.cli.operation.TestOperationLoggingAPI.testFetchResultsOfLogWithExecutionMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPI.testFetchResultsOfLogWithNoneMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPI.testFetchResultsOfLogWithPerformanceMode
org.apache.hive.service.cli.operation.TestOperationLoggingAPI.testFetchResultsOfLogWithVerboseMode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3302/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3302/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3302/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12723492 - PreCommit-HIVE-TRUNK-Build

 Allow Log verbosity to be set in hiveserver2 session
 

 Key: HIVE-10119
 URL: https://issues.apache.org/jira/browse/HIVE-10119
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch, 
 HIVE-10119.3.patch


 We need to be able to set logging per HS2 session.
 The client often uses the map-reduce completion matrix (Execution) that shows 
 up in Beeline to debug performance. Users might not want the verbose log view 
 all the time, since it obscures the Execution information. Hence the client 
 should be able to change the verbosity level.
 Also, HS2 logging currently has 2 levels of verbosity, not 3. Users 
 might want Execution + Performance counters only, so that level needs to be 
 added.
 So for logs, the user should be able to set 3 levels of verbosity in the 
 session, which will override the default verbosity specified in the 
 hive-site.xml file.
 0. None - IGNORE
 1. Execution - Just shows the map-reduce tasks completing 
 2. Performance - Execution + Performance counters dumped at the end
 3. Verbose - All logs
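
The levels listed above can be sketched as an ordered enum with a session override. This is a sketch under assumptions: `LogVerbosity`, `effective_level`, and `should_emit` are hypothetical names, and the rule that each level includes everything below it is inferred from "Performance - Execution + Performance counters".

```python
# Sketch only: names and the inclusion rule are assumptions, not HS2 code.
from enum import IntEnum


class LogVerbosity(IntEnum):
    NONE = 0
    EXECUTION = 1
    PERFORMANCE = 2
    VERBOSE = 3


def effective_level(session_override, site_default):
    """A level set in the session wins over the hive-site.xml default."""
    return site_default if session_override is None else session_override


def should_emit(session_level, message_level):
    """Emit a message when its level is at or below the session level."""
    return message_level != LogVerbosity.NONE and message_level <= session_level


level = effective_level(LogVerbosity.PERFORMANCE, LogVerbosity.VERBOSE)
```

With `level` set to PERFORMANCE, execution and performance messages are shown and verbose-only messages are suppressed.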




[jira] [Updated] (HIVE-10189) Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization

2015-04-06 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-10189:

Attachment: HIVE-10189.patch

 Create a micro benchmark tool for vectorization to evaluate the performance 
 gain after SIMD optimization
 

 Key: HIVE-10189
 URL: https://issues.apache.org/jira/browse/HIVE-10189
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-10189.patch, avx-64.docx


 We should show the performance gain from SIMD optimization.




[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported

2015-04-06 Thread Swarnim Kulkarni (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482557#comment-14482557
 ] 

Swarnim Kulkarni commented on HIVE-10226:
-

[~jdere] Left minor comments on the review, but +1 to Ashutosh's suggestion. 
That should give us additional flexibility in the future (e.g., locales).

 Column stats for Date columns not supported
 ---

 Key: HIVE-10226
 URL: https://issues.apache.org/jira/browse/HIVE-10226
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10226.1.patch


 {noformat}
 hive> explain analyze table revenues compute statistics for columns;
 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver 
 (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only 
 integer/long/timestamp/float/double/string/binary/boolean/decimal type 
 argument is accepted but date is passed.
 {noformat}




[jira] [Updated] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-04-06 Thread Vaibhav Gumashta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-9870:
---
Fix Version/s: 1.2.0

 Add JvmPauseMonitor threads to HMS and HS2 daemons
 --

 Key: HIVE-9870
 URL: https://issues.apache.org/jira/browse/HIVE-9870
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, Metastore
Affects Versions: 1.1.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Fix For: 1.2.0

 Attachments: HIVE-9870.patch, HIVE-9870.patch, HIVE-9870.patch


 hadoop-common carries a nifty thread that prints GC or non-GC pauses within 
 the JVM if they exceed a specific threshold.
 This has been immeasurably useful in supporting several clusters, in 
 identifying GC or other forms of process pauses as the root cause of some 
 event being investigated.
 The HMS and HS2 daemons are good targets for running similar threads within 
 them. The monitor can be loaded in an if-available style.
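
The general technique behind such a pause monitor can be sketched as follows: sleep for a fixed interval and treat any extra elapsed time as a pause. This is an illustration only, not the hadoop-common implementation; the function names are hypothetical and the interval and thresholds are assumed values chosen for the example.

```python
# Sketch only: names and default values are assumptions for illustration.
import time

SLEEP_MS = 500
INFO_THRESHOLD_MS = 1_000
WARN_THRESHOLD_MS = 10_000


def extra_pause_ms(start_s, end_s, sleep_ms=SLEEP_MS):
    """Time spent beyond the requested sleep, in milliseconds."""
    return max(0.0, (end_s - start_s) * 1000.0 - sleep_ms)


def classify_pause(extra_ms, info_ms=INFO_THRESHOLD_MS, warn_ms=WARN_THRESHOLD_MS):
    """Map an extra pause to a log level, or None when it is below threshold."""
    if extra_ms > warn_ms:
        return "WARN"
    if extra_ms > info_ms:
        return "INFO"
    return None


def monitor_once(sleep_ms=SLEEP_MS, clock=time.monotonic, sleep=time.sleep):
    """Sleep once and report whether the process appears to have been paused."""
    start = clock()
    sleep(sleep_ms / 1000.0)
    return classify_pause(extra_pause_ms(start, clock(), sleep_ms))
```

A daemon would call `monitor_once` in a loop on a background thread and log the returned level. The `clock` and `sleep` parameters exist so the logic can be exercised without waiting.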




[jira] [Assigned] (HIVE-9923) No clear message when from is missing