[jira] [Commented] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481518#comment-14481518 ]

Hive QA commented on HIVE-10134:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12709386/HIVE-10134.3-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8710 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/820/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/820/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-820/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12709386 - PreCommit-HIVE-SPARK-Build

Fix test failures after HIVE-10130 [Spark Branch]

Key: HIVE-10134
URL: https://issues.apache.org/jira/browse/HIVE-10134
Project: Hive
Issue Type: Sub-task
Components: Spark
Affects Versions: spark-branch
Reporter: Xuefu Zhang
Assignee: Chao
Fix For: spark-branch
Attachments: HIVE-10134.1-spark.patch, HIVE-10134.2-spark.patch, HIVE-10134.3-spark.patch

Complete test run: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10149) Shuffle Hive data before storing in Parquet
[ https://issues.apache.org/jira/browse/HIVE-10149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-10149:
Attachment: hive.log

Here's the log file of the error. It contains different log messages because it was run with the latest trunk version.

Shuffle Hive data before storing in Parquet

Key: HIVE-10149
URL: https://issues.apache.org/jira/browse/HIVE-10149
Project: Hive
Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Sergio Peña
Assignee: Aihua Xu
Attachments: data.txt, hive.log

Hive can run into OOM (Out Of Memory) exceptions when writing many dynamic partitions to Parquet, because it creates too many open files at once and Parquet buffers an entire row group of data in memory for each open file. To avoid this in ORC, HIVE-6455 shuffles data for each partition so only one file is open at a time. We need to extend this support to Parquet and possibly the MR and Spark planners.

Steps to reproduce:
1. Create a table and load some data that contains a large number of partitions (file {{data.txt}} attached to this ticket).
{code}
hive> create table t1_stage(id bigint, rdate string) row format delimited fields terminated by ' ';
hive> load data local inpath 'data.txt' into table t1_stage;
{code}
2. Create a Parquet table, and insert partitioned data from the t1_stage table.
{noformat}
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> create table t1_part(id bigint) partitioned by (rdate string) stored as parquet;
hive> insert overwrite table t1_part partition(rdate) select * from t1_stage;
Query ID = sergio_20150330163713_db3afe74-d1c7-4f0d-a8f1-f2137ddb64a4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1427748520315_0006, Tracking URL = http://victory:8088/proxy/application_1427748520315_0006/
Kill Command = /opt/local/hadoop/bin/hadoop job -kill job_1427748520315_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-03-30 16:37:19,065 Stage-1 map = 0%, reduce = 0%
2015-03-30 16:37:43,947 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1427748520315_0006 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1427748520315_0006_m_00 (and more) from job job_1427748520315_0006

Task with the most failures(4):
-
Task ID:
  task_1427748520315_0006_m_00

URL:
  http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1427748520315_0006&tipid=task_1427748520315_0006_m_00
-
Diagnostic Messages for this Task:
Error: Java heap space

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
{noformat}
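The issue attributes the OOM to one open Parquet writer per dynamic partition. It describes the HIVE-6455 remedy for ORC as shuffling rows by partition so that only one file is open at a time.

The toy Java model below is not Hive code, and its class and method names are hypothetical. It counts how many writers would be open at the same time, with and without that grouping. Sorting the keys stands in for the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model (not Hive code): counts how many per-partition writers would be
 * open at once. A real Parquet writer buffers a whole row group per open file.
 */
class DynamicPartitionWriteSketch {

    /** Rows in arrival order: each writer stays open from its first row until the task ends. */
    static int peakOpenWritersUnsorted(List<String> partitionKeys) {
        Set<String> open = new HashSet<>();
        int peak = 0;
        for (String key : partitionKeys) {
            open.add(key); // opened on first sight, never closed early
            peak = Math.max(peak, open.size());
        }
        return peak;
    }

    /** Rows grouped by partition key (the shuffle): the previous writer can be closed. */
    static int peakOpenWritersSorted(List<String> partitionKeys) {
        List<String> sorted = new ArrayList<>(partitionKeys);
        Collections.sort(sorted); // stands in for shuffling on the partition column
        String current = null;
        int open = 0;
        int peak = 0;
        for (String key : sorted) {
            if (!key.equals(current)) {
                if (current != null) {
                    open--; // close the writer for the finished partition
                }
                open++; // open a writer for the new partition
                current = key;
            }
            peak = Math.max(peak, open);
        }
        return peak;
    }

    public static void main(String[] args) {
        List<String> rdates = Arrays.asList("2015-01-02", "2015-01-01", "2015-01-03", "2015-01-01");
        System.out.println(peakOpenWritersUnsorted(rdates)); // 3
        System.out.println(peakOpenWritersSorted(rdates));   // 1
    }
}
```

With rows in arrival order, the peak grows with the number of distinct partitions a task sees. Once rows arrive grouped by partition key, the peak stays at one, which is the property the shuffle is meant to provide.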
[jira] [Updated] (HIVE-10226) Column stats for Date columns not supported
[ https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-10226:
Attachment: HIVE-10226.1.patch

Re-use the long stats for Date column stats, using the days-since-epoch value as the long value.

Column stats for Date columns not supported

Key: HIVE-10226
URL: https://issues.apache.org/jira/browse/HIVE-10226
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
Attachments: HIVE-10226.1.patch

{noformat}
hive> explain analyze table revenues compute statistics for columns;
2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only integer/long/timestamp/float/double/string/binary/boolean/decimal type argument is accepted but date is passed.
{noformat}
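The patch comment describes storing Date column stats in the existing long stats as days since the epoch. The minimal sketch below uses java.time for that encoding. This is an assumption for illustration; Hive's own date classes are not used here.

The sketch shows why min/max stay correct: the mapping preserves order, so the min/max of the encoded longs decode back to the min/max dates. The class name is hypothetical.

```java
import java.time.LocalDate;

/** Sketch: store a DATE in long-valued column stats as days since 1970-01-01. */
class DateAsLongStats {

    static long toDays(LocalDate date) {
        return date.toEpochDay();
    }

    static LocalDate fromDays(long days) {
        return LocalDate.ofEpochDay(days);
    }

    /**
     * Min/max over the encoded values; because the encoding preserves order,
     * these decode to the earliest and latest dates. With no values, returns
     * {Long.MAX_VALUE, Long.MIN_VALUE}.
     */
    static long[] minMaxDays(LocalDate... values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (LocalDate d : values) {
            long days = toDays(d);
            min = Math.min(min, days);
            max = Math.max(max, days);
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        long[] mm = minMaxDays(LocalDate.of(2015, 4, 2), LocalDate.of(1970, 1, 1), LocalDate.of(2015, 3, 30));
        System.out.println(fromDays(mm[0]) + " .. " + fromDays(mm[1])); // 1970-01-01 .. 2015-04-02
    }
}
```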
[jira] [Commented] (HIVE-9073) NPE when using custom windowing UDAFs
[ https://issues.apache.org/jira/browse/HIVE-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481894#comment-14481894 ]

Ashutosh Chauhan commented on HIVE-9073:

+1, unless the above test failures are caused by this patch.
A natural extension of this is to move the {{Noop}} TableFunctionEvaluator to the contrib/ module and then run create function for it in the tests where it is used. Currently, {{Noop}} lives in the main src tree because of this bug, but it is used heavily in tests. Since the contrib/ jar is not on the test classpath by default, moving this test-only class to contrib/ helps in two ways: it keeps the src tree free of test classes, and it provides a true test case for this functionality, since the current test uses a function that is available on the test classpath anyway. Because the above may require some refactoring, I am OK with doing it in a follow-up jira.

NPE when using custom windowing UDAFs

Key: HIVE-9073
URL: https://issues.apache.org/jira/browse/HIVE-9073
Project: Hive
Issue Type: Bug
Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
Attachments: HIVE-9073.1.patch, HIVE-9073.2.patch, HIVE-9073.2.patch, HIVE-9073.3.patch

From the hive-user email group:
{noformat}
While executing a simple select query using a custom windowing UDAF I created, I am constantly running into this error.
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
	... 14 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:647)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getWindowFunctionInfo(FunctionRegistry.java:1875)
	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.streamingPossible(WindowingTableFunction.java:150)
	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.setCanAcceptInputAsStream(WindowingTableFunction.java:221)
	at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.initializeStreaming(WindowingTableFunction.java:266)
	at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.initializeStreaming(PTFOperator.java:292)
	at org.apache.hadoop.hive.ql.exec.PTFOperator.initializeOp(PTFOperator.java:86)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
	at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
	... 14 more

Just wanted to check if any of you have faced this earlier. Also, when I try to run the custom UDAF on another server, it works fine. The only difference I can see is that the Hive version I am using on my local machine is 0.13.1, where it is working, and on the other machine it is 0.13.0, where I see the above error. I am not sure if this was a bug that was fixed in a later release, but I just wanted to confirm.
{noformat}
[jira] [Commented] (HIVE-9310) CLI JLine does not flush history back to ~/.hivehistory
[ https://issues.apache.org/jira/browse/HIVE-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481906#comment-14481906 ]

Thejas M Nair commented on HIVE-9310:

Created HIVE-10225 for the other cases where the history file is still not created.

CLI JLine does not flush history back to ~/.hivehistory

Key: HIVE-9310
URL: https://issues.apache.org/jira/browse/HIVE-9310
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.15.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
Fix For: 1.1.0
Attachments: HIVE-9310.1.patch

Hive CLI does not seem to be saving history anymore.
In JLine with the PersistentHistory class, to keep history across sessions, you need to do {{reader.getHistory().flush()}}.
[jira] [Commented] (HIVE-10220) Concurrent read access within HybridHashTableContainer
[ https://issues.apache.org/jira/browse/HIVE-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481911#comment-14481911 ]

Matt McCline commented on HIVE-10220:

+1 (I was concerned that MapJoinOperator was going to reload a hash table and HybridHashTableContainer wouldn't use the right thread-safe object. But it appears that it reloads into a MapJoinContainer, because it deals with one reloaded hash table at a time.)

Concurrent read access within HybridHashTableContainer

Key: HIVE-10220
URL: https://issues.apache.org/jira/browse/HIVE-10220
Project: Hive
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gopal V
Attachments: HIVE-10220.1.patch

HybridHashTableContainer can end up being cached if it does not spill; in that case it needs to follow the HIVE-10128 thread-safety patterns for the partitioned hash maps.
[jira] [Commented] (HIVE-10217) LLAP: Support caching of uncompressed ORC data
[ https://issues.apache.org/jira/browse/HIVE-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481605#comment-14481605 ]

Sergey Shelukhin commented on HIVE-10217:

Why is the jira titled this way? Uncompressed blocks inside a compressed file are already supported. Is this about a fully uncompressed file?

LLAP: Support caching of uncompressed ORC data

Key: HIVE-10217
URL: https://issues.apache.org/jira/browse/HIVE-10217
Project: Hive
Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
Fix For: llap

{code}
Caused by: java.io.IOException: ORC compression buffer size (0) is smaller than LLAP low-level cache minimum allocation size (131072). Decrease the value for hive.llap.io.cache.orc.alloc.min
	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:137)
	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
	at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
	... 4 more
{code}
[jira] [Updated] (HIVE-10223) Consolidate several redundant FileSystem API calls.
[ https://issues.apache.org/jira/browse/HIVE-10223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-10223:
Assignee: Chris Nauroth

Consolidate several redundant FileSystem API calls.

Key: HIVE-10223
URL: https://issues.apache.org/jira/browse/HIVE-10223
Project: Hive
Issue Type: Improvement
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HIVE-10223.1.patch

This issue proposes to consolidate several Hive calls to the Hadoop Common {{FileSystem}} API into fewer calls that still accomplish the same work. {{FileSystem}} API calls typically translate into RPCs to other services, such as the HDFS NameNode or alternative file system implementations. Consolidating RPCs will lower latency somewhat for Hive code and reduce some load on these external services.
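Here is a sketch of the kind of consolidation described; the actual changes in HIVE-10223.1.patch are not shown here. The example uses java.nio on the local filesystem as a stand-in for the Hadoop {{FileSystem}} API, replacing up to three separate metadata lookups with a single {{Files.readAttributes}} call.

In Hadoop terms, the analogous pattern is one {{FileSystem.getFileStatus}} call:
* The returned {{FileStatus}} answers both {{isDirectory()}} and {{getLen()}}.
* A {{FileNotFoundException}} takes the place of a separate {{exists()}} round trip.

The class names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

/** Local-filesystem analogy (java.nio, not Hadoop) for replacing several metadata probes with one. */
class ConsolidatedStat {

    /** Hypothetical holder for the answers a caller needs. */
    static final class Info {
        final boolean exists;
        final boolean directory;
        final long size;

        Info(boolean exists, boolean directory, long size) {
            this.exists = exists;
            this.directory = directory;
            this.size = size;
        }
    }

    /** Before: up to three separate lookups (exists, isDirectory, size). */
    static Info separateCalls(Path p) throws IOException {
        if (!Files.exists(p)) {
            return new Info(false, false, 0L);
        }
        boolean dir = Files.isDirectory(p);
        return new Info(true, dir, dir ? 0L : Files.size(p));
    }

    /** After: a single lookup; a missing path surfaces as NoSuchFileException. */
    static Info singleCall(Path p) throws IOException {
        try {
            BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
            return new Info(true, attrs.isDirectory(), attrs.isDirectory() ? 0L : attrs.size());
        } catch (NoSuchFileException e) {
            return new Info(false, false, 0L);
        }
    }
}
```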
[jira] [Updated] (HIVE-8164) Adding in a ReplicationTask that converts a Notification Event to actionable tasks
[ https://issues.apache.org/jira/browse/HIVE-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-8164:
Attachment: HIVE-8164.2.patch

Updated patch attached.

Adding in a ReplicationTask that converts a Notification Event to actionable tasks

Key: HIVE-8164
URL: https://issues.apache.org/jira/browse/HIVE-8164
Project: Hive
Issue Type: Sub-task
Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Attachments: HIVE-8164.2.patch, HIVE-8164.patch
[jira] [Resolved] (HIVE-10224) LLAP: Clean up encoded ORC tree readers after trunk merge
[ https://issues.apache.org/jira/browse/HIVE-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran resolved HIVE-10224.
Resolution: Fixed

Committed to llap branch.

LLAP: Clean up encoded ORC tree readers after trunk merge

Key: HIVE-10224
URL: https://issues.apache.org/jira/browse/HIVE-10224
Project: Hive
Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Attachments: HIVE-10224.1.patch

Clean up encoded tree readers after HIVE-10042.
[jira] [Updated] (HIVE-10224) LLAP: Clean up encoded ORC tree readers after trunk merge
[ https://issues.apache.org/jira/browse/HIVE-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-10224:
Attachment: HIVE-10224.1.patch

LLAP: Clean up encoded ORC tree readers after trunk merge

Key: HIVE-10224
URL: https://issues.apache.org/jira/browse/HIVE-10224
Project: Hive
Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Attachments: HIVE-10224.1.patch

Clean up encoded tree readers after HIVE-10042.
[jira] [Commented] (HIVE-10214) log metastore call timing information aggregated at query level
[ https://issues.apache.org/jira/browse/HIVE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482150#comment-14482150 ]

Vaibhav Gumashta commented on HIVE-10214:

Minor comments on rb. Otherwise +1.

log metastore call timing information aggregated at query level

Key: HIVE-10214
URL: https://issues.apache.org/jira/browse/HIVE-10214
Project: Hive
Issue Type: Bug
Components: Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: HIVE-10214.1.patch

For troubleshooting issues, it would be useful to log timing information for metastore api calls, aggregated at a query level.
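Here is a minimal sketch of per-query aggregation. It assumes a hypothetical {{MetastoreCallTimer}} that is owned by the query and wraps each metastore call; the method names passed in are only labels.

The timer records a call count and total elapsed time per method, then builds one summary line to log when the query finishes. It is not how HIVE-10214.1.patch is implemented. It is also not thread-safe, so a query that issues metastore calls from several threads would need a concurrent map.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

/** Hypothetical per-query accumulator for metastore call timings (not Hive's actual class). */
class MetastoreCallTimer {
    // method name -> {call count, total elapsed nanoseconds}; TreeMap keeps the summary sorted
    private final Map<String, long[]> stats = new TreeMap<>();

    /** Runs the call and adds its elapsed time to the per-method totals, even if it throws. */
    <T> T time(String method, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long[] s = stats.computeIfAbsent(method, k -> new long[2]);
            s[0]++;
            s[1] += System.nanoTime() - start;
        }
    }

    long calls(String method) {
        long[] s = stats.get(method);
        return s == null ? 0L : s[0];
    }

    /** One line intended to be logged once, when the query completes. */
    String summary() {
        StringBuilder sb = new StringBuilder("metastore calls:");
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            long[] s = e.getValue();
            sb.append(' ').append(e.getKey()).append('=').append(s[0])
              .append("x/").append(s[1] / 1_000_000L).append("ms");
        }
        return sb.toString();
    }
}
```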
[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session
[ https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-10119:
Attachment: (was: HIVE-10119.3.patch)

Allow Log verbosity to be set in hiveserver2 session

Key: HIVE-10119
URL: https://issues.apache.org/jira/browse/HIVE-10119
Project: Hive
Issue Type: Bug
Components: HiveServer2
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch

We need to be able to set logging per HS2 session.
The client often uses the map-reduce completion matrix (Execution) that shows up in Beeline to debug performance. The user might not want the verbose log view all the time, since it obscures the Execution information. Hence the client should be able to change the verbosity level.
Also, HS2 logging currently has 2 levels of verbosity, not 3. Users might want only Execution + Performance counters, so that level needs to be added.
So for logs, the user should be able to choose one of three verbosity levels (or turn logging off) in the session, overriding the default verbosity specified in the hive-site.xml file:
0. None - IGNORE
1. Execution - Just shows the map-reduce tasks completing
2. Performance - Execution + Performance counters dumped at the end
3. Verbose - All logs
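The four levels listed above form a cumulative scale: each level shows everything the levels below it show. The hypothetical Java enum below captures that ordering and the rule that a session setting overrides the hive-site.xml default. It is a sketch, not the HiveServer2 implementation, and the constant names are taken from the issue text.

```java
/** Hypothetical model of the session verbosity levels described in the issue. */
enum LogVerbosity {
    NONE, EXECUTION, PERFORMANCE, VERBOSE; // ordinal() matches the 0..3 numbering in the issue

    /** True if a session at this level should show messages of the given category. */
    boolean shows(LogVerbosity category) {
        // NONE is not a message category; every other category is shown at its own level and above
        return category != NONE && category.ordinal() <= this.ordinal();
    }

    /** The session setting, when present, overrides the server default from hive-site.xml. */
    static LogVerbosity effective(LogVerbosity serverDefault, LogVerbosity sessionOverride) {
        return sessionOverride != null ? sessionOverride : serverDefault;
    }
}
```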
[jira] [Commented] (HIVE-10216) log hive cli classpath at debug level
[ https://issues.apache.org/jira/browse/HIVE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481932#comment-14481932 ]

Vaibhav Gumashta commented on HIVE-10216:

+1

log hive cli classpath at debug level

Key: HIVE-10216
URL: https://issues.apache.org/jira/browse/HIVE-10216
Project: Hive
Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: HIVE-10216.1.patch

For troubleshooting, it is useful to print the classpath used by hive-cli at DEBUG level in the log.
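Here is a small sketch of logging the classpath only at debug level, one entry per line. It uses java.util.logging, where FINE roughly corresponds to DEBUG, so that it stays self-contained. Hive logs through its own logging framework, so there the guard would be that framework's debug-enabled check. The class name is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Pattern;

/** Hypothetical helper: log the JVM classpath only when debug-level logging is on. */
class ClasspathDebugLog {
    private static final Logger LOG = Logger.getLogger(ClasspathDebugLog.class.getName());

    /** Splits a classpath string on the platform path separator, dropping empty entries. */
    static List<String> entries(String classpath) {
        List<String> out = new ArrayList<>();
        for (String e : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (!e.isEmpty()) {
                out.add(e);
            }
        }
        return out;
    }

    /** Logs one classpath entry per line at FINE; does no work when FINE is disabled. */
    static void logClasspath() {
        if (LOG.isLoggable(Level.FINE)) {
            for (String entry : entries(System.getProperty("java.class.path", ""))) {
                LOG.fine("classpath entry: " + entry);
            }
        }
    }
}
```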
[jira] [Updated] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pengcheng Xiong updated HIVE-9647:
Attachment: HIVE-9647.02.patch

Address the metastore configuration flag.

Discrepancy in cardinality estimates between partitioned and un-partitioned tables

Key: HIVE-9647
URL: https://issues.apache.org/jira/browse/HIVE-9647
Project: Hive
Issue Type: Bug
Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Fix For: 1.2.0
Attachments: HIVE-9647.01.patch, HIVE-9647.02.patch

High-level summary
HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column number of distinct values (NDV) to estimate join selectivity. The way statistics are aggregated for partitioned tables causes a discrepancy in the number of distinct values, which leads to different plans for partitioned and un-partitioned schemas.
The table below summarizes the NDVs that computeInnerJoinSelectivity uses to estimate join selectivity.
||Column||Partitioned count distincts||Un-Partitioned count distincts||
|sr_customer_sk|71,245|1,415,625|
|sr_item_sk|38,846|62,562|
|sr_ticket_number|71,245|34,931,085|
|ss_customer_sk|88,476|1,415,625|
|ss_item_sk|38,846|62,562|
|ss_ticket_number|100,756|56,256,175|

The discrepancy arises because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition, and computes it as {{select max(NUM_DISTINCTS) from PART_COL_STATS}}. This is problematic for columns like ticket number, which naturally increase with the date partition column ss_sold_date_sk.
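The toy sketch below makes the discrepancy concrete. It models a ticket-number column that increases with the date partition, so each partition holds a disjoint range of values. It then compares three numbers:
* the exact NDV of the union;
* the max of per-partition NDVs, as described above;
* the extrapolation formula proposed under Suggestions in this issue.

The class and method names are hypothetical. The numbers are illustrative, not the TPC-DS figures from the table.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy comparison of NDV aggregation strategies for a partitioned column (hypothetical names). */
class NdvEstimateSketch {

    /** Aggregation described in the issue: max(NUM_DISTINCTS) over the qualifying partitions. */
    static long maxOfPartitions(long[] partitionNdvs) {
        long max = 0L;
        for (long n : partitionNdvs) {
            max = Math.max(max, n);
        }
        return max;
    }

    /** Proposed: max(tableNdv * qualifiedPartitions / totalPartitions, max(partition NDVs)). */
    static long extrapolate(long tableNdv, int qualifiedPartitions, int totalPartitions, long[] partitionNdvs) {
        if (totalPartitions <= 0) {
            throw new IllegalArgumentException("totalPartitions must be positive");
        }
        // double arithmetic avoids overflowing tableNdv * qualifiedPartitions for very large tables
        long scaled = (long) ((double) tableNdv * qualifiedPartitions / totalPartitions);
        return Math.max(scaled, maxOfPartitions(partitionNdvs));
    }

    /** Exact NDV of the union of partitions, i.e. what the un-partitioned table would report. */
    static long exactUnion(List<Set<Long>> partitions) {
        Set<Long> all = new HashSet<>();
        for (Set<Long> p : partitions) {
            all.addAll(p);
        }
        return all.size();
    }
}
```

Take four partitions with 100 distinct tickets each. max(NUM_DISTINCTS) reports 100, although the union has 400. If two of the four partitions qualify and the table-level NDV is 400, the extrapolation gives max(400 x 2 / 4, 100) = 200. In this idealized case, that matches the exact count for those two partitions.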
Suggestions
1. Use HyperLogLog, as suggested by Gopal; there is an HLL implementation for HBase co-processors that we can use as a reference here.
2. Using the global stats from TAB_COL_STATS and the per-partition stats from PART_COL_STATS, extrapolate the NDV for the qualified partitions as:
Max((NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS)

More details
While doing TPC-DS partitioned vs. un-partitioned runs, I noticed that many of the plans were different. I then dumped the CBO logical plan and found that the join estimates are drastically different.
Unpartitioned schema:
{code}
2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering:
HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2956
HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954
HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952
HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2982
HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2980
HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], joinType=[inner]): rowcount = 28880.460910696, cumulative cost = {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964
HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920
HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822
HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923
HiveTableScanRel(table=[[tpcds_bin_orc_200.store_returns]]): rowcount = 5.5578005E7, cumulative cost =
[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date
[ https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482158#comment-14482158 ]

Chaoyu Tang commented on HIVE-10231:

[~gopalv] HIVE-10226 is about computing stats for table columns of date type. It is different from this one, which is about computing (any) column stats for a partition whose partition column is of date type.

Compute partition column stats fails if partition col type is date

Key: HIVE-10231
URL: https://issues.apache.org/jira/browse/HIVE-10231
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 1.0.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
Fix For: 1.2.0
Attachments: HIVE-10231.patch

Currently the command {{analyze table .. partition .. compute statistics for columns}} may work only for partition columns of string or numeric types, but not for other types such as date. See the following case, which uses date as the partition column type:
{code}
create table colstatspartdate (key int, value string) partitioned by (ds date, hr int);
insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select key, value from src limit 20;
analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute statistics for columns;
{code}
You will get a RuntimeException:
{code}
FAILED: RuntimeException Cannot convert to Date from: int
15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to Date from: int
java.lang.RuntimeException: Cannot convert to Date from: int
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
	at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
{code}
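A later update on this issue attributes the failure to ColumnStatsSemanticAnalyzer writing the date partition value unquoted into the rewritten query. The patch quotes the partition value regardless of its type. Left unquoted, a value like 2015-04-02 can plausibly be parsed as the integer expression 2015 - 4 - 2, which would fit the "Cannot convert to Date from: int" error above.

The helper below is a hypothetical sketch of quoting every value when rendering a partition spec as a predicate. It is not the patch's code, and it assumes column names contain no backticks.

```java
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper: render a partition spec as a predicate for a rewritten
 * query, quoting every value regardless of the partition column's type.
 */
class PartitionPredicateSketch {

    static String quoted(Map<String, String> partSpec) {
        StringJoiner where = new StringJoiner(" and ");
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            // escape backslashes first, then single quotes, so the string literal stays well-formed
            String value = e.getValue().replace("\\", "\\\\").replace("'", "\\'");
            // assumes column names contain no backticks
            where.add("`" + e.getKey() + "` = '" + value + "'");
        }
        return where.toString();
    }
}
```

Quoting every value relies on Hive converting the string literal to the column's type, for example '2' for the int column hr. That matches the patch's choice to quote regardless of data type.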
[jira] [Updated] (HIVE-8204) Dynamic partition pruning fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-8204:
Fix Version/s: 1.0.0

Dynamic partition pruning fails with IndexOutOfBoundsException

Key: HIVE-8204
URL: https://issues.apache.org/jira/browse/HIVE-8204
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth Jayachandran
Assignee: Gunther Hagleitner
Fix For: 1.0.0
Attachments: HIVE-8204.1.patch, HIVE-8204.2.patch

Dynamic partition pruning fails with an IndexOutOfBoundsException when the dimension table is partitioned and the fact table is not.
Steps to reproduce:
1) Partition the date_dim table from tpcds on d_date_sk
2) The fact table is store_sales, which is not partitioned
3) Run the following
{code}
set hive.stats.fetch.column.stats=ture;
set hive.tez.dynamic.partition.pruning=true;
explain select d_date from store_sales, date_dim where store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 1998;
{code}
The stack trace is:
{code}
2014-09-19 19:06:16,254 ERROR ql.Driver (SessionState.java:printError(825)) - FAILED: IndexOutOfBoundsException Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:635)
	at java.util.ArrayList.get(ArrayList.java:411)
	at org.apache.hadoop.hive.ql.optimizer.RemoveDynamicPruningBySize.process(RemoveDynamicPruningBySize.java:61)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
	at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:61)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
	at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:277)
	at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:120)
	at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:97)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9781)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:407)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1060)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1130)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:997)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:987)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
{code}
[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported
[ https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482225#comment-14482225 ]

Ashutosh Chauhan commented on HIVE-10226:

Using a long column in the metastore table to store the max and min date is fine, but reusing {{LongColumnStatsData}} in the thrift interface to return results is not. I think we should add a new thrift structure {{DateColumnStatsData}}, something like:
{code}
struct DateColumnStatsData {
  1: optional Date minDate,
  2: optional Date maxDate,
  3: required i64 numNulls,
  4: required i64 numDVs
}
{code}
and then the metastore, after retrieving the number of days from the backend, should compute the Date, populate this struct, and return the results to the client.

Column stats for Date columns not supported

Key: HIVE-10226
URL: https://issues.apache.org/jira/browse/HIVE-10226
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
Attachments: HIVE-10226.1.patch

{noformat}
hive> explain analyze table revenues compute statistics for columns;
2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only integer/long/timestamp/float/double/string/binary/boolean/decimal type argument is accepted but date is passed.
{noformat}
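Below is a hypothetical Java mirror of the proposed struct. It assumes the backend keeps min/max as days since 1970-01-01 in long columns, and that absent values come back as null. It illustrates the conversion step the comment describes: turn the stored day counts back into dates before handing the result to the client. The class name and factory method are illustrative only, not the generated thrift classes.

```java
import java.time.LocalDate;
import java.util.Optional;

/** Hypothetical Java mirror of the proposed DateColumnStatsData struct. */
final class DateColumnStatsSketch {
    final Optional<LocalDate> minDate; // 1: optional
    final Optional<LocalDate> maxDate; // 2: optional
    final long numNulls;               // 3: required
    final long numDVs;                 // 4: required

    private DateColumnStatsSketch(Optional<LocalDate> minDate, Optional<LocalDate> maxDate,
                                  long numNulls, long numDVs) {
        this.minDate = minDate;
        this.maxDate = maxDate;
        this.numNulls = numNulls;
        this.numDVs = numDVs;
    }

    /** Converts the backend's day counts (null when absent) into dates for the client. */
    static DateColumnStatsSketch fromBackend(Long lowDays, Long highDays, long numNulls, long numDVs) {
        return new DateColumnStatsSketch(
                Optional.ofNullable(lowDays).map(LocalDate::ofEpochDay),
                Optional.ofNullable(highDays).map(LocalDate::ofEpochDay),
                numNulls,
                numDVs);
    }
}
```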
[jira] [Updated] (HIVE-10040) CBO (Calcite Return Path): Pluggable cost modules [CBO branch]
[ https://issues.apache.org/jira/browse/HIVE-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10040: --- Attachment: HIVE-10040.01.cbo.patch [~jpullokkaran], I updated the patch addressing your comments (in RB too). Could you take a look? Thanks CBO (Calcite Return Path): Pluggable cost modules [CBO branch] -- Key: HIVE-10040 URL: https://issues.apache.org/jira/browse/HIVE-10040 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: cbo-branch Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: cbo-branch Attachments: HIVE-10040.01.cbo.patch, HIVE-10040.cbo.patch We should be able to deal with cost models in a modular way. Thus, the cost model should be integrated within a Calcite MD provider that is pluggable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10231) Compute partition column stats fails if partition col type is date
[ https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10231: --- Attachment: HIVE-10231.patch There is a bug in how ColumnStatsSemanticAnalyzer rewrites the analyze query: the date value should be quoted in the rewritten query, but it is not. For simplicity, this patch quotes the partition value in the rewritten query regardless of its data type. Compute partition column stats fails if partition col type is date -- Key: HIVE-10231 URL: https://issues.apache.org/jira/browse/HIVE-10231 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Fix For: 1.2.0 Attachments: HIVE-10231.patch Currently the command analyze table .. partition .. compute statistics for columns appears to work only for partition columns of string or numeric types, not for other types such as date. See the following case, which uses date as the partition column type: {code} create table colstatspartdate (key int, value string) partitioned by (ds date, hr int); insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select key, value from src limit 20; analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute statistics for columns; {code} It fails with a RuntimeException: {code} FAILED: RuntimeException Cannot convert to Date from: int 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to Date from: int java.lang.RuntimeException: Cannot convert to Date from: int at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163) at
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
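Chaoyu's patch quotes the partition value in the rewritten query regardless of its data type. The Java sketch below illustrates that rule in isolation. {{PartitionSpecQuoting}} and {{toQuotedSpec}} are hypothetical names, not code from the patch; backslash-escaping embedded quotes is an assumption about HiveQL string literals; and the idea that an unquoted 2015-04-02 gets evaluated as integer arithmetic (which would explain "Cannot convert to Date from: int") is an inference from the error, not something the thread states.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not the actual ColumnStatsSemanticAnalyzer code.
final class PartitionSpecQuoting {
    // Renders an ordered partition spec as "col='value', col='value'".
    // Every value is quoted regardless of the column's type, as the patch describes.
    static String toQuotedSpec(Map<String, String> partSpec) {
        StringJoiner joiner = new StringJoiner(", ");
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            // Assumption: backslash escaping is enough for a HiveQL string literal.
            String escaped = e.getValue().replace("\\", "\\\\").replace("'", "\\'");
            joiner.add(e.getKey() + "='" + escaped + "'");
        }
        return joiner.toString();
    }
}
```

Quoting everything keeps the rewrite logic type-agnostic; the trade-off is that non-string partition values then rely on Hive converting the quoted literal back to the column's type.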
[jira] [Commented] (HIVE-10225) CLI JLine does not flush history on quit/Ctrl-C
[ https://issues.apache.org/jira/browse/HIVE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482143#comment-14482143 ] Thejas M Nair commented on HIVE-10225: -- Verified that history is written on quit, exit, Ctrl-D, and Ctrl-C. CLI JLine does not flush history on quit/Ctrl-C --- Key: HIVE-10225 URL: https://issues.apache.org/jira/browse/HIVE-10225 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-10225.1.patch Hive CLI does not save history if it is terminated using Ctrl-C or the quit; command. HIVE-9310 fixed this for the case where one exits with Ctrl-D (EOF), but not for these other ways of exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10118) CBO (Calcite Return Path): Internal error: Cannot find common type for join keys
[ https://issues.apache.org/jira/browse/HIVE-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran reassigned HIVE-10118: - Assignee: Laljo John Pullokkaran (was: Jesus Camacho Rodriguez) CBO (Calcite Return Path): Internal error: Cannot find common type for join keys - Key: HIVE-10118 URL: https://issues.apache.org/jira/browse/HIVE-10118 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: cbo-branch Query {code} explain select ss_items.item_id ,ss_item_rev ,ss_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ss_dev ,cs_item_rev ,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev ,ws_item_rev ,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev ,ws_item_rev ,ws_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ws_dev ,(ss_item_rev+cs_item_rev+ws_item_rev)/3 average FROM ( select i_item_id item_id ,sum(ss_ext_sales_price) as ss_item_rev from store_sales JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN (select d1.d_date from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = d2.d_week_seq where d2.d_date = '1998-08-04') sub ON date_dim.d_date = sub.d_date group by i_item_id ) ss_items JOIN ( select i_item_id item_id ,sum(cs_ext_sales_price) as cs_item_rev from catalog_sales JOIN item ON catalog_sales.cs_item_sk = item.i_item_sk JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN (select d1.d_date from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = d2.d_week_seq where d2.d_date = '1998-08-04') sub ON date_dim.d_date = sub.d_date group by i_item_id ) cs_items ON ss_items.item_id=cs_items.item_id JOIN ( select i_item_id item_id ,sum(ws_ext_sales_price) as ws_item_rev from web_sales JOIN item ON web_sales.ws_item_sk = item.i_item_sk JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN (select d1.d_date from date_dim 
d1 JOIN date_dim d2 ON d1.d_week_seq = d2.d_week_seq where d2.d_date = '1998-08-04') sub ON date_dim.d_date = sub.d_date group by i_item_id ) ws_items ON ss_items.item_id=ws_items.item_id where ss_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev and ss_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev and cs_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev and cs_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev and ws_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev and ws_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev order by item_id ,ss_item_rev limit 100 {code} Exception {code} limit 100 15/03/27 12:38:32 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO. java.lang.RuntimeException: java.lang.AssertionError: Internal error: Cannot find common type for join keys $1 (type INTEGER) and $1 (type VARCHAR(2147483647)) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:677) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:586) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:238) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9998) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:201) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1114) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1162) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051) at 
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1041) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at
[jira] [Commented] (HIVE-10206) Improve Alter Table to not initialize Serde unnecessarily
[ https://issues.apache.org/jira/browse/HIVE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482187#comment-14482187 ] Hive QA commented on HIVE-10206: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12709363/HIVE-10206.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8701 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_table_wrong_regex {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3298/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3298/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3298/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12709363 - PreCommit-HIVE-TRUNK-Build Improve Alter Table to not initialize Serde unnecessarily - Key: HIVE-10206 URL: https://issues.apache.org/jira/browse/HIVE-10206 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Szehon Ho Priority: Minor Attachments: HIVE-10206.patch Create an avro table with an external avsc file like: {noformat} CREATE TABLE test(...) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///Users/szehon/Temp/test.avsc', 'kite.compression.type'='snappy', 'transient_lastDdlTime'='1427996456') {noformat} Delete the test.avsc file, then try to modify the table properties: {noformat} alter table test set tblproperties ('avro.schema.url'='file:///Users/szehon/Temp/test2.avsc'); {noformat} This throws an exception such as AvroSerdeException: {noformat} at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:119) at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:163) at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:101) at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:78) at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:520) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:377) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:274) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:256) at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:595) at org.apache.hadoop.hive.ql.exec.DDLTask.alterTableOrSinglePartition(DDLTask.java:3383) at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3340) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:332) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181) at
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at
[jira] [Updated] (HIVE-10160) Give a warning when grouping or ordering by a constant column
[ https://issues.apache.org/jira/browse/HIVE-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-10160: Attachment: HIVE-10160.4.patch Give a warning when grouping or ordering by a constant column - Key: HIVE-10160 URL: https://issues.apache.org/jira/browse/HIVE-10160 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Lefty Leverenz Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-10160.1.patch, HIVE-10160.3.patch, HIVE-10160.4.patch To avoid confusion, a warning should be issued when users specify column positions instead of names in a GROUP BY or ORDER BY clause (unless hive.groupby.orderby.position.alias is set to true in Hive 0.11.0 or later). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
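One way to picture the requested warning: scan the GROUP BY or ORDER BY items for bare integer literals and warn unless hive.groupby.orderby.position.alias is true. This is an illustrative sketch, not the HIVE-10160 patch; {{PositionAliasCheck}}, {{warningsFor}}, and the warning text are hypothetical, and the items are plain strings rather than Hive parse-tree nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical check over GROUP BY / ORDER BY items given as plain strings.
final class PositionAliasCheck {
    private static final Pattern INTEGER_LITERAL = Pattern.compile("\\d+");

    static List<String> warningsFor(String clause, List<String> items, boolean positionAliasEnabled) {
        List<String> warnings = new ArrayList<>();
        if (positionAliasEnabled) {
            // Integers are treated as column positions, so there is nothing to warn about.
            return warnings;
        }
        for (String item : items) {
            String trimmed = item.trim();
            if (INTEGER_LITERAL.matcher(trimmed).matches()) {
                // Without position aliases, the integer is grouped or ordered as a constant.
                warnings.add("Using constant " + trimmed + " in " + clause
                        + "; it is not interpreted as a column position");
            }
        }
        return warnings;
    }
}
```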
[jira] [Commented] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session
[ https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482239#comment-14482239 ] Thejas M Nair commented on HIVE-10119: -- The patch looks great. Can you also make one more change? Rename the enum loggingLevel to LoggingLevel (starting with a capital letter). +1 pending that change and a test run. Allow Log verbosity to be set in hiveserver2 session Key: HIVE-10119 URL: https://issues.apache.org/jira/browse/HIVE-10119 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch, HIVE-10119.3.patch We need to be able to set logging per HS2 session. The client often uses the map-reduce completion matrix (Execution) that shows up in Beeline to debug performance. Users might not want the verbose log view all the time, since it obscures the Execution information. Hence the client should be able to change the verbosity level. Also, HS2 logging currently has 2 levels of verbosity, not 3. Users might want only Execution + Performance counters, so that level needs to be added. So for logs, the user should be able to set the verbosity level in the session, overriding the default verbosity specified in the hive-site.xml file: 0. None - IGNORE 1. Execution - Just shows the map-reduce tasks completing 2. Performance - Execution + Performance counters dumped at the end 3. Verbose - All logs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
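The four levels in the description map naturally onto an enum named as Thejas suggests. This is a sketch, not the HIVE-10119 patch: the {{includes}} and {{resolve}} methods, the configuration key passed in, and modeling the session configuration as a {{Map}} are all assumptions for illustration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical enum for the per-session HS2 log verbosity levels listed above.
enum LoggingLevel {
    NONE,         // 0: ignore operation logs
    EXECUTION,    // 1: only map-reduce task completion
    PERFORMANCE,  // 2: execution + performance counters dumped at the end
    VERBOSE;      // 3: all logs

    // True if a message tagged with messageLevel should be shown at this session level.
    boolean includes(LoggingLevel messageLevel) {
        return messageLevel != NONE && messageLevel.ordinal() <= this.ordinal();
    }

    // A session value, if present, overrides the server default from hive-site.xml.
    // The key name is supplied by the caller and is hypothetical here.
    static LoggingLevel resolve(Map<String, String> sessionConf, String key, LoggingLevel serverDefault) {
        String value = sessionConf.get(key);
        if (value == null || value.trim().isEmpty()) {
            return serverDefault;
        }
        return LoggingLevel.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

An unrecognized value makes {{Enum.valueOf}} throw IllegalArgumentException; a real implementation might prefer to fall back to the server default instead.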
[jira] [Assigned] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan reassigned HIVE-10228: --- Assignee: Sushanth Sowmyan Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics -- Key: HIVE-10228 URL: https://issues.apache.org/jira/browse/HIVE-10228 Project: Hive Issue Type: Sub-task Components: Import/Export Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10131) LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs
[ https://issues.apache.org/jira/browse/HIVE-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481951#comment-14481951 ] Sergey Shelukhin commented on HIVE-10131: - Left some comments on RB (https://reviews.apache.org/r/32851/). LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs Key: HIVE-10131 URL: https://issues.apache.org/jira/browse/HIVE-10131 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Matt McCline Attachments: HIVE-10131.01.patch, HIVE-10131.02.patch, HIVE-10131.03.patch, HIVE-10131.04.patch, HIVE-10131.05.patch Refs are always allocated and then cleared; they should be reused instead. An iterator is another option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
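A generic Java sketch of the reuse idea: one result holder per lookup site is cleared and refilled on every lookup instead of being allocated each time. {{HashMapResult}} and {{ReusingLookup}} are hypothetical and far simpler than BytesBytesMultiHashMap; the key assumption is that a caller finishes with one result before the next lookup, because the same object is overwritten.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mutable result holder, cleared and refilled instead of reallocated.
final class HashMapResult {
    private List<byte[]> values;
    private int position;

    void set(List<byte[]> values) { this.values = values; this.position = 0; }
    void clear() { this.values = null; this.position = 0; }
    boolean hasNext() { return values != null && position < values.size(); }
    byte[] next() { return values.get(position++); }
}

final class ReusingLookup {
    private final Map<String, List<byte[]>> table = new HashMap<>();
    private final HashMapResult reusable = new HashMapResult(); // one instance per lookup site

    void put(String key, List<byte[]> values) { table.put(key, values); }

    // Always returns the same HashMapResult; the previous result is invalidated.
    HashMapResult lookup(String key) {
        reusable.clear();
        List<byte[]> found = table.get(key);
        if (found != null) {
            reusable.set(found);
        }
        return reusable;
    }
}
```

Exposing an iterator-style {{hasNext}}/{{next}} on the holder also touches the "iterator is another option" remark: the caller walks the values without any per-lookup allocation.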
[jira] [Updated] (HIVE-9709) Hive should support replaying cookie from JDBC driver for beeline
[ https://issues.apache.org/jira/browse/HIVE-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-9709: Attachment: HIVE-9709.3.patch Made some minor changes from the previous patch. Looking for a clean run. Thanks, Hari Hive should support replaying cookie from JDBC driver for beeline - Key: HIVE-9709 URL: https://issues.apache.org/jira/browse/HIVE-9709 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-9709.1.patch, HIVE-9709.2.patch, HIVE-9709.3.patch Consider the following scenario: Beeline connects to HS2 through Knox, and Knox goes to LDAP for authentication. To avoid re-authentication, Knox supports using a cookie to identify a request. However, the Beeline JDBC client does not send back the cookie that Knox sent, so Knox has to re-create the LDAP authentication request on every connection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
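A minimal client-side sketch of cookie replay using the JDK's {{java.net.HttpCookie}} parser: remember cookies from Set-Cookie response headers and send them back in a Cookie request header. {{CookieReplay}} and its methods are hypothetical, not the Hive JDBC driver's implementation, and a real client would also have to respect each cookie's domain, path, and secure attributes.

```java
import java.net.HttpCookie;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: keep cookies issued by a gateway (such as Knox) and replay them
// on later requests so the gateway can skip re-authentication.
final class CookieReplay {
    private final Map<String, HttpCookie> cookies = new LinkedHashMap<>();

    // Call with every Set-Cookie header value received from the server.
    void onResponse(List<String> setCookieHeaders) {
        for (String header : setCookieHeaders) {
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                cookies.put(cookie.getName(), cookie); // a newer value replaces the older one
            }
        }
    }

    // Value for the Cookie request header, or null if there is nothing to send.
    String cookieHeader() {
        StringJoiner joiner = new StringJoiner("; ");
        for (HttpCookie cookie : cookies.values()) {
            if (!cookie.hasExpired()) {
                joiner.add(cookie.getName() + "=" + cookie.getValue());
            }
        }
        return joiner.length() == 0 ? null : joiner.toString();
    }
}
```

Replaying the cookie this way would let Knox recognize the returning client instead of issuing a new LDAP authentication request for each connection, which is the behavior the issue asks for.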
[jira] [Updated] (HIVE-10230) Vectorization: Look at performance of VectorExpressionWriterFactory and its use of DateWritable.daysToMillis
[ https://issues.apache.org/jira/browse/HIVE-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10230: --- Priority: Minor (was: Critical) Vectorization: Look at performance of VectorExpressionWriterFactory and its use of DateWritable.daysToMillis Key: HIVE-10230 URL: https://issues.apache.org/jira/browse/HIVE-10230 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Priority: Minor Attachments: datewritable-serialization.png [~gopalv] found that DateWritable code in VectorExpressionWriterFactory shows up in hot code paths in vectorization. The slowness is probably where it calls ZoneInfo::getOffset. !datewritable-serialization.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
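If the cost really is in the time-zone offset lookup, one option worth measuring is a single-entry memo in front of the conversion, since a column vector may repeat the same date many times. This is only a sketch under that assumption: {{CachedDaysToMillis}} is hypothetical, it is not how DateWritable.daysToMillis is implemented, and it computes local midnight with java.time rather than ZoneInfo.

```java
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical single-entry memo for "days since epoch -> epoch millis at local midnight".
// Not thread-safe: intended for one instance per writer or per thread.
final class CachedDaysToMillis {
    private final ZoneId zone;
    private boolean hasCached = false;
    private long cachedDays;
    private long cachedMillis;

    CachedDaysToMillis(ZoneId zone) {
        this.zone = zone;
    }

    long daysToMillis(long days) {
        if (hasCached && days == cachedDays) {
            return cachedMillis; // repeated value: skip the zone-offset computation
        }
        long millis = LocalDate.ofEpochDay(days).atStartOfDay(zone).toInstant().toEpochMilli();
        cachedDays = days;
        cachedMillis = millis;
        hasCached = true;
        return millis;
    }
}
```

Whether this helps depends on how often consecutive values repeat, so it would need profiling against the hot path shown in the attachment.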
[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date
[ https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482113#comment-14482113 ] Gopal V commented on HIVE-10231: Also, with {{set hive.optimize.constant.propagation=false;}} Compute partition column stats fails if partition col type is date -- Key: HIVE-10231 URL: https://issues.apache.org/jira/browse/HIVE-10231 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Fix For: 1.2.0 Currently the command analyze table .. partition .. compute statistics for columns appears to work only for partition columns of string or numeric types, not for other types such as date. See the following case, which uses date as the partition column type: {code} create table colstatspartdate (key int, value string) partitioned by (ds date, hr int); insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select key, value from src limit 20; analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute statistics for columns; {code} It fails with a RuntimeException: {code} FAILED: RuntimeException Cannot convert to Date from: int 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to Date from: int java.lang.RuntimeException: Cannot convert to Date from: int at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242) {code} -- This message was sent
by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date
[ https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482178#comment-14482178 ] Chaoyu Tang commented on HIVE-10231: I have requested a patch review on RB (https://reviews.apache.org/r/32908/); thanks in advance. Compute partition column stats fails if partition col type is date -- Key: HIVE-10231 URL: https://issues.apache.org/jira/browse/HIVE-10231 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Fix For: 1.2.0 Attachments: HIVE-10231.patch Currently the command analyze table .. partition .. compute statistics for columns appears to work only for partition columns of string or numeric types, not for other types such as date. See the following case, which uses date as the partition column type: {code} create table colstatspartdate (key int, value string) partitioned by (ds date, hr int); insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select key, value from src limit 20; analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute statistics for columns; {code} It fails with a RuntimeException: {code} FAILED: RuntimeException Cannot convert to Date from: int 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to Date from: int java.lang.RuntimeException: Cannot convert to Date from: int at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333) at
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10160) Give a warning when grouping or ordering by a constant column
[ https://issues.apache.org/jira/browse/HIVE-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482190#comment-14482190 ] Yongzhi Chen commented on HIVE-10160: - The warning is now printed to the shell, and I added one test. Waiting for the test results. Give a warning when grouping or ordering by a constant column - Key: HIVE-10160 URL: https://issues.apache.org/jira/browse/HIVE-10160 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Lefty Leverenz Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-10160.1.patch, HIVE-10160.3.patch, HIVE-10160.4.patch To avoid confusion, a warning should be issued when users specify column positions instead of names in a GROUP BY or ORDER BY clause (unless hive.groupby.orderby.position.alias is set to true in Hive 0.11.0 or later). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10131) LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs
[ https://issues.apache.org/jira/browse/HIVE-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481951#comment-14481951 ] Sergey Shelukhin edited comment on HIVE-10131 at 4/6/15 9:11 PM: - Left some comments on RB was (Author: sershe): Left some comments on RB (https://reviews.apache.org/r/32851/) LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs Key: HIVE-10131 URL: https://issues.apache.org/jira/browse/HIVE-10131 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Matt McCline Attachments: HIVE-10131.01.patch, HIVE-10131.02.patch, HIVE-10131.03.patch, HIVE-10131.04.patch, HIVE-10131.05.patch Refs are always allocated and then cleared; they should be reused instead. An iterator is another option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8164) Adding in a ReplicationTask that converts a Notification Event to actionable tasks
[ https://issues.apache.org/jira/browse/HIVE-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481952#comment-14481952 ] Sushanth Sowmyan commented on HIVE-8164: Since the last patch, a few things have changed: a) Instead of a top-level repl/ folder, the replication lib now sits inside webhcat-java-client, the home of HCatClient, since that is where this interface is now exposed. b) Iterator/Function semantics are used instead of List/Map. c) ReplicationTask is now abstract, and a ReplicationTaskFactory is responsible for initializing concrete tasks. By default, this patch provides only NoopReplicationTask. A new replication task factory, EXIMReplicationTaskFactory, based on Hive export/import, will bring in concrete implementations with HIVE-10227. d) Concrete replication support with Hive DDL will be added with HIVE-10228 (HIVE-10227 will depend on HIVE-10228). Adding in a ReplicationTask that converts a Notification Event to actionable tasks -- Key: HIVE-8164 URL: https://issues.apache.org/jira/browse/HIVE-8164 Project: Hive Issue Type: Sub-task Components: Import/Export Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-8164.2.patch, HIVE-8164.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
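The structure in item c), an abstract task, a factory that builds concrete tasks, and a no-op default, can be illustrated in plain Java. Every type and method here is a hypothetical simplification named with a "Sketch" suffix; the real ReplicationTask, ReplicationTaskFactory, and NoopReplicationTask in the patch may have quite different signatures.

```java
import java.util.Objects;

// Stand-in for a metastore notification event (hypothetical).
final class NotificationEventSketch {
    final long eventId;
    final String eventType;

    NotificationEventSketch(long eventId, String eventType) {
        this.eventId = eventId;
        this.eventType = Objects.requireNonNull(eventType, "eventType");
    }
}

// Abstract task: concrete subclasses turn an event into actionable work.
abstract class ReplicationTaskSketch {
    protected final NotificationEventSketch event;

    ReplicationTaskSketch(NotificationEventSketch event) {
        this.event = Objects.requireNonNull(event, "event");
    }

    abstract boolean isActionable();
}

// Default task: accepts any event but does nothing with it.
final class NoopReplicationTaskSketch extends ReplicationTaskSketch {
    NoopReplicationTaskSketch(NotificationEventSketch event) {
        super(event);
    }

    @Override
    boolean isActionable() {
        return false;
    }
}

// Pluggable factory; an export/import-based factory would return concrete tasks.
interface ReplicationTaskFactorySketch {
    ReplicationTaskSketch create(NotificationEventSketch event);
}

final class ReplicationTasks {
    // Only the no-op factory is available by default.
    private static ReplicationTaskFactorySketch factory = NoopReplicationTaskSketch::new;

    static void setFactory(ReplicationTaskFactorySketch newFactory) {
        factory = Objects.requireNonNull(newFactory, "factory");
    }

    static ReplicationTaskSketch create(NotificationEventSketch event) {
        return factory.create(event);
    }
}
```

Keeping the factory swappable is what would let an export/import-based factory plug in later without changing callers that only see the abstract task.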
[jira] [Updated] (HIVE-10131) LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs
[ https://issues.apache.org/jira/browse/HIVE-10131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10131: Attachment: HIVE-10131.06.patch LLAP: BytesBytesMultiHashMap and mapjoin container should reuse refs Key: HIVE-10131 URL: https://issues.apache.org/jira/browse/HIVE-10131 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Matt McCline Attachments: HIVE-10131.01.patch, HIVE-10131.02.patch, HIVE-10131.03.patch, HIVE-10131.04.patch, HIVE-10131.05.patch, HIVE-10131.06.patch Refs are always allocated and then cleared; they should be reused instead. An iterator is another option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10229) Set conf and processor context in the constructor instead of init
[ https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-10229: -- Issue Type: Bug (was: Sub-task) Parent: (was: HIVE-7926) Set conf and processor context in the constructor instead of init - Key: HIVE-10229 URL: https://issues.apache.org/jira/browse/HIVE-10229 Project: Hive Issue Type: Bug Environment: Reporter: Sergey Shelukhin Assignee: Siddharth Seth Hit this on ctas13 query. {noformat} Error: Failure while running task:java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10225) CLI JLine does not flush history on quit/Ctrl-C
[ https://issues.apache.org/jira/browse/HIVE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10225: - Attachment: HIVE-10225.1.patch Adds a shutdown hook that calls flush on the history. This is similar to what we do in beeline. I didn't reuse the code from there, since there is work going on to make beeline usable as a separate module. CLI JLine does not flush history on quit/Ctrl-C --- Key: HIVE-10225 URL: https://issues.apache.org/jira/browse/HIVE-10225 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-10225.1.patch Hive CLI does not save history if it is terminated using Ctrl-C or the quit; command. HIVE-9310 fixed this for the case where one exits with Ctrl-D (EOF), but not for these other ways of exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
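A minimal Java sketch of the shutdown-hook approach described in the patch comment. To avoid depending on a particular JLine version, the history is modeled as a {{java.io.Flushable}}, and {{HistoryFlushHook}} is a hypothetical name. JVM shutdown hooks run on normal exit and on SIGINT (Ctrl-C), but not when the process is killed with SIGKILL.

```java
import java.io.Flushable;
import java.io.IOException;

// Hypothetical helper; the CLI history object is abstracted as java.io.Flushable.
final class HistoryFlushHook {
    // Registers a JVM shutdown hook that flushes the history, and returns the hook
    // so callers (or tests) can unregister it with Runtime.removeShutdownHook.
    static Thread install(final Flushable history) {
        Thread hook = new Thread(() -> {
            try {
                history.flush();
            } catch (IOException e) {
                // Little can be done during shutdown; report the problem and keep exiting.
                System.err.println("Could not flush CLI history: " + e.getMessage());
            }
        }, "cli-history-flush");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Because the hook runs for every orderly JVM exit, the same code covers quit;, exit;, Ctrl-D, and Ctrl-C without special-casing each path.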
[jira] [Commented] (HIVE-10225) CLI JLine does not flush history on quit/Ctrl-C
[ https://issues.apache.org/jira/browse/HIVE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482141#comment-14482141 ] Thejas M Nair commented on HIVE-10225: -- [~prasanth_j] Can you please review this? Let me know if you would like a review board link for it. CLI JLine does not flush history on quit/Ctrl-C --- Key: HIVE-10225 URL: https://issues.apache.org/jira/browse/HIVE-10225 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-10225.1.patch Hive CLI does not save history if it is terminated using Ctrl-C or the quit; command. HIVE-9310 fixed this for the case where one exits with Ctrl-D (EOF), but not for these other ways of exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10229) Set conf and processor context in the constructor instead of init
[ https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-10229: -- Attachment: HIVE-10229.1.patch Fairly simple patch to set jconf and context during construction. Set conf and processor context in the constructor instead of init - Key: HIVE-10229 URL: https://issues.apache.org/jira/browse/HIVE-10229 Project: Hive Issue Type: Bug Environment: Reporter: Sergey Shelukhin Assignee: Siddharth Seth Fix For: 1.2.0 Attachments: HIVE-10229.1.patch Hit this on ctas13 query. {noformat} Error: Failure while running task:java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
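The fix described here, setting the conf and processor context during construction instead of in init, can be shown with a reduced processor. The classes and the REDUCE_PLAN_KEY value below are hypothetical stand-ins rather than Hive or Tez types; the idea is that a missing context now fails fast in the constructor with a named NullPointerException, and the cache-key line from the stack trace can no longer see a null context.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the Tez ProcessorContext.
final class ProcessorContextSketch {
    private final String taskVertexName;

    ProcessorContextSketch(String taskVertexName) {
        this.taskVertexName = taskVertexName;
    }

    String getTaskVertexName() {
        return taskVertexName;
    }
}

// Hypothetical, reduced record processor: both dependencies are final and validated
// in the constructor, so later methods never observe a null processorContext.
final class ReduceProcessorSketch {
    private static final String REDUCE_PLAN_KEY = "__REDUCE_PLAN__"; // hypothetical value
    private final Map<String, String> jconf;
    private final ProcessorContextSketch processorContext;

    ReduceProcessorSketch(Map<String, String> jconf, ProcessorContextSketch processorContext) {
        this.jconf = Objects.requireNonNull(jconf, "jconf");
        this.processorContext = Objects.requireNonNull(processorContext, "processorContext");
    }

    String conf(String key) {
        return jconf.get(key);
    }

    // Builds the cache key with the same shape as the failing line quoted in the issue.
    String cacheKey(String queryId) {
        return queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY;
    }
}
```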
[jira] [Commented] (HIVE-9937) LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join
[ https://issues.apache.org/jira/browse/HIVE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481958#comment-14481958 ] Matt McCline commented on HIVE-9937: Rebased to recent checkins and submit. LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join -- Key: HIVE-9937 URL: https://issues.apache.org/jira/browse/HIVE-9937 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-9937.01.patch, HIVE-9937.02.patch, HIVE-9937.03.patch, HIVE-9937.04.patch, HIVE-9937.05.patch, HIVE-9937.06.patch, HIVE-9937.07.patch, HIVE-9937.08.patch, HIVE-9937.09.patch, HIVE-9937.91.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10005) remove some unnecessary branches from the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10005: -- Attachment: HIVE-10005.4.patch remove some unnecessary branches from the inner loop Key: HIVE-10005 URL: https://issues.apache.org/jira/browse/HIVE-10005 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10005.1.patch, HIVE-10005.2.patch, HIVE-10005.3.patch, HIVE-10005.4.patch Operator.forward is doing too much. There's no reason to check and update the done state inline for every row; it's much more efficient to do that only when the event that completes an operator happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
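A minimal sketch of the idea, assuming a simplified operator tree: the class `SketchOperator` and its `limit`, `liveChildren`, and `markDone` members are hypothetical and are not Hive's `Operator` API. The per-row `forward` only skips finished children, while the bookkeeping (counting live children and marking the parent done) runs once, when a child completes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified operator; NOT Hive's Operator API.
class SketchOperator {
    private final List<SketchOperator> children = new ArrayList<>();
    private SketchOperator parent;
    private boolean done;
    // Children that have not finished yet; changed only by completion events.
    private int liveChildren;
    // Rows a leaf accepts before it finishes (like a LIMIT); -1 means unlimited.
    private final int limit;
    private int seen;
    final List<Object> received = new ArrayList<>();

    SketchOperator(int limit) {
        this.limit = limit;
    }

    void addChild(SketchOperator child) {
        children.add(child);
        child.parent = this;
        liveChildren++;
    }

    boolean isDone() {
        return done;
    }

    void process(Object row) {
        if (children.isEmpty()) {
            // Leaf: consume the row and finish once the limit is reached.
            received.add(row);
            if (limit >= 0 && ++seen >= limit) {
                markDone();
            }
        } else {
            forward(row);
        }
    }

    // Hot path: skip finished children, but do no counting or state updates per row.
    void forward(Object row) {
        if (done) {
            return;
        }
        for (SketchOperator child : children) {
            if (!child.done) {
                child.process(row);
            }
        }
    }

    // Completion event: runs once per operator and propagates up the tree.
    private void markDone() {
        if (done) {
            return;
        }
        done = true;
        if (parent != null && --parent.liveChildren == 0) {
            parent.markDone();
        }
    }
}
```

In this sketch a parent becomes done only when all of its children are done; the actual patch may wire the completion event differently.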
[jira] [Commented] (HIVE-10229) LLAP: NPE in ReduceRecordProcessor
[ https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482118#comment-14482118 ] Siddharth Seth commented on HIVE-10229: --- Yep. Same issue I saw. ProcessorContext is null. I'm going to upload a patch for trunk which sets the conf and context in the constructor instead of the init method. LLAP: NPE in ReduceRecordProcessor -- Key: HIVE-10229 URL: https://issues.apache.org/jira/browse/HIVE-10229 Project: Hive Issue Type: Sub-task Environment: Reporter: Sergey Shelukhin Assignee: Gunther Hagleitner Hit this on ctas13 query. {noformat} Error: Failure while running task:java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY; -- 
This message was sent by Atlassian JIRA (v6.3.4#6332)
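A minimal sketch of moving the setup into the constructor, assuming hypothetical stand-ins: `SketchProcessorContext` replaces the Tez processor context, a `Map` replaces the JobConf, and `REDUCE_PLAN_KEY` has an illustrative value only. Because both fields are final and checked when the object is built, the cache-key expression from the stack trace cannot hit a null context later during initialization.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the Tez processor context; NOT the real Tez interface.
interface SketchProcessorContext {
    String getTaskVertexName();
}

// Hypothetical processor: everything the init path needs is set in the constructor.
class SketchReduceProcessor {
    // Hypothetical key suffix, for illustration only.
    private static final String REDUCE_PLAN_KEY = "__REDUCE_PLAN__";

    private final Map<String, String> jconf;
    private final SketchProcessorContext processorContext;

    SketchReduceProcessor(Map<String, String> jconf, SketchProcessorContext processorContext) {
        // Fail fast at construction instead of with an NPE later in init().
        this.jconf = Objects.requireNonNull(jconf, "jconf");
        this.processorContext = Objects.requireNonNull(processorContext, "processorContext");
    }

    String confValue(String key) {
        return jconf.get(key);
    }

    // Same shape as the failing expression; safe because the field is final and non-null.
    String planCacheKey(String queryId) {
        return queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY;
    }
}
```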
[jira] [Assigned] (HIVE-10229) Set conf and processor context in the constructor instead of init
[ https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned HIVE-10229: - Assignee: Siddharth Seth (was: Gunther Hagleitner) Set conf and processor context in the constructor instead of init - Key: HIVE-10229 URL: https://issues.apache.org/jira/browse/HIVE-10229 Project: Hive Issue Type: Sub-task Environment: Reporter: Sergey Shelukhin Assignee: Siddharth Seth Hit this on ctas13 query. {noformat} Error: Failure while running task:java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10229) Set conf and processor context in the constructor instead of init
[ https://issues.apache.org/jira/browse/HIVE-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-10229: -- Summary: Set conf and processor context in the constructor instead of init (was: LLAP: NPE in ReduceRecordProcessor) Set conf and processor context in the constructor instead of init - Key: HIVE-10229 URL: https://issues.apache.org/jira/browse/HIVE-10229 Project: Hive Issue Type: Sub-task Environment: Reporter: Sergey Shelukhin Assignee: Gunther Hagleitner Hit this on ctas13 query. {noformat} Error: Failure while running task:java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:98) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} The line is cacheKey = queryId + processorContext.getTaskVertexName() + REDUCE_PLAN_KEY; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session
[ https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10119: - Attachment: HIVE-10119.3.patch Allow Log verbosity to be set in hiveserver2 session Key: HIVE-10119 URL: https://issues.apache.org/jira/browse/HIVE-10119 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch, HIVE-10119.3.patch We need to be able to set logging per HS2 session. The client often uses the map-reduce completion matrix (Execution) that shows up in Beeline to debug performance. Users might not want the verbose log view all the time, since it obscures the Execution information. Hence the client should be able to change the verbosity level. Also, HS2 logging currently has only 2 levels of verbosity, not 3. Users might want Execution + Performance counters only, so that level needs to be added. So for logs, the user should be able to set 3 levels of verbosity in the session, which will override the default verbosity specified in the hive-site.xml file:
0. None - IGNORE
1. Execution - Just shows the map-reduce tasks completing
2. Performance - Execution + Performance counters dumped at the end
3. Verbose - All logs
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
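A minimal sketch of the four session levels as a Java enum, assuming a hypothetical `SessionLoggingLevel` type and `fromSessionValue` parser (HiveServer2's actual configuration handling may differ). Each level includes everything shown by the levels below it, and an empty or unknown session value falls back to the server default from hive-site.xml.

```java
import java.util.Locale;

// Hypothetical model of the four levels listed in HIVE-10119; not HiveServer2 code.
enum SessionLoggingLevel {
    NONE, EXECUTION, PERFORMANCE, VERBOSE;

    // Parses a session value such as "performance"; falls back to the server default.
    static SessionLoggingLevel fromSessionValue(String value, SessionLoggingLevel serverDefault) {
        if (value == null || value.trim().isEmpty()) {
            return serverDefault;
        }
        try {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return serverDefault;
        }
    }

    // A level shows a category of output if it is at least that verbose; NONE shows nothing.
    boolean shows(SessionLoggingLevel category) {
        return category != NONE && ordinal() >= category.ordinal();
    }
}
```

Falling back on unknown values is a choice made for this sketch; rejecting them with an error would be equally reasonable.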
[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7049: - Assignee: Daniel Dai (was: Mohammad Kamrul Islam) Unable to deserialize AVRO data when file schema and record schema are different and nullable - Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Daniel Dai Attachments: HIVE-7049.1.patch, HIVE-7049.2.patch, Statistic, Statistics10Min.avsc It mainly happens when (1) the file schema and the record schema are not the same, and (2) the record schema is nullable but the file schema is not. The likely code location is in class AvroDeserializer: {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but fileSchema is not. I tested with these values: {noformat} recordSchema= [null,string] fileSchema= string {noformat} and got the following exception (line numbers might differ because of my debugged code version): {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
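A self-contained sketch of the missing check, assuming a hypothetical `MiniSchema` model instead of the real Avro `Schema` class (in Avro itself the union test would be `schema.getType() == Schema.Type.UNION`). It takes the nullable-union path only when both schemas are unions; one plausible alternative for a bare file type, shown here, is to read the datum with the record schema's non-null branch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, minimal schema model; NOT org.apache.avro.Schema.
final class MiniSchema {
    final String type;               // "null", "string", ... or "union"
    final List<MiniSchema> branches; // non-empty only for unions

    private MiniSchema(String type, List<MiniSchema> branches) {
        this.type = type;
        this.branches = branches;
    }

    static MiniSchema primitive(String type) {
        return new MiniSchema(type, Collections.<MiniSchema>emptyList());
    }

    static MiniSchema union(MiniSchema... branches) {
        return new MiniSchema("union", Arrays.asList(branches));
    }

    boolean isUnion() {
        return type.equals("union");
    }

    boolean isNullableUnion() {
        return isUnion() && branches.stream().anyMatch(s -> s.type.equals("null"));
    }

    MiniSchema nonNullBranch() {
        for (MiniSchema s : branches) {
            if (!s.type.equals("null")) {
                return s;
            }
        }
        throw new IllegalStateException("union has no non-null branch");
    }
}

// Hypothetical guard deciding how to deserialize a datum.
final class NullableUnionGuard {
    enum Path { PLAIN, NULLABLE_UNION, RECORD_NON_NULL_BRANCH }

    static Path choosePath(MiniSchema recordSchema, MiniSchema fileSchema) {
        if (!recordSchema.isNullableUnion()) {
            return Path.PLAIN;
        }
        // The reported code checked only recordSchema; also check fileSchema
        // before treating it as a union ("Not a union: string" otherwise).
        if (fileSchema.isUnion()) {
            return Path.NULLABLE_UNION;
        }
        // File data was written with the bare type: use the record schema's non-null branch.
        return Path.RECORD_NON_NULL_BRANCH;
    }
}
```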
[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session
[ https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10119: - Attachment: HIVE-10119.2.patch [~thejas] Thanks for the detailed feedback. Have made the changes in the new upload. Thanks Hari Allow Log verbosity to be set in hiveserver2 session Key: HIVE-10119 URL: https://issues.apache.org/jira/browse/HIVE-10119 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch We need to be able to set logging per HS2 session. The client often uses the map-reduce completion matrix (Execution) that shows up in Beeline to debug performance. Users might not want the verbose log view all the time, since it obscures the Execution information. Hence the client should be able to change the verbosity level. Also, HS2 logging currently has only 2 levels of verbosity, not 3. Users might want Execution + Performance counters only, so that level needs to be added. So for logs, the user should be able to set 3 levels of verbosity in the session, which will override the default verbosity specified in the hive-site.xml file:
0. None - IGNORE
1. Execution - Just shows the map-reduce tasks completing
2. Performance - Execution + Performance counters dumped at the end
3. Verbose - All logs
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8731) TPC-DS Q49 : Semantic analyzer order by is not honored when used after union all
[ https://issues.apache.org/jira/browse/HIVE-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-8731. -- Resolution: Cannot Reproduce Has already been fixed. TPC-DS Q49 : Semantic analyzer order by is not honored when used after union all - Key: HIVE-8731 URL: https://issues.apache.org/jira/browse/HIVE-8731 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.0, 0.14.0 Reporter: Mostafa Mokhtar Assignee: Gunther Hagleitner TPC-DS query 49 returns more rows than the limit specifies. Query {code} set hive.cbo.enable=true; set hive.stats.fetch.column.stats=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.tez.auto.reducer.parallelism=true; set hive.auto.convert.join.noconditionaltask.size=128000; set hive.exec.reducers.bytes.per.reducer=1; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager; set hive.support.concurrency=false; set hive.tez.exec.print.summary=true; explain select 'web' as channel ,web.item ,web.return_ratio ,web.return_rank ,web.currency_rank from ( select item ,return_ratio ,currency_ratio ,rank() over (order by return_ratio) as return_rank ,rank() over (order by currency_ratio) as currency_rank from ( select ws.ws_item_sk as item ,(cast(sum(coalesce(wr.wr_return_quantity,0)) as decimal(15,4))/ cast(sum(coalesce(ws.ws_quantity,0)) as decimal(15,4) )) as return_ratio ,(cast(sum(coalesce(wr.wr_return_amt,0)) as decimal(15,4))/ cast(sum(coalesce(ws.ws_net_paid,0)) as decimal(15,4) )) as currency_ratio from web_sales ws left outer join web_returns wr on (ws.ws_order_number = wr.wr_order_number and ws.ws_item_sk = wr.wr_item_sk) ,date_dim where wr.wr_return_amt 1 and ws.ws_net_profit 1 and ws.ws_net_paid 0 and ws.ws_quantity 0 and ws.ws_sold_date_sk = date_dim.d_date_sk and d_year = 2000 and d_moy = 12 group by ws.ws_item_sk ) in_web ) web where ( web.return_rank = 10 or web.currency_rank = 10 ) union all select 'catalog' as channel ,catalog.item 
,catalog.return_ratio ,catalog.return_rank ,catalog.currency_rank from ( select item ,return_ratio ,currency_ratio ,rank() over (order by return_ratio) as return_rank ,rank() over (order by currency_ratio) as currency_rank from ( select cs.cs_item_sk as item ,(cast(sum(coalesce(cr.cr_return_quantity,0)) as decimal(15,4))/ cast(sum(coalesce(cs.cs_quantity,0)) as decimal(15,4) )) as return_ratio ,(cast(sum(coalesce(cr.cr_return_amount,0)) as decimal(15,4))/ cast(sum(coalesce(cs.cs_net_paid,0)) as decimal(15,4) )) as currency_ratio from catalog_sales cs left outer join catalog_returns cr on (cs.cs_order_number = cr.cr_order_number and cs.cs_item_sk = cr.cr_item_sk) ,date_dim where cr.cr_return_amount 1 and cs.cs_net_profit 1 and cs.cs_net_paid 0 and cs.cs_quantity 0 and cs_sold_date_sk = d_date_sk and d_year = 2000 and d_moy = 12 group by cs.cs_item_sk ) in_cat ) catalog where ( catalog.return_rank = 10 or catalog.currency_rank =10 ) union all select 'store' as channel ,store.item ,store.return_ratio ,store.return_rank ,store.currency_rank from ( select item ,return_ratio ,currency_ratio ,rank() over (order by return_ratio) as return_rank ,rank() over (order by currency_ratio) as currency_rank from ( select sts.ss_item_sk as item ,(cast(sum(coalesce(sr.sr_return_quantity,0)) as decimal(15,4))/cast(sum(coalesce(sts.ss_quantity,0)) as decimal(15,4) )) as return_ratio ,(cast(sum(coalesce(sr.sr_return_amt,0)) as decimal(15,4))/cast(sum(coalesce(sts.ss_net_paid,0)) as decimal(15,4) )) as currency_ratio from store_sales sts left outer join store_returns sr on
[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session
[ https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10119: - Release Note: The description of the newly added parameter hive.server2.logging.level should go into the Beeline wiki under https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.logging.operation.verbose . Also, hive.server2.logging.operation.verbose will no longer be available, so it should be removed from the Beeline wiki. (was: The description of the newly added parameter hive.server2.logging.level should go into the Beeline wiki under https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.logging.operation.verbose . Also, hive.server2.logging.operation.verbose will be deprecated, so it should be removed from the Beeline wiki.) Allow Log verbosity to be set in hiveserver2 session Key: HIVE-10119 URL: https://issues.apache.org/jira/browse/HIVE-10119 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch We need to be able to set logging per HS2 session. The client often uses the map-reduce completion matrix (Execution) that shows up in Beeline to debug performance. Users might not want the verbose log view all the time, since it obscures the Execution information. Hence the client should be able to change the verbosity level. Also, HS2 logging currently has only 2 levels of verbosity, not 3. Users might want Execution + Performance counters only, so that level needs to be added. So for logs, the user should be able to set 3 levels of verbosity in the session, which will override the default verbosity specified in the hive-site.xml file:
0. None - IGNORE
1. Execution - Just shows the map-reduce tasks completing
2. Performance - Execution + Performance counters dumped at the end
3. Verbose - All logs
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8164) Adding in a ReplicationTask that converts a Notification Event to actionable tasks
[ https://issues.apache.org/jira/browse/HIVE-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482047#comment-14482047 ] Hive QA commented on HIVE-8164: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12723424/HIVE-8164.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8701 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3297/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3297/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3297/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12723424 - PreCommit-HIVE-TRUNK-Build Adding in a ReplicationTask that converts a Notification Event to actionable tasks -- Key: HIVE-8164 URL: https://issues.apache.org/jira/browse/HIVE-8164 Project: Hive Issue Type: Sub-task Components: Import/Export Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-8164.2.patch, HIVE-8164.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10230) Vectorization: Look at performance of VectorExpressionWriterFactory and its use of DateWritable.daysToMillis
[ https://issues.apache.org/jira/browse/HIVE-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10230: --- Attachment: datewritable-serialization.png Vectorization: Look at performance of VectorExpressionWriterFactory and its use of DateWritable.daysToMillis Key: HIVE-10230 URL: https://issues.apache.org/jira/browse/HIVE-10230 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: datewritable-serialization.png [~gopalv] found that DateWritable code in VectorExpressionWriterFactory is showing up in hot code paths from vectorization. The slowness is probably where it calls ZoneInfo::getOffset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
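A sketch of one possible mitigation, assuming a DST-aware, two-lookup days-to-millis conversion similar in spirit to `DateWritable.daysToMillis` (the class below is hypothetical, not Hive code). A one-entry cache skips the `TimeZone.getOffset` calls when consecutive rows in a batch carry the same date; the cache makes the object stateful, so each thread would need its own instance.

```java
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; the conversion is an assumption modeled loosely on DateWritable.daysToMillis.
final class CachedDaysToMillis {
    private static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    private final TimeZone zone;
    private boolean hasCached;
    private int cachedDays;
    private long cachedMillis;

    CachedDaysToMillis(TimeZone zone) {
        this.zone = zone;
    }

    // Uncached conversion: two TimeZone.getOffset lookups per call.
    static long convert(int days, TimeZone zone) {
        long millisUtc = days * MILLIS_PER_DAY;
        long tmp = millisUtc - zone.getOffset(millisUtc);
        // The offset can differ at tmp near a DST transition, so look it up again.
        return millisUtc - zone.getOffset(tmp);
    }

    // Cached conversion: repeated dates reuse the previous result. Not thread-safe.
    long daysToMillis(int days) {
        if (!hasCached || days != cachedDays) {
            cachedMillis = convert(days, zone);
            cachedDays = days;
            hasCached = true;
        }
        return cachedMillis;
    }
}
```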
[jira] [Commented] (HIVE-10118) CBO (Calcite Return Path): Internal error: Cannot find common type for join keys
[ https://issues.apache.org/jira/browse/HIVE-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482199#comment-14482199 ] Mostafa Mokhtar commented on HIVE-10118: [~jpullokkaran] Ended up rediscovering the same issue using this query, where the NDV of a wrong column gets used in the costing which produces a plan based on wrong assumptions. {code} select ss_sold_date_sk from store_sales, date_dim d1, store where d1.d_date_sk = store_sales.ss_sold_date_sk and store.s_store_sk = store_sales.ss_store_sk; {code} CBO (Calcite Return Path): Internal error: Cannot find common type for join keys - Key: HIVE-10118 URL: https://issues.apache.org/jira/browse/HIVE-10118 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: cbo-branch Query {code} explain select ss_items.item_id ,ss_item_rev ,ss_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ss_dev ,cs_item_rev ,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev ,ws_item_rev ,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev ,ws_item_rev ,ws_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ws_dev ,(ss_item_rev+cs_item_rev+ws_item_rev)/3 average FROM ( select i_item_id item_id ,sum(ss_ext_sales_price) as ss_item_rev from store_sales JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk JOIN (select d1.d_date from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = d2.d_week_seq where d2.d_date = '1998-08-04') sub ON date_dim.d_date = sub.d_date group by i_item_id ) ss_items JOIN ( select i_item_id item_id ,sum(cs_ext_sales_price) as cs_item_rev from catalog_sales JOIN item ON catalog_sales.cs_item_sk = item.i_item_sk JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk JOIN (select d1.d_date from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = d2.d_week_seq where d2.d_date = '1998-08-04') sub ON 
date_dim.d_date = sub.d_date group by i_item_id ) cs_items ON ss_items.item_id=cs_items.item_id JOIN ( select i_item_id item_id ,sum(ws_ext_sales_price) as ws_item_rev from web_sales JOIN item ON web_sales.ws_item_sk = item.i_item_sk JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk JOIN (select d1.d_date from date_dim d1 JOIN date_dim d2 ON d1.d_week_seq = d2.d_week_seq where d2.d_date = '1998-08-04') sub ON date_dim.d_date = sub.d_date group by i_item_id ) ws_items ON ss_items.item_id=ws_items.item_id where ss_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev and ss_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev and cs_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev and cs_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev and ws_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev and ws_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev order by item_id ,ss_item_rev limit 100 {code} Exception {code} limit 100 15/03/27 12:38:32 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO. 
java.lang.RuntimeException: java.lang.AssertionError: Internal error: Cannot find common type for join keys $1 (type INTEGER) and $1 (type VARCHAR(2147483647)) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.rethrowCalciteException(CalcitePlanner.java:677) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:586) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:238) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9998) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:201) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1114) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1162) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1051)
[jira] [Commented] (HIVE-10187) Avro backed tables don't handle cyclical or recursive records
[ https://issues.apache.org/jira/browse/HIVE-10187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482195#comment-14482195 ] Mark Wagner commented on HIVE-10187: The test failure seems to be unrelated. Avro backed tables don't handle cyclical or recursive records - Key: HIVE-10187 URL: https://issues.apache.org/jira/browse/HIVE-10187 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 1.2.0 Reporter: Mark Wagner Assignee: Mark Wagner Attachments: HIVE-10187.1.patch, HIVE-10187.demo.patch [HIVE-7653] changed the Avro SerDe to make it generate TypeInfos even for recursive/cyclical schemas. However, any attempt to serialize data which exploits that ability results in silently dropped fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported
[ https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482299#comment-14482299 ] Jason Dere commented on HIVE-10226: --- [~ashutoshc] I'll see what I can do here. [~gopalv] Is there a more complete stack trace available here, in case this issue still crops up after I make these changes? The qfile test (which uses analyze table) seemed to work ok. Column stats for Date columns not supported --- Key: HIVE-10226 URL: https://issues.apache.org/jira/browse/HIVE-10226 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10226.1.patch {noformat} hive> explain analyze table revenues compute statistics for columns; 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only integer/long/timestamp/float/double/string/binary/boolean/decimal type argument is accepted but date is passed. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported
[ https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482321#comment-14482321 ] Gopal V commented on HIVE-10226: [~jdere]: I am running the latest build against a hive-1.0 metastore, which will never be hit via the qtests. {code} org.apache.thrift.protocol.TProtocolException: Cannot write a TUnion with no set value! at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:240) at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213) at org.apache.thrift.TUnion.write(TUnion.java:152) at org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj$ColumnStatisticsObjStandardScheme.write(ColumnStatisticsObj.java:550) at org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj$ColumnStatisticsObjStandardScheme.write(ColumnStatisticsObj.java:488) at org.apache.hadoop.hive.metastore.api.ColumnStatisticsObj.write(ColumnStatisticsObj.java:414) at org.apache.hadoop.hive.metastore.api.TableStatsResult$TableStatsResultStandardScheme.write(TableStatsResult.java:388) at org.apache.hadoop.hive.metastore.api.TableStatsResult$TableStatsResultStandardScheme.write(TableStatsResult.java:338) at org.apache.hadoop.hive.metastore.api.TableStatsResult.write(TableStatsResult.java:288) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_statistics_req_result$get_table_statistics_req_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_statistics_req_result$get_table_statistics_req_resultStandardScheme.write(ThriftHiveMetastore.java) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_statistics_req_result.write(ThriftHiveMetastore.java) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) 
at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} The error is probably not relevant except in rolling-upgrade scenarios (the standard one being: client upgraded, server yet to be restarted), since the error is not FATAL. Adding a new stats type is probably a non-backwards-compatible change anyway, so perhaps we can mark this as a wire protocol change. Column stats for Date columns not supported --- Key: HIVE-10226 URL: https://issues.apache.org/jira/browse/HIVE-10226 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10226.1.patch {noformat} hive> explain analyze table revenues compute statistics for columns; 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only integer/long/timestamp/float/double/string/binary/boolean/decimal type argument is accepted but date is passed. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
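A minimal sketch of what a date-column statistics aggregate could track, assuming dates are summarized as days since the epoch (the class `DateColumnStatsSketch` is hypothetical and is not the metastore's Thrift type). Introducing a type like this into the stats union is the kind of change Gopal V describes above as a wire protocol change.

```java
import java.time.LocalDate;

// Hypothetical min/max/null-count aggregate for a DATE column, stored as epoch days.
final class DateColumnStatsSketch {
    long numNulls;
    long lowEpochDay = Long.MAX_VALUE;
    long highEpochDay = Long.MIN_VALUE;

    void add(LocalDate value) {
        if (value == null) {
            numNulls++;
            return;
        }
        long day = value.toEpochDay();
        lowEpochDay = Math.min(lowEpochDay, day);
        highEpochDay = Math.max(highEpochDay, day);
    }
}
```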
[jira] [Commented] (HIVE-10180) Loop optimization in ColumnArithmeticColumn.txt
[ https://issues.apache.org/jira/browse/HIVE-10180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482475#comment-14482475 ] Chengxiang Li commented on HIVE-10180: -- [~gopalv], I'm waiting for [~Ferd]'s micro-benchmark tool (HIVE-10189), which seems likely to take some time, so I'll post my own test results here first. I ran DoubleColAddDoubleColumn/LongColAddLongColumn 1 times with and without the patch; here are the numbers:
||Expression||Not vectorized (sec)||Vectorized (sec)||
|DoubleColAddDoubleColumn|51.23|17.64|
|LongColAddLongColumn|51.17|21.51|
Environment: java version 1.8.0_40 Java(TM) SE Runtime Environment (build 1.8.0_40-b26) Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode) Intel(R) Core(TM) i3-2130 CPU @ 3.40GHz Linux version 2.6.32-279.el6.x86_64 Loop optimization in ColumnArithmeticColumn.txt --- Key: HIVE-10180 URL: https://issues.apache.org/jira/browse/HIVE-10180 Project: Hive Issue Type: Sub-task Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor Attachments: HIVE-10180.1.patch The JVM is quite strict about which code shapes it will execute with SIMD instructions. Take, for example, a loop in DoubleColAddDoubleColumn.java:
{code:java}
for (int i = 0; i != n; i++) {
  outputVector[i] = vector1[0] + vector2[i];
}
{code}
The vector1[0] reference prevents the JVM from executing this loop with vectorized instructions; we need to assign vector1[0] to a variable outside the loop and use that variable inside the loop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
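The change described in HIVE-10180 is a loop-invariant hoist. The sketch below is a minimal illustration, assuming plain double[] arrays in place of Hive's column vectors; the class name HoistedScalarAdd is hypothetical and this is not the generated ColumnArithmeticColumn code. It shows the original and hoisted loop shapes side by side and checks that they compute the same values.

```java
import java.util.Arrays;

// Hypothetical class illustrating the loop shape change proposed in HIVE-10180.
final class HoistedScalarAdd {

    // Original shape: vector1[0] is read inside the loop on every iteration.
    static void addRepeatingOriginal(double[] vector1, double[] vector2,
                                     double[] outputVector, int n) {
        for (int i = 0; i != n; i++) {
            outputVector[i] = vector1[0] + vector2[i];
        }
    }

    // Proposed shape: the repeating value is loaded once before the loop, so the
    // loop body is a plain element-wise add over vector2 and outputVector.
    static void addRepeatingHoisted(double[] vector1, double[] vector2,
                                    double[] outputVector, int n) {
        final double repeatValue = vector1[0];
        for (int i = 0; i != n; i++) {
            outputVector[i] = repeatValue + vector2[i];
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        double[] vector1 = {1.5};
        double[] vector2 = new double[n];
        for (int i = 0; i < n; i++) {
            vector2[i] = i;
        }
        double[] original = new double[n];
        double[] hoisted = new double[n];
        addRepeatingOriginal(vector1, vector2, original, n);
        addRepeatingHoisted(vector1, vector2, hoisted, n);
        // Both variants compute the same values; only the loop shape differs.
        System.out.println(Arrays.equals(original, hoisted)); // true
    }
}
```

Whether HotSpot actually emits SIMD code for the hoisted loop depends on the JVM version and JIT heuristics; the sketch only shows the source transformation, not the speedups reported in the table above.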
[jira] [Commented] (HIVE-10098) HS2 local task for map join fails in KMS encrypted cluster
[ https://issues.apache.org/jira/browse/HIVE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482478#comment-14482478 ] Yongzhi Chen commented on HIVE-10098: - The other failure, TestMinimrCliDriver-smb_mapjoin_8.q (did not produce a TEST-*.xml file), is not related either; I have seen this failure in other precommit builds. HS2 local task for map join fails in KMS encrypted cluster -- Key: HIVE-10098 URL: https://issues.apache.org/jira/browse/HIVE-10098 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-10098.1.patch, HIVE-10098.2.patch Env: KMS was enabled after the cluster was secured with Kerberos. Problem: Any Hive query via Beeline that performs a MapJoin fails with a java.lang.reflect.UndeclaredThrowableException from KMSClientProvider.addDelegationTokens. {code} 2015-03-18 08:49:17,948 INFO [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1022)) - mapred.input.dir is deprecated. 
Instead, use mapreduce.input.fileinputformat.inputdir 2015-03-18 08:49:19,048 WARN [main]: security.UserGroupInformation (UserGroupInformation.java:doAs(1645)) - PriviledgedActionException as:hive (auth:KERBEROS) cause:org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) 2015-03-18 08:49:19,050 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(314)) - Hive Runtime Error: Map local work failed java.io.IOException: java.io.IOException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:634) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:363) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:337) at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:303) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:735) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.io.IOException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:826) at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86) at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:413) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:559) ... 9 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655) at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:808) ... 18 more Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:306) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127) {code} To make sure a map join happens, the test needs to join a small table with a large one, for example: {code} CREATE TABLE if not exists jsmall (code string, des string, t int, s int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; CREATE TABLE if not exists jbig1 (code string, des string, t int, s int) ROW FORMAT DELIMITED
[jira] [Updated] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session
[ https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10119: - Issue Type: Improvement (was: Bug) Allow Log verbosity to be set in hiveserver2 session Key: HIVE-10119 URL: https://issues.apache.org/jira/browse/HIVE-10119 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch, HIVE-10119.3.patch We need to be able to set logging per HS2 session. Clients often use the map-reduce completion matrix (Execution) shown in Beeline to debug performance. Users might not want the verbose log view all the time, since it obscures the Execution information. Hence the client should be able to change the verbosity level. Also, HS2 logging currently has 2 levels of verbosity, not 3. Users might want Execution + Performance counters only, so that level needs to be added. So the user should be able to set 3 levels of verbosity (plus None) in the session, overriding the default verbosity specified in the hive-site.xml file:
0. None - IGNORE
1. Execution - just shows the map-reduce tasks completing
2. Performance - Execution + Performance counters dumped at the end
3. Verbose - all logs
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
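A per-session override of an ordered verbosity setting can be sketched as below. This is a minimal illustration under stated assumptions, not HiveServer2's API: the class SessionLogVerbosity, the enum Level and both methods are hypothetical, the level names are taken from the list in the description, and each log message is assumed to be tagged EXECUTION, PERFORMANCE or VERBOSE.

```java
import java.util.Locale;

// Hypothetical sketch of a per-session verbosity override (HIVE-10119); not Hive's API.
final class SessionLogVerbosity {

    // Levels from the issue description, ordered from least to most verbose.
    enum Level { NONE, EXECUTION, PERFORMANCE, VERBOSE }

    // A non-empty session setting overrides the server-wide default
    // (the issue places that default in hive-site.xml).
    static Level effectiveLevel(String serverDefault, String sessionValue) {
        String chosen = (sessionValue != null && !sessionValue.trim().isEmpty())
                ? sessionValue : serverDefault;
        return Level.valueOf(chosen.trim().toUpperCase(Locale.ROOT));
    }

    // A message tagged with messageLevel is shown only when the effective level is
    // at least that verbose. Messages are assumed to be tagged EXECUTION,
    // PERFORMANCE or VERBOSE, so an effective level of NONE shows nothing.
    static boolean shouldShow(Level effective, Level messageLevel) {
        return effective.compareTo(messageLevel) >= 0;
    }

    public static void main(String[] args) {
        Level level = effectiveLevel("verbose", "performance");
        System.out.println(level);                              // PERFORMANCE
        System.out.println(shouldShow(level, Level.EXECUTION)); // true
        System.out.println(shouldShow(level, Level.VERBOSE));   // false
    }
}
```

Keeping the levels in a single ordered enum makes "Performance = Execution + counters" fall out of a simple comparison instead of a set of independent flags.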
[jira] [Updated] (HIVE-10234) Problems using custom inputformat
[ https://issues.apache.org/jira/browse/HIVE-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuhuan updated HIVE-10234: -- Description: I am using the RCFileProtobufInputFormat from Twitter's elephantbird project in Hive to handle Protobuf records written in RCFile format. Here is what I have done: 1. compiled the source code of RCFileProtobufInputFormat and built a jar 2. added the jar to hive_aux_jar_path and updated the config of the server and client 3. created a table using inputformat com.twitter.elephantbird.mapreduce.input.RCFileProtobufInputFormat Creating the table is OK, but when I execute a query like select * from table test (there is no data in table test), I get this error: FAILED: SemanticException 1:14 Input format must implement InputFormat. Error encountered near token test. I found some cases on Stack Overflow where people said this happens because the custom inputformat class implements the new API (org.apache.hadoop.mapreduce), and the RCFileProtobufInputFormat class does indeed extend the new API. I'm wondering if a custom inputformat must extend the old API (org.apache.hadoop.mapred). Could this be the cause of my problem? was: I am using the RCFileProtobufInputFormat from Twitter's elephantbird project in Hive to handle Protobuf records written in RCFile format. Here is what I have done: 1. compiled the source code of RCFileProtobufInputFormat and built a jar 2. added the jar to hive_aux_jar_path and updated the config of the server and client 3. created a table using inputformat com.twitter.elephantbird.mapreduce.input.RCFileProtobufInputFormat Creating the table is OK, but when I execute a query like select * from table test (there is no data in table test), I get this error: FAILED: SemanticException 1:14 Input format must implement InputFormat. Error encountered near token test. 
I found some cases on Stack Overflow where people said this happens because the custom inputformat class implements the new API (org.apache.hadoop.mapreduce), and the RCFileProtobufInputFormat class does indeed extend the new API. I'm wondering whether a custom inputformat must extend the old API (org.apache.hadoop.mapred). Could this be the cause of my problem? Problems using custom inputformat - Key: HIVE-10234 URL: https://issues.apache.org/jira/browse/HIVE-10234 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.13.1 Environment: CentOS release 6.3 CDH-5.3.1-1.cdh5.3.1.p0.5 hive-0.13.1-cdh5.3.1 Inputformat: https://github.com/twitter/elephant-bird/blob/master/rcfile/src/main/java/com/twitter/elephantbird/mapreduce/input/RCFileProtobufInputFormat.java Reporter: xuhuan Priority: Critical Labels: inputformat I am using the RCFileProtobufInputFormat from Twitter's elephantbird project in Hive to handle Protobuf records written in RCFile format. Here is what I have done: 1. compiled the source code of RCFileProtobufInputFormat and built a jar 2. added the jar to hive_aux_jar_path and updated the config of the server and client 3. created a table using inputformat com.twitter.elephantbird.mapreduce.input.RCFileProtobufInputFormat Creating the table is OK, but when I execute a query like select * from table test (there is no data in table test), I get this error: FAILED: SemanticException 1:14 Input format must implement InputFormat. Error encountered near token test. I found some cases on Stack Overflow where people said this happens because the custom inputformat class implements the new API (org.apache.hadoop.mapreduce), and the RCFileProtobufInputFormat class does indeed extend the new API. I'm wondering if a custom inputformat must extend the old API (org.apache.hadoop.mapred). Could this be the cause of my problem? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
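The reporter's question boils down to a type check: a class that extends the new-API base class is not an instance of the old-API interface. The sketch below models that check without Hadoop on the classpath. It assumes, as the error message and the Stack Overflow answers the reporter mentions suggest, that Hive accepts a table input format only if it is assignable to the old-API interface. OldApiInputFormat and NewApiInputFormat are hypothetical stand-ins for org.apache.hadoop.mapred.InputFormat (an interface) and org.apache.hadoop.mapreduce.InputFormat (an abstract class); this is not Hive's actual validation code.

```java
// Self-contained model of the check behind "Input format must implement InputFormat".
// All type names here are hypothetical stand-ins; no Hadoop jars are needed.
final class InputFormatApiCheck {

    // Stand-in for org.apache.hadoop.mapred.InputFormat (old API, an interface).
    interface OldApiInputFormat { }

    // Stand-in for org.apache.hadoop.mapreduce.InputFormat (new API, an abstract class).
    abstract static class NewApiInputFormat { }

    // Stand-in for a class like RCFileProtobufInputFormat, which extends the new API.
    static final class NewStyleFormat extends NewApiInputFormat { }

    // Stand-in for a format written against the old API.
    static final class OldStyleFormat implements OldApiInputFormat { }

    // Assumption: the table's input format is accepted only if it is
    // assignable to the old-API interface.
    static boolean acceptedAsTableInputFormat(Class<?> formatClass) {
        return OldApiInputFormat.class.isAssignableFrom(formatClass);
    }

    public static void main(String[] args) {
        System.out.println(acceptedAsTableInputFormat(NewStyleFormat.class)); // false
        System.out.println(acceptedAsTableInputFormat(OldStyleFormat.class)); // true
    }
}
```

Because the two APIs share no common supertype, no amount of configuration makes a new-API class pass such a check; it would need an old-API implementation or an adapter that implements the old interface.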
[jira] [Commented] (HIVE-10146) Not count session as idle if query is running
[ https://issues.apache.org/jira/browse/HIVE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482546#comment-14482546 ] Lefty Leverenz commented on HIVE-10146: --- Doc note: This adds *hive.server2.idle.session.check.operation* to HiveConf.java, so it needs to be documented in the HiveServer2 section of Configuration Properties for release 1.2.0. It could go either at the end of the section or after *hive.server2.idle.session.timeout* (which could include a reference to it).
* [Configuration Properties -- HiveServer2 | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-HiveServer2]
* [Configuration Properties -- HiveServer2 -- hive.server2.idle.session.timeout | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.idle.session.timeout]
Not count session as idle if query is running - Key: HIVE-10146 URL: https://issues.apache.org/jira/browse/HIVE-10146 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-10146.1.patch, HIVE-10146.2.patch, HIVE-10146.3.patch, HIVE-10146.4.patch Currently, as long as there is no activity, the HS2 session is considered idle. This makes it very hard to set HIVE_SERVER2_IDLE_SESSION_TIMEOUT: if we don't set it long enough, an unattended query could be killed. We should provide an option to not count the session as idle while some query is still running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
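The proposed option changes the idle test from "no recent activity" to "no recent activity and nothing running". The sketch below is a minimal, hypothetical version of that predicate: IdleSessionCheck and isIdle are illustrative names, the checkOperation flag only mirrors the intent of hive.server2.idle.session.check.operation, and treating a non-positive timeout as "disabled" is an assumption, not something stated in this thread.

```java
// Hypothetical sketch of the idle test proposed in HIVE-10146; not HiveServer2's code.
final class IdleSessionCheck {

    /**
     * @param checkOperation mirrors the intent of hive.server2.idle.session.check.operation:
     *                       when true, a session with a running operation is never idle
     */
    static boolean isIdle(long nowMs, long lastAccessMs, long idleTimeoutMs,
                          boolean checkOperation, int runningOperations) {
        if (idleTimeoutMs <= 0) {
            return false; // assumption: a non-positive timeout disables idle expiry
        }
        if (checkOperation && runningOperations > 0) {
            return false; // an unattended but still-running query keeps the session alive
        }
        return nowMs - lastAccessMs > idleTimeoutMs;
    }

    public static void main(String[] args) {
        long timeout = 5_000;
        System.out.println(isIdle(10_000, 0, timeout, false, 1)); // true: old behavior
        System.out.println(isIdle(10_000, 0, timeout, true, 1));  // false: query still running
        System.out.println(isIdle(10_000, 0, timeout, true, 0));  // true: nothing running
    }
}
```

With the check enabled, the timeout can be kept short for abandoned sessions without risking a long-running query.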
[jira] [Updated] (HIVE-10231) Compute partition column stats fails if partition col type is date
[ https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-10231: --- Attachment: HIVE-10231.1.patch Thanks [~ashutoshc] for the review. I made the changes and also added more tests in the new patch; please take a look. Compute partition column stats fails if partition col type is date -- Key: HIVE-10231 URL: https://issues.apache.org/jira/browse/HIVE-10231 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Fix For: 1.2.0 Attachments: HIVE-10231.1.patch, HIVE-10231.patch Currently the command analyze table .. partition .. compute statistics for columns seems to work only for partition columns of string and numeric types, not for other types such as date. See the following case using date as the partition column type: {code} create table colstatspartdate (key int, value string) partitioned by (ds date, hr int); insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select key, value from src limit 20; analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute statistics for columns; {code} You will get a RuntimeException: {code} FAILED: RuntimeException Cannot convert to Date from: int 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to Date from: int java.lang.RuntimeException: Cannot convert to Date from: int at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333) at 
org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10049) Hive jobs submitted from WebHCat won't log Hive history into YARN ATS
[ https://issues.apache.org/jira/browse/HIVE-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyong Zhu resolved HIVE-10049. - Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/HIVE-10066 Hive jobs submitted from WebHCat won't log Hive history into YARN ATS - Key: HIVE-10049 URL: https://issues.apache.org/jira/browse/HIVE-10049 Project: Hive Issue Type: Improvement Components: Hive, WebHCat Affects Versions: 1.0.0 Reporter: Xiaoyong Zhu Priority: Critical When a Hive job is executed from the CLI, it can be viewed in YARN ATS (/ws/v1/timeline/HIVE_QUERY_ID/). However, if a Hive job is submitted through WebHCat, nothing is logged to YARN ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date
[ https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482271#comment-14482271 ] Ashutosh Chauhan commented on HIVE-10231: - [~ctang.ma] Since Hive's type system is loosey-goosey, you might be able to get away with quoting all the time, but I wouldn't recommend it. My suggestion would be to go to the other extreme and give the parser as much type information as we can. So, generate constants in the query like the following:
{code}
switch (colType):
  case Long: colVal + "L";
  case SmallInt: colVal + "S";
  case Tinyint: colVal + "Y";
  case Decimal: colVal + "BD";
  case String: case Char: case VarChar: "'" + colVal + "'";
  case Date: "date '" + colVal + "'";
  case TimeStamp: "timestamp '" + colVal + "'";
{code}
Compute partition column stats fails if partition col type is date -- Key: HIVE-10231 URL: https://issues.apache.org/jira/browse/HIVE-10231 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Fix For: 1.2.0 Attachments: HIVE-10231.patch Currently the command analyze table .. partition .. compute statistics for columns seems to work only for partition columns of string and numeric types, not for other types such as date. 
See the following case using date as the partition column type: {code} create table colstatspartdate (key int, value string) partitioned by (ds date, hr int); insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select key, value from src limit 20; analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute statistics for columns; {code} You will get a RuntimeException: {code} FAILED: RuntimeException Cannot convert to Date from: int 15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to Date from: int java.lang.RuntimeException: Cannot convert to Date from: int at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333) at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
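Ashutosh's pseudocode can be turned into a small, runnable helper. The sketch below is one possible rendering under stated assumptions: the class PartitionLiteral and method toTypedLiteral are hypothetical names, "Long" in the comment is taken to mean Hive's bigint, parameterized types such as decimal(10,2) or varchar(20) are matched on their base name, values are assumed to contain no single quotes (no escaping is done), and any type not listed (for example int) is emitted unchanged.

```java
import java.util.Locale;

// Hypothetical helper rendering the literal-generation pseudocode from HIVE-10231.
final class PartitionLiteral {

    static String toTypedLiteral(String colType, String colVal) {
        String base = colType.trim().toLowerCase(Locale.ROOT);
        int paren = base.indexOf('(');
        if (paren >= 0) {
            base = base.substring(0, paren).trim(); // decimal(10,2) -> decimal, varchar(20) -> varchar
        }
        switch (base) {
            case "bigint":    return colVal + "L";
            case "smallint":  return colVal + "S";
            case "tinyint":   return colVal + "Y";
            case "decimal":   return colVal + "BD";
            case "string":
            case "char":
            case "varchar":   return "'" + colVal + "'";                // assumes no embedded quotes
            case "date":      return "date '" + colVal + "'";
            case "timestamp": return "timestamp '" + colVal + "'";
            default:          return colVal;                            // e.g. int: no suffix
        }
    }

    public static void main(String[] args) {
        String spec = "ds=" + toTypedLiteral("date", "2015-04-02")
                + ", hr=" + toTypedLiteral("int", "2");
        System.out.println(spec); // ds=date '2015-04-02', hr=2
    }
}
```

Emitting date '2015-04-02' rather than a bare 2015-04-02 keeps the constant typed as a date, which is the situation the "Cannot convert to Date from: int" failure above appears to stem from.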
[jira] [Updated] (HIVE-10214) log metastore call timing information aggregated at query level
[ https://issues.apache.org/jira/browse/HIVE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10214: - Issue Type: Improvement (was: Bug) log metastore call timing information aggregated at query level --- Key: HIVE-10214 URL: https://issues.apache.org/jira/browse/HIVE-10214 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-10214.1.patch, HIVE-10214.2.patch For troubleshooting issues, it would be useful to log timing information for metastore api calls, aggregated at a query level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
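Aggregating metastore call timings per query amounts to keeping a per-query map from API method name to total time and call count, then logging it when the query finishes. The sketch below is a hypothetical illustration of that idea: QueryMetastoreTimer and its methods are illustrative names, the method names used in main are placeholders rather than real metastore API names, and the actual patch may collect timings in a completely different way.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical per-query aggregator for HIVE-10214; names are illustrative only.
final class QueryMetastoreTimer {
    private final Map<String, Long> nanosByMethod = new LinkedHashMap<>();
    private final Map<String, Integer> callsByMethod = new LinkedHashMap<>();

    // Times one metastore call and folds its duration into the per-query totals.
    <T> T time(String method, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            record(method, System.nanoTime() - start);
        }
    }

    synchronized void record(String method, long nanos) {
        nanosByMethod.merge(method, nanos, Long::sum);
        callsByMethod.merge(method, 1, Integer::sum);
    }

    synchronized int calls(String method) {
        return callsByMethod.getOrDefault(method, 0);
    }

    synchronized long totalNanos(String method) {
        return nanosByMethod.getOrDefault(method, 0L);
    }

    // One line per API method, in first-call order, suitable for logging when the query ends.
    synchronized String summary() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByMethod.entrySet()) {
            sb.append(e.getKey())
              .append(": calls=").append(callsByMethod.get(e.getKey()))
              .append(", totalMs=").append(e.getValue() / 1_000_000)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        QueryMetastoreTimer timer = new QueryMetastoreTimer();
        timer.record("getTable", 2_000_000L);
        timer.record("getTable", 3_000_000L);
        timer.record("getPartitions", 1_000_000L);
        timer.time("getDatabase", () -> "default"); // stand-in for a real client call
        System.out.print(timer.summary());
    }
}
```

Creating one timer per query and discarding it afterwards keeps the aggregation at query level rather than mixing calls from concurrent queries.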
[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482405#comment-14482405 ] Sergey Shelukhin edited comment on HIVE-10161 at 4/7/15 1:57 AM: - When multiple RGs include the same partial CB (due to ORC end boundary being an estimate), the first one reads the length, determines that this is an incomplete CB and bails, the 2nd one tries to blindly read the length but the buffer is now offset by 3 bytes from the original read. Boom! Fixed that, also fixed a small issue with early unlocking. was (Author: sershe): When multiple RGs include the same partial CB (due to ORC end boundary being an estimate), the first one reads the length, determines that this is an incomplete CB and bails, the 2nd one tries to blindly read the length but the compressed block is now offset by 3 bytes from the original read. Boom! Fixed that, also fixed a small issue with early unlocking. LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug) Key: HIVE-10161 URL: https://issues.apache.org/jira/browse/HIVE-10161 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap EncodedReaderImpl dies when reading from the cache data that was written by the regular ORC writer: {code} Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. 
size = 262144 needed = 3919246 at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 22 more Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246 at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780) at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null] {code} Turning off hive.llap.io.enabled makes the error go away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
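Sergey's explanation hinges on a chunk-length read that moves the buffer position even when the reader then bails. The sketch below is a minimal illustration, assuming the 3-byte chunk header described in the ORC specification (little-endian; value = compressed length times 2, plus 1 if the chunk is stored uncompressed). It contrasts an absolute "peek" that leaves the position unchanged with relative reads that advance it by 3 bytes, which would leave a second reader decoding the wrong bytes. OrcChunkHeader and its methods are hypothetical; this is not the EncodedReaderImpl code or its actual fix.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of reading an ORC compression-chunk header safely.
final class OrcChunkHeader {

    static final int HEADER_SIZE = 3;

    // Decodes the 3-byte little-endian header at the current position using
    // absolute gets, so the buffer position is NOT moved.
    static int peekChunkLength(ByteBuffer buf) {
        int p = buf.position();
        int header = (buf.get(p) & 0xff)
                | ((buf.get(p + 1) & 0xff) << 8)
                | ((buf.get(p + 2) & 0xff) << 16);
        return header >>> 1; // the low bit is the "original" (uncompressed) flag
    }

    static boolean peekIsOriginal(ByteBuffer buf) {
        return (buf.get(buf.position()) & 1) == 1;
    }

    // Problematic pattern: relative gets advance the position by HEADER_SIZE,
    // even if the caller then decides the chunk is incomplete and bails.
    static int readChunkLengthAdvancing(ByteBuffer buf) {
        int b0 = buf.get() & 0xff;
        int b1 = buf.get() & 0xff;
        int b2 = buf.get() & 0xff;
        return (b0 | (b1 << 8) | (b2 << 16)) >>> 1;
    }

    public static void main(String[] args) {
        // Header for a 100,000-byte compressed chunk: 100000 * 2 = 0x030D40, little-endian.
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {0x40, 0x0d, 0x03, 0, 0, 0});
        System.out.println(peekChunkLength(buf)); // 100000
        System.out.println(buf.position());       // 0: a reader that bails leaves the buffer intact
        readChunkLengthAdvancing(buf);
        System.out.println(buf.position());       // 3: the next reader now starts 3 bytes too far
    }
}
```

An alternative to absolute gets is to read from buf.duplicate(), which has an independent position; either way, the shared buffer stays where the next row group expects it.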
[jira] [Resolved] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-10161. - Resolution: Fixed When multiple RGs include the same partial CB (due to ORC end boundary being an estimate), the first one reads the length, determines that this is an incomplete RG and bails, the 2nd one tries to blindly read the length but the compressed block is now offset by 3 bytes from the original read. Boom! Fixed that, also fixed a small issue with early unlocking. LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug) Key: HIVE-10161 URL: https://issues.apache.org/jira/browse/HIVE-10161 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap EncodedReaderImpl dies when reading from the cache data that was written by the regular ORC writer: {code} Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246 at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 22 more Caused by: java.lang.IllegalArgumentException: Buffer size too small. 
size = 262144 needed = 3919246 at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780) at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null] {code} Turning off hive.llap.io.enabled makes the error go away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482405#comment-14482405 ] Sergey Shelukhin edited comment on HIVE-10161 at 4/7/15 1:56 AM: - When multiple RGs include the same partial CB (due to ORC end boundary being an estimate), the first one reads the length, determines that this is an incomplete CB and bails, the 2nd one tries to blindly read the length but the compressed block is now offset by 3 bytes from the original read. Boom! Fixed that, also fixed a small issue with early unlocking. was (Author: sershe): When multiple RGs include the same partial CB (due to ORC end boundary being an estimate), the first one reads the length, determines that this is an incomplete RG and bails, the 2nd one tries to blindly read the length but the compressed block is now offset by 3 bytes from the original read. Boom! Fixed that, also fixed a small issue with early unlocking. LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug) Key: HIVE-10161 URL: https://issues.apache.org/jira/browse/HIVE-10161 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap EncodedReaderImpl dies when reading from the cache data that was written by the regular ORC writer: {code} Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. 
size = 262144 needed = 3919246 at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 22 more Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246 at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780) at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null] {code} Turning off hive.llap.io.enabled makes the error go away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9937) LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join
[ https://issues.apache.org/jira/browse/HIVE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482413#comment-14482413 ] Hive QA commented on HIVE-9937: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12723438/HIVE-9937.91.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8715 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3300/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3300/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3300/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12723438 - PreCommit-HIVE-TRUNK-Build LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join -- Key: HIVE-9937 URL: https://issues.apache.org/jira/browse/HIVE-9937 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-9937.01.patch, HIVE-9937.02.patch, HIVE-9937.03.patch, HIVE-9937.04.patch, HIVE-9937.05.patch, HIVE-9937.06.patch, HIVE-9937.07.patch, HIVE-9937.08.patch, HIVE-9937.09.patch, HIVE-9937.91.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported
[ https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482310#comment-14482310 ] Hive QA commented on HIVE-10226: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12723434/HIVE-10226.1.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8653 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-parallel_orderby.q-reduce_deduplicate.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-schemeAuthority2.q-infer_bucket_sort_bucketed_table.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3299/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3299/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3299/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12723434 - PreCommit-HIVE-TRUNK-Build Column stats for Date columns not supported --- Key: HIVE-10226 URL: https://issues.apache.org/jira/browse/HIVE-10226 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10226.1.patch {noformat} hive> explain analyze table revenues compute statistics for columns; 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only integer/long/timestamp/float/double/string/binary/boolean/decimal type argument is accepted but date is passed. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
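The HIVE-10226 error shows the column-statistics function rejecting date arguments. One way to support dates is to map each value to an integer day offset and reuse integer-style min/max/NDV tracking. The sketch below assumes that representation (days since 1970-01-01 via `LocalDate.toEpochDay()`); `DateColumnStatsSketch` and its field names are hypothetical and are not taken from the attached patch or the metastore schema.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DateColumnStatsSketch {

    /** Hypothetical stats holder; field names are illustrative only. */
    static final class DateStats {
        long minEpochDay = Long.MAX_VALUE; // stays MAX_VALUE if every value is null
        long maxEpochDay = Long.MIN_VALUE; // stays MIN_VALUE if every value is null
        long numNulls;
        long numDistinct;
    }

    /** Computes min/max/null count/NDV by mapping each date to days since the epoch. */
    static DateStats compute(List<LocalDate> column) {
        DateStats stats = new DateStats();
        Set<Long> distinctDays = new HashSet<>();
        for (LocalDate value : column) {
            if (value == null) {
                stats.numNulls++;
                continue;
            }
            long day = value.toEpochDay();
            stats.minEpochDay = Math.min(stats.minEpochDay, day);
            stats.maxEpochDay = Math.max(stats.maxEpochDay, day);
            distinctDays.add(day);
        }
        stats.numDistinct = distinctDays.size();
        return stats;
    }

    public static void main(String[] args) {
        DateStats s = compute(Arrays.asList(
                LocalDate.parse("2015-03-30"), null,
                LocalDate.parse("2015-04-02"), LocalDate.parse("2015-03-30")));
        System.out.println(LocalDate.ofEpochDay(s.minEpochDay) + " .. " + LocalDate.ofEpochDay(s.maxEpochDay)
                + ", nulls=" + s.numNulls + ", ndv=" + s.numDistinct);
        // prints: 2015-03-30 .. 2015-04-02, nulls=1, ndv=2
    }
}
```

Storing day offsets keeps range comparisons cheap and locale-independent; converting back for display is a single `LocalDate.ofEpochDay` call.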
[jira] [Updated] (HIVE-10214) log metastore call timing information aggregated at query level
[ https://issues.apache.org/jira/browse/HIVE-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10214: - Attachment: HIVE-10214.2.patch Addressing review comments in 2.patch. log metastore call timing information aggregated at query level --- Key: HIVE-10214 URL: https://issues.apache.org/jira/browse/HIVE-10214 Project: Hive Issue Type: Bug Components: Metastore Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-10214.1.patch, HIVE-10214.2.patch For troubleshooting issues, it would be useful to log timing information for metastore API calls, aggregated at a query level.
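Aggregating metastore call timings per query amounts to keeping a per-query map from method name to total time and call count, then logging one summary line when the query ends. The sketch below illustrates that idea only; `MetastoreCallTimer`, its methods and the summary format are hypothetical and are not the HIVE-10214 patch.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

/** Hypothetical per-query accumulator for metastore call timings. */
public class MetastoreCallTimer {
    // Method name -> total elapsed nanoseconds / number of calls, sorted by name for stable output.
    private final Map<String, Long> nanosByMethod = new TreeMap<>();
    private final Map<String, Integer> callsByMethod = new TreeMap<>();

    /** Runs one metastore call and records its duration, even if the call throws. */
    public <T> T time(String method, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            record(method, System.nanoTime() - start);
        }
    }

    public void record(String method, long nanos) {
        nanosByMethod.merge(method, nanos, Long::sum);
        callsByMethod.merge(method, 1, Integer::sum);
    }

    public long totalNanos(String method) { return nanosByMethod.getOrDefault(method, 0L); }

    public int callCount(String method) { return callsByMethod.getOrDefault(method, 0); }

    /** One line to log when the query finishes, e.g. "getTable: 2 call(s), 5 ms". */
    public String summary() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByMethod.entrySet()) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(e.getKey()).append(": ").append(callsByMethod.get(e.getKey()))
              .append(" call(s), ").append(e.getValue() / 1_000_000).append(" ms");
        }
        return sb.toString();
    }
}
```

One instance would be created per query and discarded after its summary is logged, so timings from concurrent queries never mix.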
[jira] [Commented] (HIVE-10225) CLI JLine does not flush history on quit/Ctrl-C
[ https://issues.apache.org/jira/browse/HIVE-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482377#comment-14482377 ] Prasanth Jayachandran commented on HIVE-10225: -- Verified the patch and it works as expected. LGTM, +1 pending tests. CLI JLine does not flush history on quit/Ctrl-C --- Key: HIVE-10225 URL: https://issues.apache.org/jira/browse/HIVE-10225 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-10225.1.patch Hive CLI does not save history if it is terminated using Ctrl-C or the quit; command. HIVE-9310 fixed this for exiting with Ctrl-D (EOF), but not for these two ways of exiting.
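A common way to make history survive both `quit;` (which ends in `System.exit`) and Ctrl-C (SIGINT) is a JVM shutdown hook, since the JVM runs registered hooks in both cases (though not on SIGKILL). The sketch below abstracts the CLI's file-backed history as `java.io.Flushable`; whether the actual JLine history class can be passed in directly is an assumption, and `HistoryFlushSketch` is a hypothetical name, not the HIVE-10225 patch.

```java
import java.io.Flushable;
import java.io.IOException;

/** Sketch: flush CLI history from a JVM shutdown hook (history abstracted as Flushable). */
public class HistoryFlushSketch {

    /** Flushes once, reporting (not propagating) I/O errors; returns true on success. */
    static boolean flushQuietly(Flushable history) {
        try {
            history.flush();
            return true;
        } catch (IOException e) {
            System.err.println("Could not save CLI history: " + e.getMessage());
            return false;
        }
    }

    /** Registers a hook that runs on normal exit, System.exit and SIGINT; returns it so it can be removed. */
    static Thread registerFlushOnExit(Flushable history) {
        Thread hook = new Thread(() -> flushQuietly(history), "cli-history-flush");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Swallowing the `IOException` inside the hook is deliberate: a failing history write should not turn an ordinary exit into a stack trace.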
[jira] [Assigned] (HIVE-10189) Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization
[ https://issues.apache.org/jira/browse/HIVE-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-10189: --- Assignee: Ferdinand Xu Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization Key: HIVE-10189 URL: https://issues.apache.org/jira/browse/HIVE-10189 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: avx-64.docx We should show the performance gain from SIMD optimization.
[jira] [Commented] (HIVE-10231) Compute partition column stats fails if partition col type is date
[ https://issues.apache.org/jira/browse/HIVE-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482443#comment-14482443 ] Mostafa Mokhtar commented on HIVE-10231: [~jdere] FYI. Compute partition column stats fails if partition col type is date -- Key: HIVE-10231 URL: https://issues.apache.org/jira/browse/HIVE-10231 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Fix For: 1.2.0 Attachments: HIVE-10231.patch Currently, the command analyze table .. partition .. compute statistics for columns only works for partition columns of string or numeric types, not for other types such as date. See the following case, which uses date as the partition column type: {code}
create table colstatspartdate (key int, value string) partitioned by (ds date, hr int);
insert into colstatspartdate partition (ds=date '2015-04-02', hr=2) select key, value from src limit 20;
analyze table colstatspartdate partition (ds=date '2015-04-02', hr=2) compute statistics for columns;
{code} You will get a RuntimeException: {code}
FAILED: RuntimeException Cannot convert to Date from: int
15/04/06 17:30:01 ERROR ql.Driver: FAILED: RuntimeException Cannot convert to Date from: int
java.lang.RuntimeException: Cannot convert to Date from: int
  at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDate(PrimitiveObjectInspectorUtils.java:1048)
  at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DateConverter.convert(PrimitiveObjectInspectorConverter.java:264)
  at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.typeCast(ConstantPropagateProcFactory.java:163)
  at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.propagate(ConstantPropagateProcFactory.java:333)
  at org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:242)
{code}
[jira] [Commented] (HIVE-9223) HiveServer2 on Tez doesn't support concurrent queries within one session
[ https://issues.apache.org/jira/browse/HIVE-9223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482490#comment-14482490 ] Raj Bains commented on HIVE-9223: - Queries within the same session should execute serially. Hue should open a new session when you use a new tab. Each tab can be considered a script where queries run sequentially. Otherwise, how can I guarantee that a query creating a temporary table and the following query that reads the temporary table are processed in order? Are we depending on UI tabs for that? The previous MR behavior seems to be a bug. HiveServer2 on Tez doesn't support concurrent queries within one session Key: HIVE-9223 URL: https://issues.apache.org/jira/browse/HIVE-9223 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Pala M Muthaia When a user concurrently submits multiple queries in the same HS2 session (using the Thrift interface), the queries go through the same TezSessionState and end up being submitted to the same Tez AM, and the second query fails with the error "App master already running a DAG". Is this by design? I looked into the code, and both the comments and the code suggest support only for serial execution of queries within the same HiveServer2 session (on Tez). This works for the CLI environment, but in a server it is plausible that a client sends multiple concurrent queries under the same session (e.g., a web app that executes queries for users, such as Cloudera Hue). So shouldn't the HS2 on Tez implementation support concurrent queries?
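The ordering guarantee argued for in the comment (a temporary table is created before the next query reads it) can be illustrated with a per-session single-threaded executor: queries queue up and run one at a time, in submission order. This is a sketch of the semantics only, not how HiveServer2 or TezSessionState is implemented; `SerialSession` is a hypothetical name.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical session wrapper: all queries of one session run on one thread, in order. */
public class SerialSession implements AutoCloseable {
    private final ExecutorService queryThread = Executors.newSingleThreadExecutor();

    /** Queues the query; it starts only after every previously submitted query has finished. */
    public <T> Future<T> submit(Callable<T> query) {
        return queryThread.submit(query);
    }

    /** Rejects new queries, then waits (up to one minute) for the queued ones to finish. */
    @Override
    public void close() {
        queryThread.shutdown();
        try {
            queryThread.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Under this model, a client that needs concurrency (for example, one Hue tab per query) opens additional sessions rather than multiplexing one.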
[jira] [Commented] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482507#comment-14482507 ] Hive QA commented on HIVE-9647: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12723440/HIVE-9647.02.patch {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8653 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-parallel_orderby.q-reduce_deduplicate.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-schemeAuthority2.q-infer_bucket_sort_bucketed_table.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessing org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.TestSQLStdHiveAccessControllerHS2.testConfigProcessingCustomSetWhitelistAppend {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3301/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3301/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3301/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12723440 - PreCommit-HIVE-TRUNK-Build Discrepancy in cardinality estimates between partitioned and un-partitioned tables --- Key: HIVE-9647 URL: https://issues.apache.org/jira/browse/HIVE-9647 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Fix For: 1.2.0 Attachments: HIVE-9647.01.patch, HIVE-9647.02.patch High-level summary: HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column number of distinct values (NDV) to estimate join selectivity. The way statistics are aggregated for partitioned tables produces different NDVs than for un-partitioned tables, which in turn leads to different plans for the partitioned and un-partitioned schemas. The table below summarizes the NDVs that computeInnerJoinSelectivity uses to estimate join selectivity.
||Column||Partitioned count distincts||Un-Partitioned count distincts||
|sr_customer_sk|71,245|1,415,625|
|sr_item_sk|38,846|62,562|
|sr_ticket_number|71,245|34,931,085|
|ss_customer_sk|88,476|1,415,625|
|ss_item_sk|38,846|62,562|
|ss_ticket_number|100,756|56,256,175|
The discrepancy arises because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition, and computes it as select max(NUM_DISTINCTS) from PART_COL_STATS. This is problematic for columns like ticket number, which naturally increase with the date partition column ss_sold_date_sk. Suggestions: Use HyperLogLog (HLL), as suggested by Gopal; there is an HLL implementation for HBase.
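The effect of the max(NUM_DISTINCTS) aggregation can be reproduced on synthetic data: when ticket numbers keep increasing with the date partition, no value repeats across partitions, so the true table-level NDV is the sum of the per-partition NDVs, while the max-based estimate returns only one partition's worth. The sketch below computes the exact NDV with a `HashSet` union purely for illustration; it is not HyperLogLog. HLL would approximate the same union with bounded memory, because per-partition HLL sketches can be merged. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Compares the max-of-partition-NDVs estimate with the exact table-level NDV. */
public class PartitionNdvSketch {

    /** Mirrors "select max(NUM_DISTINCTS) from PART_COL_STATS": the largest per-partition NDV. */
    static long maxOfPartitionNdvs(List<Set<Long>> partitions) {
        long max = 0;
        for (Set<Long> p : partitions) max = Math.max(max, p.size());
        return max;
    }

    /** Exact NDV over the whole table: the size of the union of all partitions' values. */
    static long exactNdv(List<Set<Long>> partitions) {
        Set<Long> union = new HashSet<>();
        for (Set<Long> p : partitions) union.addAll(p);
        return union.size();
    }

    /** Synthetic data: ticket numbers that keep increasing with the date partition. */
    static List<Set<Long>> increasingTickets(int numPartitions, int ticketsPerPartition) {
        List<Set<Long>> partitions = new ArrayList<>();
        long next = 0;
        for (int d = 0; d < numPartitions; d++) {
            Set<Long> tickets = new HashSet<>();
            for (int i = 0; i < ticketsPerPartition; i++) tickets.add(next++);
            partitions.add(tickets);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Set<Long>> parts = increasingTickets(365, 1000);
        System.out.println("max of partition NDVs = " + maxOfPartitionNdvs(parts)); // 1000
        System.out.println("exact NDV             = " + exactNdv(parts));           // 365000
    }
}
```

For columns such as item_sk, where the same values recur in every partition, the two numbers would coincide; that matches the identical sr_item_sk/ss_item_sk figures being the smallest gap in the table.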
[jira] [Commented] (HIVE-10160) Give a warning when grouping or ordering by a constant column
[ https://issues.apache.org/jira/browse/HIVE-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482566#comment-14482566 ] Chao commented on HIVE-10160: - +1 pending tests. Give a warning when grouping or ordering by a constant column - Key: HIVE-10160 URL: https://issues.apache.org/jira/browse/HIVE-10160 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Lefty Leverenz Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-10160.1.patch, HIVE-10160.3.patch, HIVE-10160.4.patch To avoid confusion, a warning should be issued when users specify column positions instead of names in a GROUP BY or ORDER BY clause (unless hive.groupby.orderby.position.alias is set to true in Hive 0.11.0 or later).
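The check requested here boils down to: when hive.groupby.orderby.position.alias is off, a bare integer in GROUP BY or ORDER BY is (per the issue title) treated as a constant rather than a column position, so it deserves a warning. The sketch below works on already-split clause items as strings, which is a simplifying assumption; the real check would inspect the parsed AST. `PositionAliasWarning` and the message text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical check: warn when a GROUP BY / ORDER BY item is a bare integer literal. */
public class PositionAliasWarning {
    private static final Pattern INTEGER_LITERAL = Pattern.compile("[0-9]+");

    static List<String> warnings(String clause, List<String> items, boolean positionAliasEnabled) {
        List<String> out = new ArrayList<>();
        if (positionAliasEnabled) {
            return out; // integers are column positions, which is what the user meant
        }
        for (String item : items) {
            String trimmed = item.trim();
            if (INTEGER_LITERAL.matcher(trimmed).matches()) {
                out.add(clause + " " + trimmed + " is treated as a constant, not as a column position;"
                        + " set hive.groupby.orderby.position.alias=true to use positions");
            }
        }
        return out;
    }
}
```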
[jira] [Commented] (HIVE-7271) Speed up unit tests
[ https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482556#comment-14482556 ] Lefty Leverenz commented on HIVE-7271: -- Doc note: *hive.exec.submit.local.task.via.child* is documented in two places in the wiki. * [Unit Test Parallel Execution -- Configuration | https://cwiki.apache.org/confluence/display/Hive/Unit+Test+Parallel+Execution#UnitTestParallelExecution-Configuration] * [Configuration Properties -- Test Properties -- hive.exec.submit.local.task.via.child | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.submit.local.task.via.child] Please review the doc. If it's okay, the TODOC14 label can be removed from this issue. If changes are needed, just let me know. Speed up unit tests --- Key: HIVE-7271 URL: https://issues.apache.org/jira/browse/HIVE-7271 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, HIVE-7271.4.patch, HIVE-7271.5.patch, HIVE-7271.6.patch, HIVE-7271.7.patch Did some experiments to see if there's a way to speed up unit tests. TestCliDriver seemed to spend a lot of time just spinning up/tearing down JVMs. I was also curious to see if running everything on a RAM disk would help. Results (I ran tests up to authorization_2):
- Current setup: 40 minutes
- Single JVM (not using a child JVM to run all queries): 8 minutes
- Single JVM + RAM disk: 7 minutes
So the RAM disk didn't help that much, but running tests in a single JVM seems worthwhile.
[jira] [Commented] (HIVE-10120) Disallow create table with dot/colon in column name
[ https://issues.apache.org/jira/browse/HIVE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482581#comment-14482581 ] Swarnim Kulkarni commented on HIVE-10120: - [~pxiong] Looks good for the most part. I left some minor comments on the review, though. Disallow create table with dot/colon in column name --- Key: HIVE-10120 URL: https://issues.apache.org/jira/browse/HIVE-10120 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-10120.01.patch Since users cannot query column names with a dot in the middle, such as emp.no, we should not allow users to create tables with such columns. The documentation should also be updated to reflect this change. Here is an example. Consider this table: {code} CREATE TABLE a (`emp.no` string); select `emp.no` from a; fails with this message: FAILED: RuntimeException java.lang.RuntimeException: cannot find field emp from [0:emp.no] {code} The Hive documentation needs to be fixed: {code} (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) seems to indicate that any Unicode character can go between the backticks in the select statement, but it doesn’t like the dot/colon or even select * when there is a column that has a dot/colon. {code}
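The proposed rule is a simple name check at CREATE TABLE time: reject any column name containing '.' or ':', and report every offending name at once. The sketch below shows that check in isolation; `ColumnNameCheck` and its error message are hypothetical and are not taken from HIVE-10120.01.patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical CREATE TABLE check: reject column names containing '.' or ':'. */
public class ColumnNameCheck {

    static boolean isAllowed(String columnName) {
        return columnName.indexOf('.') < 0 && columnName.indexOf(':') < 0;
    }

    /** Throws listing all offending names so the user can fix them in one pass. */
    static void validate(List<String> columnNames) {
        List<String> bad = new ArrayList<>();
        for (String name : columnNames) {
            if (!isAllowed(name)) bad.add(name);
        }
        if (!bad.isEmpty()) {
            throw new IllegalArgumentException("Column names may not contain '.' or ':': " + bad);
        }
    }
}
```

Failing at DDL time is preferable to the current behavior, where the table is created but `select `emp.no` from a` later fails with "cannot find field emp".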
[jira] [Commented] (HIVE-10189) Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization
[ https://issues.apache.org/jira/browse/HIVE-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482627#comment-14482627 ] Ferdinand Xu commented on HIVE-10189: - Hi [~chengxiang li], can you help me review this patch? Thank you! Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization Key: HIVE-10189 URL: https://issues.apache.org/jira/browse/HIVE-10189 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10189.patch, avx-64.docx We should show the performance gain from SIMD optimization.
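A micro-benchmark for a vectorized kernel usually follows the same shape: fill one batch of column data, run the kernel untimed so the JIT compiles it, then time many runs and report a robust statistic such as the median. The sketch below times a scalar long-column add; the 1024-row batch size, `VectorAddBenchmark` and its methods are assumptions for illustration, not the attached patch. For real measurements, JMH (the OpenJDK Java Microbenchmark Harness) handles warm-up, forking and dead-code elimination more reliably than a hand-rolled loop.

```java
import java.util.Arrays;

/** Hypothetical micro-benchmark: time a scalar long-column add over one batch, after warm-up. */
public class VectorAddBenchmark {
    static final int BATCH_SIZE = 1024; // assumed batch size for illustration

    /** The kernel under test: out[i] = a[i] + b[i] for the first n rows. */
    static void addColumns(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = a[i] + b[i];
        }
    }

    /** Median nanoseconds per batch over `runs` timed runs (runs > 0), after `warmup` untimed runs. */
    static long medianNanosPerBatch(int warmup, int runs) {
        long[] a = new long[BATCH_SIZE], b = new long[BATCH_SIZE], out = new long[BATCH_SIZE];
        for (int i = 0; i < BATCH_SIZE; i++) {
            a[i] = i;
            b[i] = 2L * i;
        }
        for (int i = 0; i < warmup; i++) {
            addColumns(a, b, out, BATCH_SIZE); // give the JIT a chance to compile the loop
        }
        long[] samples = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            addColumns(a, b, out, BATCH_SIZE);
            samples[r] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        System.out.println("median ns per batch: " + medianNanosPerBatch(10_000, 1_001));
    }
}
```

Comparing this number against the same harness run on a SIMD-optimized kernel would give the before/after gain the issue asks for.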
[jira] [Commented] (HIVE-10119) Allow Log verbosity to be set in hiveserver2 session
[ https://issues.apache.org/jira/browse/HIVE-10119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482573#comment-14482573 ] Hive QA commented on HIVE-10119: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12723492/HIVE-10119.3.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8705 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hive.service.cli.operation.TestOperationLoggingAPI.testFetchResultsOfLogWithExecutionMode org.apache.hive.service.cli.operation.TestOperationLoggingAPI.testFetchResultsOfLogWithNoneMode org.apache.hive.service.cli.operation.TestOperationLoggingAPI.testFetchResultsOfLogWithPerformanceMode org.apache.hive.service.cli.operation.TestOperationLoggingAPI.testFetchResultsOfLogWithVerboseMode {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3302/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3302/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3302/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12723492 - PreCommit-HIVE-TRUNK-Build Allow Log verbosity to be set in hiveserver2 session Key: HIVE-10119 URL: https://issues.apache.org/jira/browse/HIVE-10119 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10119.1.patch, HIVE-10119.2.patch, HIVE-10119.3.patch We need to be able to set logging per HS2 session. The client often uses the map-reduce completion matrix (Execution) that shows up in Beeline to debug performance. The user might not want the verbose log view all the time, since it obscures the Execution information. Hence, the client should be able to change the verbosity level. Also, HS2 logging currently has 2 levels of verbosity, not 3. Users might want Execution + Performance counters only, so that level needs to be added. So for logs, the user should be able to choose one of the following verbosity levels in the session, overriding the default verbosity specified in the hive-site.xml file:
0. None - IGNORE
1. Execution - Just shows the map-reduce tasks completing
2. Performance - Execution + Performance counters dumped at the end
3. Verbose - All logs
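The four levels listed above are cumulative, so they map naturally onto an ordered enum where each level shows everything the level below it shows. The sketch below assumes that cumulative reading and invents three log-line categories for illustration; `LoggingLevelSketch` and `Category` are hypothetical names, not the HIVE-10119 implementation.

```java
import java.util.Locale;

/** Hypothetical per-session operation-log verbosity, following the four levels listed in HIVE-10119. */
public enum LoggingLevelSketch {
    NONE,        // 0: ignore operation logs
    EXECUTION,   // 1: map-reduce task progress only
    PERFORMANCE, // 2: EXECUTION plus performance counters dumped at the end
    VERBOSE;     // 3: all logs

    /** Illustrative categories of log lines an operation can emit. */
    public enum Category { TASK_PROGRESS, PERF_COUNTERS, OTHER }

    /** Levels are cumulative: a higher level shows everything a lower level shows. */
    public boolean shows(Category category) {
        switch (category) {
            case TASK_PROGRESS: return compareTo(EXECUTION) >= 0;
            case PERF_COUNTERS: return compareTo(PERFORMANCE) >= 0;
            default:            return this == VERBOSE;
        }
    }

    /** Parses a session setting such as "execution" (case-insensitive). */
    public static LoggingLevelSketch parse(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

Relying on declaration order keeps the filter to one comparison per line, at the cost of making the order of the constants significant.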
[jira] [Updated] (HIVE-10189) Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization
[ https://issues.apache.org/jira/browse/HIVE-10189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10189: Attachment: HIVE-10189.patch Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization Key: HIVE-10189 URL: https://issues.apache.org/jira/browse/HIVE-10189 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10189.patch, avx-64.docx We should show the performance gain from SIMD optimization.
[jira] [Commented] (HIVE-10226) Column stats for Date columns not supported
[ https://issues.apache.org/jira/browse/HIVE-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482557#comment-14482557 ] Swarnim Kulkarni commented on HIVE-10226: - [~jdere] I left minor comments on the review, but +1 to Ashutosh's suggestion. That should give us additional flexibility in the future (e.g., locales). Column stats for Date columns not supported --- Key: HIVE-10226 URL: https://issues.apache.org/jira/browse/HIVE-10226 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10226.1.patch {noformat} hive> explain analyze table revenues compute statistics for columns; 2015-03-30 23:47:45,133 ERROR [main()]: ql.Driver (SessionState.java:printError(951)) - FAILED: UDFArgumentTypeException Only integer/long/timestamp/float/double/string/binary/boolean/decimal type argument is accepted but date is passed. {noformat}
[jira] [Updated] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons
[ https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-9870: --- Fix Version/s: 1.2.0 Add JvmPauseMonitor threads to HMS and HS2 daemons -- Key: HIVE-9870 URL: https://issues.apache.org/jira/browse/HIVE-9870 Project: Hive Issue Type: Improvement Components: HiveServer2, Metastore Affects Versions: 1.1.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Fix For: 1.2.0 Attachments: HIVE-9870.patch, HIVE-9870.patch, HIVE-9870.patch hadoop-common includes a nifty thread that logs GC or non-GC pauses within the JVM when they exceed a specific threshold. This has been immeasurably useful in supporting several clusters, helping to identify GC or other forms of process pauses as the root cause of an event under investigation. The HMS and HS2 daemons are good targets for running similar threads. The thread can be loaded in an if-available style.
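The pause-detection technique described here works by sleeping for a fixed interval and measuring how long the sleep actually took; any large excess means the whole JVM (or host) was stalled, for example by GC. The sketch below shows that loop plus a simple class-availability check for the "if-available" loading style. It is written in the spirit of hadoop-common's JvmPauseMonitor but is not its code; `PauseMonitorSketch` and its thresholds are hypothetical.

```java
/** Hypothetical pause detector in the spirit of hadoop-common's JvmPauseMonitor (not its code). */
public class PauseMonitorSketch implements Runnable {
    private final long sleepMs;
    private final long warnThresholdMs;
    private volatile boolean running = true;

    public PauseMonitorSketch(long sleepMs, long warnThresholdMs) {
        this.sleepMs = sleepMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Time the thread was stalled beyond its requested sleep (GC, swapping, a frozen host, ...). */
    static long extraPauseMs(long requestedSleepMs, long measuredMs) {
        return Math.max(0, measuredMs - requestedSleepMs);
    }

    /** "If-available" loading: true only if the named class can be loaded from the classpath. */
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long measuredMs = (System.nanoTime() - start) / 1_000_000;
            long extra = extraPauseMs(sleepMs, measuredMs);
            if (extra > warnThresholdMs) {
                System.err.println("Detected pause in JVM or host of approximately " + extra + " ms");
            }
        }
    }

    public void stop() { running = false; }
}
```

A daemon would run it on a background thread (`Thread t = new Thread(monitor, "pause-monitor"); t.setDaemon(true); t.start();`) and, following the issue's if-available idea, could first call `isClassAvailable("org.apache.hadoop.util.JvmPauseMonitor")` to prefer Hadoop's own class when it is on the classpath.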
[jira] [Assigned] (HIVE-9923) No clear message when from is missing
[ https://issues.apache.org/jira/browse/HIVE-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen reassigned HIVE-9923: -- Assignee: Yongzhi Chen No clear message when from is missing --- Key: HIVE-9923 URL: https://issues.apache.org/jira/browse/HIVE-9923 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Jeff Zhang Assignee: Yongzhi Chen In the following SQL, FROM is missing, but Hive throws an NPE, which is not clear to the user. {code} hive> insert overwrite directory '/tmp/hive-3' select sb1.name, sb2.age student_bucketed sb1 join student_bucketed sb2 on sb1.name=sb2.name; FAILED: NullPointerException null {code}
[jira] [Commented] (HIVE-10202) Beeline outputs prompt+query on standard output when used in non-interactive mode
[ https://issues.apache.org/jira/browse/HIVE-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481253#comment-14481253 ] Naveen Gangam commented on HIVE-10202: -- Thanks [~spena]. I did it this way intentionally so that customers relying on the current behavior still have an option to keep it. In a future release, we could make silent=true the default when running in script mode, and users could retain the old behavior by specifying silent=false. Thoughts? Beeline outputs prompt+query on standard output when used in non-interactive mode - Key: HIVE-10202 URL: https://issues.apache.org/jira/browse/HIVE-10202 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Naveen Gangam Fix For: 1.2.0 Attachments: HIVE-10202.patch When a SQL script file is passed to the Hive CLI, the prompt+query is sent to neither standard output nor standard error. This is fine, because users might want to send only the query results to standard output and parse the results from it. In the case of BeeLine, the prompt+query is sent to standard output, which forces user scripts to do extra parsing to skip the prompt+query. Another drawback is on the security side: sensitive queries are logged directly to the files where standard output is redirected. How to reproduce: {noformat}
$ cat /tmp/query.sql
select *
from test
limit 1;
$ beeline --showheader=false --outputformat=tsv2 -u jdbc:hive2://localhost:1 -f /tmp/query.sql > /tmp/output.log 2> /tmp/error.log
$ cat /tmp/output.log
0: jdbc:hive2://localhost:1> select *
. . . . . . . . . . . . . . . .> from test
. . . . . . . . . . . . . . . .> limit 1;
451 451.713 false y2dh7 [866,528,936]
0: jdbc:hive2://localhost:1>
{noformat} We should avoid sending the prompt+query to standard output/error whenever a script file is passed to BeeLine.
[jira] [Commented] (HIVE-10206) Improve Alter Table to not initialize Serde unnecessarily
[ https://issues.apache.org/jira/browse/HIVE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481312#comment-14481312 ] Sergio Peña commented on HIVE-10206: Hey [~szehon] Looks good to me. +1 Improve Alter Table to not initialize Serde unnecessarily - Key: HIVE-10206 URL: https://issues.apache.org/jira/browse/HIVE-10206 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Ferdinand Xu Priority: Minor Attachments: HIVE-10206.patch Create an avro table with an external avsc file like: {noformat} CREATE TABLE test(...) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///Users/szehon/Temp/test.avsc', 'kite.compression.type'='snappy', 'transient_lastDdlTime'='1427996456') {noformat} Delete test.avsc file. 
Try to modify the table properties: {noformat} alter table test set tblproperties ('avro.schema.url'='file:///Users/szehon/Temp/test2.avsc'); {noformat} Will throw an exception like AvroSerdeException: {noformat} at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:119) at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:163) at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:101) at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:78) at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:520) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:377) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:274) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:256) at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:595) at org.apache.hadoop.hive.ql.exec.DDLTask.alterTableOrSinglePartition(DDLTask.java:3383) at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3340) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:332) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at 
org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat}
[jira] [Commented] (HIVE-10160) Give a warning when grouping or ordering by a constant column
[ https://issues.apache.org/jira/browse/HIVE-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481361#comment-14481361 ] Chao commented on HIVE-10160: - Patch looks good to me. But I'm wondering whether, instead of putting the warning in the log, it would make more sense to print it in the shell, so that the user immediately knows what to do to make it work. Give a warning when grouping or ordering by a constant column - Key: HIVE-10160 URL: https://issues.apache.org/jira/browse/HIVE-10160 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Lefty Leverenz Assignee: Yongzhi Chen Priority: Minor Attachments: HIVE-10160.1.patch, HIVE-10160.3.patch To avoid confusion, a warning should be issued when users specify column positions instead of names in a GROUP BY or ORDER BY clause (unless hive.groupby.orderby.position.alias is set to true in Hive 0.11.0 or later).
[jira] [Commented] (HIVE-10206) Improve Alter Table to not initialize Serde unnecessarily
[ https://issues.apache.org/jira/browse/HIVE-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481431#comment-14481431 ] Jimmy Xiang commented on HIVE-10206: +1. Looks good to me too. Improve Alter Table to not initialize Serde unnecessarily - Key: HIVE-10206 URL: https://issues.apache.org/jira/browse/HIVE-10206 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 1.1.0 Reporter: Szehon Ho Assignee: Ferdinand Xu Priority: Minor Attachments: HIVE-10206.patch Create an avro table with an external avsc file like: {noformat} CREATE TABLE test(...) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///Users/szehon/Temp/test.avsc', 'kite.compression.type'='snappy', 'transient_lastDdlTime'='1427996456') {noformat} Delete test.avsc file. 
Now try to modify the table properties:
{noformat}
alter table test set tblproperties ('avro.schema.url'='file:///Users/szehon/Temp/test2.avsc');
{noformat}
This fails with an AvroSerdeException thrown while the SerDe is being initialized:
{noformat}
	at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:119)
	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:163)
	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:101)
	at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:78)
	at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:520)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:377)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:274)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:256)
	at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:595)
	at org.apache.hadoop.hive.ql.exec.DDLTask.alterTableOrSinglePartition(DDLTask.java:3383)
	at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3340)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:332)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1037)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}
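The idea in the issue title, not initializing the SerDe when an ALTER TABLE does not need it, can be sketched with lazy initialization. This is a hypothetical illustration, not the code in HIVE-10206.patch: `LazySerDeSketch`, `SketchTable`, `SketchDeserializer` and `setTblProperties` are invented stand-ins for Hive's `Table`, `Deserializer` and `DDLTask` logic, and the failing factory merely imitates `AvroSerDe.initialize` throwing when `avro.schema.url` points at a deleted file.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not the HIVE-10206 patch: defer SerDe initialization so a
 * properties-only ALTER TABLE never triggers it.
 */
public class LazySerDeSketch {

    /** Stand-in for a deserializer that can report the table's columns. */
    interface SketchDeserializer {
        List<String> columns();
    }

    /** Stand-in for a table whose deserializer is created on first use. */
    static final class SketchTable {
        final Map<String, String> props = new HashMap<>();
        private final Supplier<SketchDeserializer> factory;
        private SketchDeserializer cached;   // null until getCols() first needs it
        int initAttempts = 0;                // how often initialization was attempted

        SketchTable(Supplier<SketchDeserializer> factory) {
            this.factory = factory;
        }

        private SketchDeserializer deserializer() {
            if (cached == null) {
                initAttempts++;
                cached = factory.get();      // may throw, like a failing SerDe initialize
            }
            return cached;
        }

        List<String> getCols() {
            return deserializer().columns();
        }
    }

    /** A properties-only change needs no column information, so it skips the SerDe. */
    static void setTblProperties(SketchTable table, Map<String, String> newProps) {
        table.props.putAll(newProps);
    }

    public static void main(String[] args) {
        // Imitates AvroSerDe failing because the .avsc file named in avro.schema.url was deleted.
        SketchTable table = new SketchTable(() -> {
            throw new IllegalStateException("schema file not found");
        });
        setTblProperties(table, Map.of("avro.schema.url", "file:///Users/szehon/Temp/test2.avsc"));
        System.out.println(table.props.get("avro.schema.url") + " initAttempts=" + table.initAttempts);
    }
}
```

Because the deserializer is built only when `getCols()` is first called, a properties-only change never touches the missing schema file. Whether the actual patch skips the `getCols()` call, defers it, or takes another approach is not shown in this thread.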