[jira] [Commented] (HIVE-7314) Wrong results of UDF when hive.cache.expr.evaluation is set
[ https://issues.apache.org/jira/browse/HIVE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052191#comment-14052191 ] Lefty Leverenz commented on HIVE-7314: -- This bug fix is documented in the wiki with hive.cache.expr.evaluation: * [Configuration Properties -- hive.cache.expr.evaluation | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.cache.expr.evaluation] Wrong results of UDF when hive.cache.expr.evaluation is set --- Key: HIVE-7314 URL: https://issues.apache.org/jira/browse/HIVE-7314 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: dima machlin Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7314.1.patch.txt It seems that the expression caching doesn't work when using a UDF inside another UDF or a Hive function. For example, tbl has one row: 'a','b'. The following query: {code:sql} select concat(custUDF(a),' ', custUDF(b)) from tbl; {code} returns 'a a'; it seems to cache custUDF(a) and reuse it for custUDF(b). The same query without the concat works fine. Replacing the concat with another custom UDF also returns 'a a'. -- This message was sent by Atlassian JIRA (v6.2#6252)
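For intuition, here is a toy model of the reported failure mode (illustrative Python, not Hive's actual evaluator): if the evaluation cache keys on the UDF alone rather than on the UDF plus its argument, custUDF(b) silently reuses the cached result of custUDF(a). All names in the sketch are made up.

```python
# Toy expression evaluator illustrating the HIVE-7314 symptom:
# a result cache keyed on the UDF name alone (buggy) versus keyed on
# the UDF name plus its argument column (fixed).

def evaluate(udf_name, column, row, cache, key_fn):
    key = key_fn(udf_name, column)
    if key not in cache:                           # only computed on a miss
        cache[key] = f"{udf_name}({row[column]})"  # stand-in for real UDF work
    return cache[key]

row = {"a": "a", "b": "b"}

buggy = {}
r1 = evaluate("custUDF", "a", row, buggy, key_fn=lambda name, col: name)
r2 = evaluate("custUDF", "b", row, buggy, key_fn=lambda name, col: name)
# r2 wrongly equals r1 -- the 'a a' result reported above

fixed = {}
f1 = evaluate("custUDF", "a", row, fixed, key_fn=lambda name, col: (name, col))
f2 = evaluate("custUDF", "b", row, fixed, key_fn=lambda name, col: (name, col))
# f1 and f2 now differ, as expected
```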
[jira] [Commented] (HIVE-4209) Cache evaluation result of deterministic expression and reuse it
[ https://issues.apache.org/jira/browse/HIVE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052196#comment-14052196 ] Lefty Leverenz commented on HIVE-4209: -- *hive.cache.expr.evaluation* is documented in the wiki here: * [Configuration Properties -- hive.cache.expr.evaluation | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.cache.expr.evaluation] I also added a comment to HIVE-6586 explaining that the current patch for HIVE-6037 (HIVE-6037-0.13.0) truncates the description of *hive.cache.expr.evaluation*. Cache evaluation result of deterministic expression and reuse it Key: HIVE-4209 URL: https://issues.apache.org/jira/browse/HIVE-4209 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4209.6.patch.txt, HIVE-4209.D9585.1.patch, HIVE-4209.D9585.2.patch, HIVE-4209.D9585.3.patch, HIVE-4209.D9585.4.patch, HIVE-4209.D9585.5.patch For example, {noformat} select key from src where key + 1 > 100 AND key + 1 < 200 limit 3; {noformat} key + 1 need not be evaluated twice. -- This message was sent by Atlassian JIRA (v6.2#6252)
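The improvement itself is plain per-row memoization of a deterministic subexpression. A hedged Python sketch of the idea (illustrative code with made-up comparison bounds of 100 and 200, not Hive's implementation):

```python
# Per-row memoization of a deterministic expression: 'key + 1' appears
# twice in the filter, but is computed only once per row.

evaluations = {"count": 0}

def key_plus_one(key, row_cache):
    if "key+1" not in row_cache:
        evaluations["count"] += 1        # real work happens at most once per row
        row_cache["key+1"] = key + 1
    return row_cache["key+1"]

def row_passes(key):
    row_cache = {}                       # cache lives for a single row
    return key_plus_one(key, row_cache) > 100 and key_plus_one(key, row_cache) < 200

keys = [50, 150, 250]
selected = [k for k in keys if row_passes(k)]
# 'key + 1' was evaluated once per row: 3 evaluations, not up to 6
```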
[jira] [Commented] (HIVE-6586) Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)
[ https://issues.apache.org/jira/browse/HIVE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052194#comment-14052194 ] Lefty Leverenz commented on HIVE-6586: -- HIVE-4209 added hive.cache.expr.evaluation in 0.12.0, but patch HIVE-6037-0.13.0 truncates its description to just the first sentence. The full description should be: {quote} If true, the evaluation result of a deterministic expression referenced twice or more will be cached. For example, in a filter condition like '.. where key + 10 > 10 or key + 10 = 0' the expression 'key + 10' will be evaluated/cached once and reused for the following expression (key + 10 = 0). Currently, this is applied only to expressions in select or filter operators. {quote} Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos) --- Key: HIVE-6586 URL: https://issues.apache.org/jira/browse/HIVE-6586 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Lefty Leverenz Labels: TODOC14 HIVE-6037 puts the definitions of configuration parameters into the HiveConf.java file, but several recent jiras for release 0.13.0 introduce new parameters that aren't in HiveConf.java yet, and some parameter definitions need to be altered for 0.13.0. This jira will patch HiveConf.java after HIVE-6037 gets committed. Also, four typos patched in HIVE-6582 need to be fixed in the new HiveConf.java. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause
[ https://issues.apache.org/jira/browse/HIVE-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reassigned HIVE-7045: --- Assignee: Navis Wrong results in multi-table insert aggregating without group by clause --- Key: HIVE-7045 URL: https://issues.apache.org/jira/browse/HIVE-7045 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.12.0 Reporter: dima machlin Assignee: Navis Priority: Blocker Attachments: HIVE-7045.1.patch.txt This happens whenever there is more than one reducer. The scenario: CREATE TABLE t1 (a int, b int); CREATE TABLE t2 (cnt int) PARTITIONED BY (var_name string); insert into table t1 select 1,1 from asd limit 1; insert into table t1 select 2,2 from asd limit 1; t1 contains: 1 1 2 2 from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt insert overwrite table t2 partition(var_name='b') select count(b) cnt; select * from t2; returns: 2 a 2 b as expected. Setting the number of reducers higher than 1: set mapred.reduce.tasks=2; from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt insert overwrite table t2 partition(var_name='b') select count(b) cnt; select * from t2; 1 a 1 a 1 b 1 b Wrong results. This also happens whenever t1 is big enough that more than one reducer is generated automatically, without setting the number explicitly. Adding group by 1 at the end of each insert works around the problem: from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1 insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1; generates: 2 a 2 b This should work without the group by... The number of rows in each partition equals the number of reducers: each reducer wrote a subtotal of the count. -- This message was sent by Atlassian JIRA (v6.2#6252)
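The wrong result is consistent with each reducer emitting its own partial count and no final merge happening for the partition. A simplified Python model of that flow (an assumption about the mechanism, not the actual Hive plan):

```python
# Model of the HIVE-7045 behavior: a global count() with no group-by key,
# split across reducers. If each reducer writes its partial count directly
# into the partition (with nothing merging the partials), the partition
# gets one row per reducer instead of a single total.

rows = [(1, 1), (2, 2)]                       # t1 contains (1,1) and (2,2)

def partition_counts(rows, num_reducers):
    buckets = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):            # arbitrary row distribution
        buckets[i % num_reducers].append(row)
    # buggy behavior: each reducer emits its own count; no final merge
    return [len(b) for b in buckets if b]

one_reducer = partition_counts(rows, 1)       # a single, correct total
two_reducers = partition_counts(rows, 2)      # one partial row per reducer
```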
[jira] [Updated] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause
[ https://issues.apache.org/jira/browse/HIVE-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7045: Attachment: HIVE-7045.1.patch.txt Wrong results in multi-table insert aggregating without group by clause --- Key: HIVE-7045 URL: https://issues.apache.org/jira/browse/HIVE-7045 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.12.0 Reporter: dima machlin Priority: Blocker Attachments: HIVE-7045.1.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause
[ https://issues.apache.org/jira/browse/HIVE-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7045: Status: Patch Available (was: Open) Wrong results in multi-table insert aggregating without group by clause --- Key: HIVE-7045 URL: https://issues.apache.org/jira/browse/HIVE-7045 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.10.0 Reporter: dima machlin Priority: Blocker Attachments: HIVE-7045.1.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7257) UDF format_number() does not work on FLOAT types
[ https://issues.apache.org/jira/browse/HIVE-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052216#comment-14052216 ] Lefty Leverenz commented on HIVE-7257: -- I mentioned this bug fix in the wiki here: * [UDFs -- String Functions | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions] UDF format_number() does not work on FLOAT types Key: HIVE-7257 URL: https://issues.apache.org/jira/browse/HIVE-7257 Project: Hive Issue Type: Bug Reporter: Wilbur Yang Assignee: Wilbur Yang Fix For: 0.14.0 Attachments: HIVE-7257.1.patch #1 Show the table: hive> describe ssga3; OK source string test float dt timestamp Time taken: 0.243 seconds #2 Run format_number on double and it works: hive> select format_number(cast(test as double),2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0009, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0009 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0009 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:14:53,992 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:14:59,032 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:00,046 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:01,056 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec 2014-03-13 17:15:02,067 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.47 sec MapReduce Total cumulative CPU time: 1 seconds 470 msec Ended Job = job_201403131616_0009 MapReduce Jobs Launched: Job 0: Map: 1 Cumulative CPU: 1.47 sec HDFS Read: 299 HDFS Write: 10 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 470 msec OK 1.00 2.00 Time taken: 16.563 seconds #3 Run format_number on float and it does not work: hive> select 
format_number(test,2) from ssga3; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201403131616_0010, Tracking URL = http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Kill Command = /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop/bin/hadoop job -kill job_201403131616_0010 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-03-13 17:20:21,158 Stage-1 map = 0%, reduce = 0% 2014-03-13 17:21:00,453 Stage-1 map = 100%, reduce = 100% Ended Job = job_201403131616_0010 with errors Error during job, obtaining debugging information... Job Tracking URL: http://cdh5-1:50030/jobdetails.jsp?jobid=job_201403131616_0010 Examining task ID: task_201403131616_0010_m_02 (and more) from job job_201403131616_0010 Unable to retrieve URL for Hadoop Task logs. Does not contain a valid host:port authority: logicaljt Task with the most failures(4): Task ID: task_201403131616_0010_m_00 Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {source:null,test:1.0,dt:null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675) at 
org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:141) .. FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask MapReduce Jobs Launched: Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec -- This message was sent by Atlassian JIRA (v6.2#6252)
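The symptom above is consistent with an argument-type check that accepts DOUBLE but not FLOAT, which the explicit cast in step #2 sidesteps. A toy Python illustration of that kind of overly narrow type dispatch (hypothetical names; not Hive's actual UDF code):

```python
# Toy version of a narrow argument-type check: DOUBLE is accepted, FLOAT is
# rejected, and casting float -> double sidesteps the check entirely.

def format_number_strict(type_name, value, places, accepted=("double",)):
    if type_name not in accepted:
        raise TypeError(f"format_number: unsupported argument type {type_name}")
    return f"{value:,.{places}f}"

as_double = format_number_strict("double", 1.0, 2)   # works, like step #2

try:
    format_number_strict("float", 1.0, 2)            # fails, like step #3
    float_rejected = False
except TypeError:
    float_rejected = True

# the fix amounts to widening the set of accepted argument types:
as_float = format_number_strict("float", 1.0, 2, accepted=("double", "float"))
```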
[jira] [Commented] (HIVE-7343) Update committer list
[ https://issues.apache.org/jira/browse/HIVE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052217#comment-14052217 ] Szehon Ho commented on HIVE-7343: - Ah thanks for catching that, +1. I was going to verify with the hive-staging site after commit and before pushing as mentioned in the wiki, but looks good to me. Update committer list - Key: HIVE-7343 URL: https://issues.apache.org/jira/browse/HIVE-7343 Project: Hive Issue Type: Test Components: Documentation Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7343.2.patch, HIVE-7343.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-3227) Implement data loading from user provided string directly for test
[ https://issues.apache.org/jira/browse/HIVE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3227: Attachment: HIVE-3227.1.patch.txt Implement data loading from user provided string directly for test -- Key: HIVE-3227 URL: https://issues.apache.org/jira/browse/HIVE-3227 Project: Hive Issue Type: Improvement Components: Query Processor, Testing Infrastructure Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3227.1.patch.txt {code} load data instream 'key value\nkey2 value2' into table test; {code} This will make testing easier and can also reduce test time. For example, {code} -- ppr_pushdown.q create table ppr_test (key string) partitioned by (ds string); alter table ppr_test add partition (ds = '1234'); insert overwrite table ppr_test partition(ds = '1234') select * from (select '1234' from src limit 1 union all select 'abcd' from src limit 1) s; {code} The last query is a 4-MR job, but it can be replaced by {code} create table ppr_test (key string) partitioned by (ds string) ROW FORMAT delimited fields terminated by ' '; alter table ppr_test add partition (ds = '1234'); load data local instream '1234\nabcd' overwrite into table ppr_test partition(ds = '1234'); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3227) Implement data loading from user provided string directly for test
[ https://issues.apache.org/jira/browse/HIVE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052224#comment-14052224 ] Navis commented on HIVE-3227: - [~appodictic] Restricted this to be used only in hive.in.test=true. How about that? Implement data loading from user provided string directly for test -- Key: HIVE-3227 URL: https://issues.apache.org/jira/browse/HIVE-3227 Project: Hive Issue Type: Improvement Components: Query Processor, Testing Infrastructure Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3227.1.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
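Functionally, the proposed `load data instream` just splits an inline string into records and fields. A minimal sketch of the parsing it implies, assuming newline record separators and a space field delimiter as in the HIVE-3227 examples:

```python
# What 'load data instream' implies: split an inline test-data string into
# records (newline-separated) and fields (table field delimiter). The
# delimiter handling here is an assumption based on the examples above.

def parse_instream(data, field_delim=" "):
    return [line.split(field_delim) for line in data.split("\n") if line]

two_cols = parse_instream("key value\nkey2 value2")
# -> [['key', 'value'], ['key2', 'value2']]
one_col = parse_instream("1234\nabcd")
# -> [['1234'], ['abcd']]
```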
[jira] [Commented] (HIVE-7325) Support non-constant expressions for MAP type indices.
[ https://issues.apache.org/jira/browse/HIVE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052226#comment-14052226 ] Hive QA commented on HIVE-7325: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12654029/HIVE-7325.1.patch.txt {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5676 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_list_index org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_list_index2 org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/676/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/676/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-676/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12654029 Support non-constant expressions for MAP type indices. 
-- Key: HIVE-7325 URL: https://issues.apache.org/jira/browse/HIVE-7325 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mala Chikka Kempanna Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7325.1.patch.txt Here is my sample: {code} CREATE TABLE RECORD(RecordID string, BatchDate string, Country string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,D:BatchDate,D:Country") TBLPROPERTIES ("hbase.table.name" = "RECORD"); CREATE TABLE KEY_RECORD(KeyValue String, RecordId map<string,string>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,K:") TBLPROPERTIES ("hbase.table.name" = "KEY_RECORD"); {code} The following join statement doesn't work. {code} SELECT a.*, b.* from KEY_RECORD a join RECORD b WHERE a.RecordId[b.RecordID] is not null; {code} FAILED: SemanticException 2:16 Non-constant expression for map indexes not supported. Error encountered near token 'RecordID' -- This message was sent by Atlassian JIRA (v6.2#6252)
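Semantically, the rejected predicate is an ordinary per-row map lookup whose key comes from the other table's column. A Python model of the join condition, with made-up data:

```python
# The rejected predicate modeled as a per-row map lookup: for each joined
# row pair, index a.RecordId with the value of b.RecordID.

key_record = [{"KeyValue": "k1", "RecordId": {"r1": "x", "r2": "y"}}]
record = [{"RecordID": "r1"}, {"RecordID": "r9"}]

matches = [
    (a["KeyValue"], b["RecordID"])
    for a in key_record
    for b in record
    if a["RecordId"].get(b["RecordID"]) is not None  # non-constant map index
]
# only ("k1", "r1") survives; "r9" is not a key of the map
```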
[jira] [Commented] (HIVE-6981) Remove old website from SVN
[ https://issues.apache.org/jira/browse/HIVE-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052230#comment-14052230 ] Lefty Leverenz commented on HIVE-6981: -- Good catch, [~szehon]. The whole section needs to be rewritten because we're not using versioned xml docs anymore. Wikidocs are now the official Hive documentation and they don't need to be committed. Following the link to How to Release, I see we also need to delete steps 6, 7, and 8 in the Publishing section (https://cwiki.apache.org/confluence/display/Hive/HowToRelease#HowToRelease-Publishing). Or do those steps cover docs besides the old xml documentation, such as release notes? Maybe we should ask the release managers for recent releases about that. Remove old website from SVN --- Key: HIVE-6981 URL: https://issues.apache.org/jira/browse/HIVE-6981 Project: Hive Issue Type: Task Reporter: Brock Noland Assignee: Brock Noland Command to do removal: {noformat} svn delete https://svn.apache.org/repos/asf/hive/site/ --message HIVE-6981 - Remove old website from SVN {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5843) Transaction manager for Hive
[ https://issues.apache.org/jira/browse/HIVE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-5843: - Labels: TODOC13 (was: ) Transaction manager for Hive Key: HIVE-5843 URL: https://issues.apache.org/jira/browse/HIVE-5843 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Alan Gates Assignee: Alan Gates Labels: TODOC13 Fix For: 0.13.0 Attachments: 5843.5-wip.patch, HIVE-5843-src-only.6.patch, HIVE-5843-src-only.patch, HIVE-5843.10.patch, HIVE-5843.2.patch, HIVE-5843.3-src.path, HIVE-5843.3.patch, HIVE-5843.4-src.patch, HIVE-5843.4.patch, HIVE-5843.6.patch, HIVE-5843.7.patch, HIVE-5843.8.patch, HIVE-5843.8.src-only.patch, HIVE-5843.9.patch, HIVE-5843.patch, HiveTransactionManagerDetailedDesign (1).pdf As part of the ACID work proposed in HIVE-5317 a transaction manager is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-1511: - Labels: TODOC13 (was: ) Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0, 0.11.0 Reporter: Ning Zhang Assignee: Mohammad Kamrul Islam Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-1511-wip.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.10.patch, HIVE-1511.11.patch, HIVE-1511.12.patch, HIVE-1511.13.patch, HIVE-1511.14.patch, HIVE-1511.16.patch, HIVE-1511.17.patch, HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511.wip.9.patch, KryoHiveTest.java, failedPlan.xml, generated_plan.xml, run.sh As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM but I gave up after the test case did not go anywhere for about 2 minutes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)
[ https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7220: Attachment: HIVE-7220.5.patch I couldn't reproduce the dynpart_sort_optimization failure with this patch. As the logs are gone, attaching again to see if I can get some idea. Empty dir in external table causes issue (root_dir_external_table.q failure) Key: HIVE-7220 URL: https://issues.apache.org/jira/browse/HIVE-7220 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7220.2.patch, HIVE-7220.3.patch, HIVE-7220.4.patch, HIVE-7220.5.patch, HIVE-7220.5.patch, HIVE-7220.patch While looking at root_dir_external_table.q failure, which is doing a query on an external table located at root ('/'), I noticed that latest Hadoop2 CombineFileInputFormat returns split representing empty directories (like '/Users'), which leads to failure in Hive's CombineFileRecordReader as it tries to open the directory for processing. Tried with an external table in a normal HDFS directory, and it also returns the same error. Looks like a real bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-3990) Provide input threshold for direct-fetcher (HIVE-2925)
[ https://issues.apache.org/jira/browse/HIVE-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-3990: - Labels: TODOC13 (was: ) Provide input threshold for direct-fetcher (HIVE-2925) -- Key: HIVE-3990 URL: https://issues.apache.org/jira/browse/HIVE-3990 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Labels: TODOC13 Fix For: 0.13.0 Attachments: D8415.2.patch, D8415.3.patch, HIVE-3990.D8415.1.patch As a followup of HIVE-2925, add input threshold for fetch task conversion. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5692) Make VectorGroupByOperator parameters configurable
[ https://issues.apache.org/jira/browse/HIVE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-5692: - Labels: TODOC13 (was: ) Make VectorGroupByOperator parameters configurable -- Key: HIVE-5692 URL: https://issues.apache.org/jira/browse/HIVE-5692 Project: Hive Issue Type: Bug Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-5692.1.patch, HIVE-5692.2.patch, HIVE-5692.3.patch, HIVE-5692.4.patch, HIVE-5692.5.patch, HIVE-5692.6.patch The FLUSH_CHECK_THRESHOLD and PERCENT_ENTRIES_TO_FLUSH should be configurable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7326) Hive complains invalid column reference with 'having' aggregate predicates
[ https://issues.apache.org/jira/browse/HIVE-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052253#comment-14052253 ] Hive QA commented on HIVE-7326: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12654033/HIVE-7326.1.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5692 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/678/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/678/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-678/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12654033 Hive complains invalid column reference with 'having' aggregate predicates -- Key: HIVE-7326 URL: https://issues.apache.org/jira/browse/HIVE-7326 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7326.1.patch.txt CREATE TABLE TestV1_Staples ( Item_Count INT, Ship_Priority STRING, Order_Priority STRING, Order_Status STRING, Order_Quantity DOUBLE, Sales_Total DOUBLE, Discount DOUBLE, Tax_Rate DOUBLE, Ship_Mode STRING, Fill_Time DOUBLE, Gross_Profit DOUBLE, Price DOUBLE, Ship_Handle_Cost DOUBLE, Employee_Name STRING, Employee_Dept STRING, Manager_Name STRING, Employee_Yrs_Exp DOUBLE, Employee_Salary DOUBLE, Customer_Name STRING, Customer_State STRING, Call_Center_Region STRING, Customer_Balance DOUBLE, Customer_Segment STRING, Prod_Type1 STRING, Prod_Type2 STRING, Prod_Type3 STRING, Prod_Type4 STRING, Product_Name STRING, Product_Container STRING, Ship_Promo STRING, Supplier_Name STRING, Supplier_Balance DOUBLE, Supplier_Region STRING, Supplier_State STRING, Order_ID STRING, Order_Year INT, Order_Month INT, Order_Day INT, Order_Date_ STRING, Order_Quarter STRING, Product_Base_Margin DOUBLE, Product_ID STRING, Receive_Time DOUBLE, Received_Date_ STRING, Ship_Date_ STRING, Ship_Charge DOUBLE, Total_Cycle_Time DOUBLE, Product_In_Stock STRING, PID INT, Market_Segment STRING ); Query that works: SELECT customer_name, SUM(customer_balance), SUM(order_quantity) FROM default.testv1_staples s1 GROUP BY customer_name HAVING ( (COUNT(s1.discount) = 822) AND (SUM(customer_balance) = 4074689.00041) ); Query that fails: SELECT customer_name, SUM(customer_balance), SUM(order_quantity) FROM default.testv1_staples s1 GROUP BY customer_name HAVING ( (SUM(customer_balance) = 4074689.00041) AND (COUNT(s1.discount) = 822) ); -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-707) add group_concat
[ https://issues.apache.org/jira/browse/HIVE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052301#comment-14052301 ] Jian Wang commented on HIVE-707: [~ph4t] I used the concat_ws(' ', map_keys(UNION_MAP(MAP(your_column, 'dummy')))) method instead of group_concat, but I got an error like this: {code} FAILED: SemanticException [Error 10011]: Line 172:30 Invalid function 'UNION_MAP' {code} Should I add some jars? add group_concat Key: HIVE-707 URL: https://issues.apache.org/jira/browse/HIVE-707 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: Min Zhou Moving the discussion to a new jira: I've implemented group_cat() in a rush, and found some things difficult to solve: 1. function group_cat() has an internal order by clause; currently, we can't implement such an aggregation in Hive. 2. when the strings to be group-concatenated are too large (in other words, if data skew appears), there is often not enough memory to store such a big result. -- This message was sent by Atlassian JIRA (v6.2#6252)
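For reference, group_concat semantics amount to buffering each group's values, optionally ordering them, and joining with a separator. A plain-Python sketch of that aggregation (illustrative, not a Hive UDAF); it also shows why point 2 above is a memory concern, since a skewed group's values must all be buffered before joining:

```python
# group_concat semantics in miniature: per-group, optionally ordered
# concatenation with a separator. Every value of a group is buffered in
# memory before the join -- the data-skew problem noted in the issue.

from collections import defaultdict

def group_concat(rows, key_col, val_col, sep=",", ordered=True):
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(str(row[val_col]))
    return {k: sep.join(sorted(v) if ordered else v) for k, v in groups.items()}

rows = [
    {"dept": "eng", "name": "bo"},
    {"dept": "eng", "name": "al"},
    {"dept": "hr", "name": "cy"},
]
result = group_concat(rows, "dept", "name")
# -> {"eng": "al,bo", "hr": "cy"}
```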
[jira] [Commented] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause
[ https://issues.apache.org/jira/browse/HIVE-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052353#comment-14052353 ] Hive QA commented on HIVE-7045: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12654053/HIVE-7045.1.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5676 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/679/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/679/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-679/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12654053 Wrong results in multi-table insert aggregating without group by clause --- Key: HIVE-7045 URL: https://issues.apache.org/jira/browse/HIVE-7045 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.12.0 Reporter: dima machlin Assignee: Navis Priority: Blocker Attachments: HIVE-7045.1.patch.txt This happens whenever there are more than 1 reducers. 
The scenario:
{code:sql}
CREATE TABLE t1 (a int, b int);
CREATE TABLE t2 (cnt int) PARTITIONED BY (var_name string);
insert into table t1 select 1,1 from asd limit 1;
insert into table t1 select 2,2 from asd limit 1;
{code}
t1 contains:
{noformat}
1 1
2 2
{noformat}
The following multi-table insert:
{code:sql}
from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;
select * from t2;
{code}
returns:
{noformat}
2 a
2 b
{noformat}
as expected. Setting the number of reducers higher than 1:
{code:sql}
set mapred.reduce.tasks=2;
from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;
select * from t2;
{code}
returns:
{noformat}
1 a
1 a
1 b
1 b
{noformat}
Wrong results. This happens whenever t1 is big enough to automatically trigger more than one reducer, without setting the reducer count directly. Adding "group by 1" at the end of each insert solves the problem:
{code:sql}
from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1
insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1;
{code}
generates:
{noformat}
2 a
2 b
{noformat}
This should work without the group by. The number of rows in each partition equals the number of reducers: each reducer emits its own subtotal of the count. -- This message was sent by Atlassian JIRA (v6.2#6252)
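Besides the group by 1 workaround, forcing the job down to a single reducer also sidesteps the symptom, since only one subtotal is then produced per partition. This is a sketch inferred from the behavior described above, not from the patch, and it obviously gives up reduce-side parallelism:
{code:sql}
-- Force a single reducer so each partition gets exactly one total:
set mapred.reduce.tasks=1;
from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;
{code}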
[jira] [Commented] (HIVE-3227) Implement data loading from user provided string directly for test
[ https://issues.apache.org/jira/browse/HIVE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052444#comment-14052444 ] Hive QA commented on HIVE-3227: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12654059/HIVE-3227.1.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5677 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/680/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/680/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-680/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12654059 Implement data loading from user provided string directly for test -- Key: HIVE-3227 URL: https://issues.apache.org/jira/browse/HIVE-3227 Project: Hive Issue Type: Improvement Components: Query Processor, Testing Infrastructure Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3227.1.patch.txt {code} load data instream 'key value\nkey2 value2' into table test; {code} This will make tests easier to write and can also reduce test time.
For example,
{code}
-- ppr_pushdown.q
create table ppr_test (key string) partitioned by (ds string);
alter table ppr_test add partition (ds = '1234');
insert overwrite table ppr_test partition(ds = '1234')
select * from (select '1234' from src limit 1 union all select 'abcd' from src limit 1) s;
{code}
The last query runs as 4 MR jobs, but it can be replaced by
{code}
create table ppr_test (key string) partitioned by (ds string) ROW FORMAT delimited fields terminated by ' ';
alter table ppr_test add partition (ds = '1234');
load data local instream '1234\nabcd' overwrite into table ppr_test partition(ds = '1234');
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)
[ https://issues.apache.org/jira/browse/HIVE-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052481#comment-14052481 ] Hive QA commented on HIVE-7220: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12654060/HIVE-7220.5.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5691 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.cli.TestPermsGrp.testCustomPerms org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/681/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/681/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-681/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12654060 Empty dir in external table causes issue (root_dir_external_table.q failure) Key: HIVE-7220 URL: https://issues.apache.org/jira/browse/HIVE-7220 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-7220.2.patch, HIVE-7220.3.patch, HIVE-7220.4.patch, HIVE-7220.5.patch, HIVE-7220.5.patch, HIVE-7220.patch While looking at the root_dir_external_table.q failure, which does a query on an external table located at root ('/'), I noticed that the latest Hadoop2 CombineFileInputFormat returns splits representing empty directories (like '/Users'), which leads to a failure in Hive's CombineFileRecordReader as it tries to open the directory for processing.
I tried with an external table in a normal HDFS directory, and it returned the same error. Looks like a real bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7349) Consuming published Hive HCatalog artifacts in a Hadoop 2 build environment fails
Venkat Ranganathan created HIVE-7349: Summary: Consuming published Hive HCatalog artifacts in a Hadoop 2 build environment fails Key: HIVE-7349 URL: https://issues.apache.org/jira/browse/HIVE-7349 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Venkat Ranganathan The published Hive artifacts are built with the Hadoop 1 profile. Even though Hive has Hadoop 1 and Hadoop 2 shims, some of the HCatalog MapReduce classes still depend on the environment they were compiled against. For example, using the published Hive artifacts in a Sqoop HCatalog Hadoop 2 build environment results in the following failure:
{noformat}
Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:104)
at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:84)
at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:73)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:418)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:333)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
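Until published artifacts cover both profiles, a common way out is to rebuild Hive/HCatalog locally against Hadoop 2 and let the Hadoop 2 consumer resolve the locally installed artifacts instead of the published Hadoop 1 builds. A sketch, assuming the standard Hive 0.13 Maven build; the profile name should be verified against the checkout's pom.xml:
{code}
# Rebuild and install Hive/HCatalog into the local Maven repository
# against the Hadoop 2 profile (profile name assumed, verify in pom.xml):
mvn clean install -DskipTests -Phadoop-2
{code}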
[jira] [Commented] (HIVE-6172) Whitespaces and comments on Tez
[ https://issues.apache.org/jira/browse/HIVE-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052775#comment-14052775 ] Lefty Leverenz commented on HIVE-6172: -- The wiki documents *hive.jar.directory* and *hive.user.install.directory* here: * [Configuration Properties -- hive.jar.directory | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.jar.directory] * [Configuration Properties -- hive.user.install.directory | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.user.install.directory] Whitespaces and comments on Tez --- Key: HIVE-6172 URL: https://issues.apache.org/jira/browse/HIVE-6172 Project: Hive Issue Type: Bug Affects Versions: tez-branch Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-6172.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)