[jira] [Updated] (HIVE-2597) Repeated key in GROUP BY is erroneously displayed when using DISTINCT
[ https://issues.apache.org/jira/browse/HIVE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2597: -- Attachment: HIVE-2597.D8967.1.patch navis requested code review of HIVE-2597 [jira] Repeated key in GROUP BY is erroneously displayed when using DISTINCT. Reviewers: JIRA HIVE-2597 Repeated key in GROUP BY is erroneously displayed when using DISTINCT The following query was simplified for illustration purposes. This works correctly: select client_tid, '' as myvalue1, '' as myvalue2 from clients cluster by client_tid The intent here is to produce two empty columns in between the data. The following query does not work: select distinct client_tid, '' as myvalue1, '' as myvalue2 from clients cluster by client_tid FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY The key is not repeated, since aliases were given. It seems Hive ignores the aliases when the DISTINCT keyword is specified. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D8967 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/test/queries/clientpositive/groupby_constant.q ql/src/test/results/clientpositive/groupby_constant.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/21711/ To: JIRA, navis Repeated key in GROUP BY is erroneously displayed when using DISTINCT - Key: HIVE-2597 URL: https://issues.apache.org/jira/browse/HIVE-2597 Project: Hive Issue Type: Bug Reporter: Alex Rovner Assignee: Navis Attachments: HIVE-2597.D8967.1.patch The following query was simplified for illustration purposes. This works correctly: select client_tid, '' as myvalue1, '' as myvalue2 from clients cluster by client_tid The intent here is to produce two empty columns in between the data.
The following query does not work: select distinct client_tid, '' as myvalue1, '' as myvalue2 from clients cluster by client_tid FAILED: Error in semantic analysis: Line 1:44 Repeated key in GROUP BY The key is not repeated, since aliases were given. It seems Hive ignores the aliases when the DISTINCT keyword is specified. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
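The collision can be sketched abstractly. Assuming (hypothetically, the class and method below are illustrative, not Hive's SemanticAnalyzer) that SELECT DISTINCT is rewritten into a GROUP BY over the select expressions and that keys are compared by expression text rather than by alias, two identical constant columns collide:

```java
import java.util.HashSet;
import java.util.List;

public class RepeatedKeyCheck {
    // Hypothetical sketch of the check described above: if DISTINCT keys are
    // compared by their expression text, aliases cannot distinguish two
    // identical constants, which reproduces the "Repeated key in GROUP BY"
    // error even though myvalue1 and myvalue2 are distinct aliases.
    static boolean hasRepeatedKey(List<String> groupByExprs) {
        return new HashSet<>(groupByExprs).size() != groupByExprs.size();
    }

    public static void main(String[] args) {
        // '' as myvalue1 and '' as myvalue2 both contribute the expression ''.
        System.out.println(hasRepeatedKey(List.of("client_tid", "''", "''")));
    }
}
```

This also suggests why the non-DISTINCT query works: without the rewrite into GROUP BY, no key comparison takes place.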
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588116#comment-13588116 ] Ashutosh Chauhan commented on HIVE-4080: bq. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Do you mean the feature this patch is introducing (the ability to have a lead function independent of UDAFs in a select expr) will be removed? Consider the following query: {noformat} select p_mfgr, p_retailprice, lead(p_retailprice,1) as l1 over (partition by p_mfgr order by p_name), lead(p_retailprice,1, p_retailprice) as l2 over (partition by p_size order by p_name), p_retailprice - lead(p_retailprice,1) from part; {noformat} My guess is that the ambiguity you are referring to is this: once we start supporting different partitionings in the same query, the last lead() in the above query becomes ambiguous as to which partitioning it refers to. But my understanding is that the SQL standard says lead and lag functions must always be associated with an over clause. So the above query is illegal in standard SQL. It must be written as: {noformat} select p_mfgr, p_retailprice, lead(p_retailprice,1) as l1 over (partition by p_mfgr order by p_name), lead(p_retailprice,1, p_retailprice) as l2 over (partition by p_size order by p_name), p_retailprice - lead(p_retailprice,1) as l3 over (partition by p_size order by p_name) from part; {noformat} Now we have this concept of default partitioning, which would have made the first query legal if the partitioning scheme were identical for l1 and l2. I think long term: * We should keep the functionality introduced in this patch to stay compliant. * Associate default partitioning with a windowing function only if there is no ambiguity (i.e., there is only one partitioning clause in the query). * Raise an error if the user doesn't specify partitioning and there is more than one partitioning scheme to choose from.
The same argument stands for when lead/lag functions are used as arguments to UDAFs. Makes sense? Further, I think this concept of default partitioning is only an extra convenience we are offering to Hive users, and it is non-standard. If it turns out it's burdensome to support, I am fine with removing it and always requiring the user to specify an over clause. Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
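The disambiguation rule proposed in the comment above (inherit the default partitioning only when it is unambiguous, otherwise raise an error) can be sketched as follows; the helper and its types are hypothetical, not Hive code:

```java
import java.util.Set;

public class DefaultPartitioning {
    // Sketch of the proposed rule: a windowing function written without an
    // OVER clause inherits the query's partitioning only when the query
    // contains exactly one partitioning scheme; with two or more, the user
    // must specify an OVER clause explicitly.
    static String resolveDefaultPartition(Set<String> partitionSpecsInQuery) {
        if (partitionSpecsInQuery.size() == 1) {
            return partitionSpecsInQuery.iterator().next();
        }
        throw new IllegalArgumentException(
            "Ambiguous default partitioning: an explicit OVER clause is required");
    }
}
```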
[jira] [Created] (HIVE-4085) Incorrectly pruning columns for PTFOperator
Ashutosh Chauhan created HIVE-4085: -- Summary: Incorrectly pruning columns for PTFOperator Key: HIVE-4085 URL: https://issues.apache.org/jira/browse/HIVE-4085 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Ashutosh Chauhan The following simple query used to work before HIVE-4035: {code} select s, sum(b) over (distribute by i sort by si rows between unbounded preceding and current row) from over100k; {code} but now it fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4085) Incorrectly pruning columns for PTFOperator
[ https://issues.apache.org/jira/browse/HIVE-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588131#comment-13588131 ] Ashutosh Chauhan commented on HIVE-4085: After HIVE-4035, it is failing with the following stack trace: {code} Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:160) ... 14 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col3, 1:_col7] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:143) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57) at org.apache.hadoop.hive.ql.exec.PTFOperator.setupKeysWrapper(PTFOperator.java:193) at org.apache.hadoop.hive.ql.exec.PTFOperator.initializeOp(PTFOperator.java:100) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:409) at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377) at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:152) ... 14 more {code} Prajakta / Harish, do you guys already know about this failure?
Incorrectly pruning columns for PTFOperator --- Key: HIVE-4085 URL: https://issues.apache.org/jira/browse/HIVE-4085 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Ashutosh Chauhan The following simple query used to work before HIVE-4035: {code} select s, sum(b) over (distribute by i sort by si rows between unbounded preceding and current row) from over100k; {code} but now it fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4085) Incorrectly pruning columns for PTFOperator
[ https://issues.apache.org/jira/browse/HIVE-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588137#comment-13588137 ] Ashutosh Chauhan commented on HIVE-4085: Does this have the same root cause as HIVE-4083? If so, feel free to mark it as a duplicate. Incorrectly pruning columns for PTFOperator --- Key: HIVE-4085 URL: https://issues.apache.org/jira/browse/HIVE-4085 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Ashutosh Chauhan The following simple query used to work before HIVE-4035: {code} select s, sum(b) over (distribute by i sort by si rows between unbounded preceding and current row) from over100k; {code} but now it fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-1990) Logging fails due to moved EventCounter class in Hadoop 0.20.100
[ https://issues.apache.org/jira/browse/HIVE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-1990. Resolution: Duplicate Logging fails due to moved EventCounter class in Hadoop 0.20.100 Key: HIVE-1990 URL: https://issues.apache.org/jira/browse/HIVE-1990 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 0.6.0 Environment: Red Hat 2.6.18 Reporter: Joep Rottinghuis Fix For: 0.11.0 Attachments: hive-1990.patch When compiling Hive against Hadoop 0.20.100, logging on the command line and in unit tests fails due to the EventCounter class having been moved from o.a.h.metrics.jvm.EventCounter to o.a.h.log.EventCounter. {code} [junit] Running org.apache.hadoop.hive.serde2.TestTCTLSeparatedProtocol [junit] log4j:ERROR Could not instantiate class [org.apache.hadoop.metrics.jvm.EventCounter]. [junit] java.lang.ClassNotFoundException: org.apache.hadoop.metrics.jvm.EventCounter [junit] at java.net.URLClassLoader$1.run(URLClassLoader.java:200) [junit] at java.security.AccessController.doPrivileged(Native Method) [junit] at java.net.URLClassLoader.findClass(URLClassLoader.java:188) [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:307) [junit] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) [junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:252) {code} As a note: in order to reproduce, I first applied the patch as per HIVE-1264 to the 0.6 branch to resolve jar naming issues in the build. Then I locally modified build.properties to point at my locally built Hadoop 0.20.100: {code} hadoop.security.url=file:.../hadoop/core/hadoop-${hadoop.version} hadoop.security.version=${hadoop.version} {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HIVE-3886) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated
[ https://issues.apache.org/jira/browse/HIVE-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reopened HIVE-3886: WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated - Key: HIVE-3886 URL: https://issues.apache.org/jira/browse/HIVE-3886 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.9.0, 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3886.1.patch WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-3886) WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated
[ https://issues.apache.org/jira/browse/HIVE-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3886. Resolution: Duplicate WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated - Key: HIVE-3886 URL: https://issues.apache.org/jira/browse/HIVE-3886 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.9.0, 0.10.0, 0.11.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Priority: Minor Fix For: 0.11.0 Attachments: HIVE-3886.1.patch WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
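For reference, the fix the warning itself suggests is a one-line change in each log4j.properties file; the appender name below is illustrative, while the class names are taken from the warning text:

```properties
# Deprecated location (pre-move), which the warning flags:
#log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
# Replacement suggested by the warning:
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
```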
[jira] [Commented] (HIVE-3952) merge map-job followed by map-reduce job
[ https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588233#comment-13588233 ] Amareshwari Sriramadasu commented on HIVE-3952: --- Tried out the patch. When we run a query like the following: INSERT OVERWRITE DIRECTORY '/dir' SELECT ... it fails with an exception: {noformat} java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.MoveTask cannot be cast to org.apache.hadoop.hive.ql.exec.MapRedTask at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.mayBeMergeMapJoinTaskWithMapReduceTask(CommonJoinResolver.java:291) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.processCurrentTask(CommonJoinResolver.java:535) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver$CommonJoinTaskDispatcher.dispatch(CommonJoinResolver.java:701) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:113) at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8138) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8470) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:259) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:898) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:197) {noformat} merge map-job followed by map-reduce job Key: HIVE-3952 URL: https://issues.apache.org/jira/browse/HIVE-3952 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Vinod Kumar Vavilapalli Attachments: HIVE-3952-20130226.txt Consider a query like: select count(*) FROM ( select idOne, idTwo, value FROM bigTable JOIN smallTableOne on (bigTable.idOne = smallTableOne.idOne) ) firstjoin JOIN smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo); where smallTableOne and smallTableTwo are smaller than hive.auto.convert.join.noconditionaltask.size and hive.auto.convert.join.noconditionaltask is set to true. The joins are collapsed into mapjoins, and this leads to a map-only job (for the map-joins) followed by a map-reduce job (for the group by). Ideally, the map-only job should be merged with the following map-reduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
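The trace shows mayBeMergeMapJoinTaskWithMapReduceTask casting a child task without checking its type; for INSERT OVERWRITE DIRECTORY the follow-up task is a MoveTask, not a MapRedTask. The defensive check implied by the failure can be sketched with stand-in classes (hypothetical, not Hive's real task hierarchy):

```java
public class TaskCastDemo {
    // Minimal stand-ins for the task hierarchy; the real classes live in
    // org.apache.hadoop.hive.ql.exec.
    static class Task {}
    static class MapRedTask extends Task {}
    static class MoveTask extends Task {}

    // Merge logic should only proceed when the child really is a map-reduce
    // task; the unconditional cast is what raised the ClassCastException above.
    static MapRedTask asMapRedTaskOrNull(Task child) {
        return (child instanceof MapRedTask) ? (MapRedTask) child : null;
    }

    public static void main(String[] args) {
        System.out.println(asMapRedTaskOrNull(new MoveTask()) == null);   // true
        System.out.println(asMapRedTaskOrNull(new MapRedTask()) != null); // true
    }
}
```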
[jira] [Commented] (HIVE-3428) Fix log4j configuration errors when running hive on hadoop23
[ https://issues.apache.org/jira/browse/HIVE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588405#comment-13588405 ] Hudson commented on HIVE-3428: -- Integrated in Hive-trunk-h0.21 #1989 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1989/]) HIVE-3428 : Fix log4j configuration errors when running hive on hadoop23 (Gunther Hagleitner via Ashutosh Chauhan) (Revision 1450645) Result = SUCCESS hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1450645 Files : * /hive/trunk/common/src/java/conf/hive-log4j.properties * /hive/trunk/data/conf/hive-log4j.properties * /hive/trunk/pdk/scripts/conf/log4j.properties * /hive/trunk/ql/src/java/conf/hive-exec-log4j.properties * /hive/trunk/shims/ivy.xml * /hive/trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HiveEventCounter.java * /hive/trunk/shims/src/common/java/org/apache/hadoop/hive/shims/ShimLoader.java Fix log4j configuration errors when running hive on hadoop23 Key: HIVE-3428 URL: https://issues.apache.org/jira/browse/HIVE-3428 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Gunther Hagleitner Fix For: 0.11.0 Attachments: HIVE-3428.1.D8805.patch, HIVE-3428.1.patch.txt, HIVE-3428.2.patch.txt, HIVE-3428.3.patch.txt, HIVE-3428.4.patch.txt, HIVE-3428.5.patch.txt, HIVE-3428.6.patch.txt, HIVE-3428_SHIM_EVENT_COUNTER.patch There are log4j configuration errors when running hive on hadoop23, some of which may fail test cases, since the following log4j error messages could be printed to the console, or to the output file, which diffs from the expected output: [junit] log4j:ERROR Could not find value for key log4j.appender.NullAppender [junit] log4j:ERROR Could not instantiate appender named NullAppender. [junit] 12/09/04 11:34:42 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Issue involving non-sun java build
Hello, I've been working with Hive for a couple of months now, and although it is a pretty strong framework, I can't help noticing that some test cases fail on non-Sun Java. Some examples are: TestCliDriver, TestParse and TestJdbcDriver. The issues involve HashMap iteration order, where Sun Java has a different output order than non-Sun Java. It is a silly problem, but it does cause failures. I've been working on fixes for these problems, and was planning to contribute them. Is this something the Hive community would be interested in? What are your thoughts about that? Thanks, Renata.
[jira] [Updated] (HIVE-2264) Hive server is SHUTTING DOWN when invalid queries are being executed.
[ https://issues.apache.org/jira/browse/HIVE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-2264: --- Attachment: HIVE-2264-2.patch Hi, I rebased this patch on trunk (attached) and removed the commented-out System.exit(). Brock Hive server is SHUTTING DOWN when invalid queries are being executed. -- Key: HIVE-2264 URL: https://issues.apache.org/jira/browse/HIVE-2264 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0 Environment: SuSE-Linux-11 Reporter: rohithsharma Assignee: Navis Priority: Critical Attachments: HIVE-2264.1.patch.txt, HIVE-2264-2.patch When an invalid query is executed, the Hive server shuts down. {noformat} CREATE TABLE SAMPLETABLE(IP STRING , showtime BIGINT ) partitioned by (ds string,ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040' ALTER TABLE SAMPLETABLE add Partition(ds='sf') location '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse' {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2264) Hive server is SHUTTING DOWN when invalid queries are being executed.
[ https://issues.apache.org/jira/browse/HIVE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588484#comment-13588484 ] Brock Noland commented on HIVE-2264: Navis, you can use my rebased patch to update the review board; or, if you don't have interest in this any longer, no worries, I'd be willing to take it up. Hive server is SHUTTING DOWN when invalid queries are being executed. -- Key: HIVE-2264 URL: https://issues.apache.org/jira/browse/HIVE-2264 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0 Environment: SuSE-Linux-11 Reporter: rohithsharma Assignee: Navis Priority: Critical Attachments: HIVE-2264.1.patch.txt, HIVE-2264-2.patch When an invalid query is executed, the Hive server shuts down. {noformat} CREATE TABLE SAMPLETABLE(IP STRING , showtime BIGINT ) partitioned by (ds string,ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040' ALTER TABLE SAMPLETABLE add Partition(ds='sf') location '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse' {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588503#comment-13588503 ] Arun A K commented on HIVE-3850: Hello [~analog.sony], the test cases are ok. I think this needs to be considered for commit. [~ajeshpg] had given me the link to raise the review request. I tried raising the review, but I am getting an error message: "The selected file does not appear to be a diff." If possible, could you create a new patch with the name HIVE-3850.1.patch? Either [~710154] or yourself can do that, so that we can edit the current review request (https://reviews.apache.org/r/9171/) or discard this and create a new one. hour() function returns 12 hour clock value when using timestamp datatype - Key: HIVE-3850 URL: https://issues.apache.org/jira/browse/HIVE-3850 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: Pieterjan Vriends Fix For: 0.11.0 Attachments: hive-3850.patch, HIVE-3850.patch.txt Apparently UDFHour.java has two evaluate() functions: one that accepts a Text object as a parameter and one that takes a TimeStampWritable object. The first returns the value of Calendar.HOUR_OF_DAY and the second the value of Calendar.HOUR. In the documentation I couldn't find any information on this overload of evaluate(). I spent quite some time finding out why my statement didn't return a 24-hour clock value. Shouldn't both functions return the same? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
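The mismatch is easy to reproduce outside Hive. A minimal sketch (the class and helper names are hypothetical) showing that java.util.Calendar's HOUR field is a 12-hour-clock value while HOUR_OF_DAY is the 24-hour one:

```java
import java.util.Calendar;

public class HourFieldDemo {
    // Hypothetical helper, not Hive code: returns {Calendar.HOUR,
    // Calendar.HOUR_OF_DAY} for a fixed date at the given 24-hour-clock hour.
    static int[] hourFields(int hourOfDay) {
        Calendar cal = Calendar.getInstance();
        cal.set(2013, Calendar.FEBRUARY, 27, hourOfDay, 30, 0);
        return new int[] { cal.get(Calendar.HOUR), cal.get(Calendar.HOUR_OF_DAY) };
    }

    public static void main(String[] args) {
        int[] at3pm = hourFields(15);
        // Calendar.HOUR is the 12-hour-clock value (3); HOUR_OF_DAY is the
        // 24-hour-clock value (15). An overload returning HOUR instead of
        // HOUR_OF_DAY would explain the inconsistent results described above.
        System.out.println(at3pm[0] + " / " + at3pm[1]);
    }
}
```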
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588515#comment-13588515 ] Phabricator commented on HIVE-4080: --- ashutoshc has requested changes to the revision HIVE-4080 [jira] Add Lead Lag UDAFs. Some comments. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:533 Use LEAD_FUNC_NAME and LAG_FUNC_NAME here to be consistent. ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:871 This null check should be done outside of the nesting if() block. ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:875 I don't see a case where fInfo is != null but udafResolver could be null. If so, then this null check is redundant and should be removed. ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:890 We already know that we are dealing with a lead/lag UDAF, so it must be of type GenericUDAFResolver2, no? ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:866 I am wondering if this function can be rewritten as follows: WindowFunctionInfo finfo = windowFunctions.get(name.toLowerCase()); if (finfo == null) { return null; } if ( !name.toLowerCase().equals(LEAD_FUNC_NAME) && !name.toLowerCase().equals(LAG_FUNC_NAME) ) { return getGenericUDAFEvaluator(name, argumentOIs, isDistinct, isAllColumns); } // this must be lead/lag UDAF GenericUDAFResolver udafResolver = finfo.getfInfo().getGenericUDAFResolver(); GenericUDAFParameterInfo paramInfo = new SimpleGenericUDAFParameterInfo( argumentOIs.toArray(), isDistinct, isAllColumns); return ((GenericUDAFResolver2) udafResolver).getEvaluator(paramInfo); If not, then I have specific questions. See next few comments. ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:1467 It will be good to add a comment for this new boolean registerAsUDAF.
Something like the following: There are certain UDAFs like lead/lag which we want as windowing functions, but don't want them to appear in mFunctions. Why? Because ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:29 This and GenericUDAFLead share lots of common code. It might be good to have an abstract class for these two, just the way you have it in GenericUDFLeadLag. ql/src/test/queries/clientpositive/leadlag_queries.q:20 I think currently we don't support an over clause on expressions. Once we do, it will be good to add a test like: select p_retailprice - lead (p_retail) over (partition by p_mfgr) from part; ql/src/test/queries/clientpositive/leadlag_queries.q:35 It will be good to add a test which has both lead and lag in the same query. REVISION DETAIL https://reviews.facebook.net/D8961 BRANCH HIVE-4080 ARCANIST PROJECT hive To: JIRA, ashutoshc, hbutani Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Issue involving non-sun java build
Hi, I think we'd want Hive to work on as many JVMs as feasible. With that said, since it's tested mostly on the Sun JVM, it's possible we'll introduce new issues in the future, so you'll need to keep testing. Here is a guide on how to contribute: https://cwiki.apache.org/confluence/display/Hive/HowToContribute Glad to have you interested! Brock On Wed, Feb 27, 2013 at 10:06 AM, Renata Ghisloti Duarte de Souza rgdua...@linux.vnet.ibm.com wrote: Hello, I've been working with Hive for a couple of months now, and although it is a pretty strong framework, I can't help noticing that some test cases fail on non-Sun Java. Some examples are: TestCliDriver, TestParse and TestJdbcDriver. The issues involve HashMap iteration order, where Sun Java has a different output order than non-Sun Java. It is a silly problem, but it does cause failures. I've been working on fixes for these problems, and was planning to contribute them. Is this something the Hive community would be interested in? What are your thoughts about that? Thanks, Renata. -- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
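The class of failure being discussed can be sketched directly. HashMap iteration order is unspecified and may differ between JVM vendors, so golden-file tests that print map contents verbatim are fragile; sorting the keys before emitting them (a hypothetical helper, not a specific Hive fix) makes the output stable on any JVM:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeterministicOutput {
    // Returns the map's keys in sorted order, independent of the HashMap's
    // internal (vendor-specific) iteration order.
    static List<String> sortedKeys(Map<String, ?> m) {
        List<String> keys = new ArrayList<>(m.keySet());
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("b", 2);
        counts.put("a", 1);
        counts.put("c", 3);
        // Sorted output is identical on Sun and non-Sun JVMs.
        System.out.println(sortedKeys(counts)); // [a, b, c]
    }
}
```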
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588560#comment-13588560 ] Harish Butani commented on HIVE-4080: - Yes, there are several related issues: 1. Lead/Lag as UDAFs - this jira only addresses this - will work on your comments. 2. Support expressions with an over clause - filed JIRA 4081 for this - will work on this next. 3. Support for lead/lag UDFs. Based on our offline conversation, and as you point out here, the options are: - should we continue to support it as-is? - should we completely remove support? - support lead/lag as UDFs, but only within argument expressions of other UDAFs. The consensus seems to be that option 3 is nice to have and option 1 is problematic. Will address this in a separate JIRA. 4. The notion of default partitions - you have given more proof why supporting lead/lag as UDFs generally (option 1) is problematic. In general, should we continue to support this? Your approach makes sense. Will address this in a separate JIRA. Does this breakdown of issues make sense? Will address the first 3 asap, and then work on supporting multiple partitions (4041). The 4th one will have to wait a bit. Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4086) Cross Database Support for Indexes and Partitions (or all DDL statements)
Todd Wilson created HIVE-4086: - Summary: Cross Database Support for Indexes and Partitions (or all DDL statements) Key: HIVE-4086 URL: https://issues.apache.org/jira/browse/HIVE-4086 Project: Hive Issue Type: Improvement Components: Database/Schema, ODBC, SQL Affects Versions: 0.9.0 Environment: Writing a query tool in .NET connecting with ODBC to Hadoop on Linux. Reporter: Todd Wilson I'd like to see more cross-database support. I'm using a Cloudera implementation of Hive 0.9. Currently, you can create new databases, and you can create tables and views in those databases, but you cannot create indexes or partitions on those tables. Likewise, commands like show partitions or show indexes will only work on tables in the default database. This would probably also affect statements like Alter Table and Recover Partitions, and probably also something like Create Function, but if you want to keep all functions being created in the default database, that would work. I would be most interested in full cross-database support for tables and views to start; functions, for example, could all be created in default. Thank you. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588612#comment-13588612 ] Phabricator commented on HIVE-4080:

hbutani has commented on the revision HIVE-4080 [jira] Add Lead Lag UDAFs.

INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:866 Yes, I wanted to introduce a new function that takes a GenericUDAFResolver:
public static GenericUDAFEvaluator getGenericUDAFEvaluator(GenericUDAFResolver resolver, List<ObjectInspector> argumentOIs, boolean isDistinct, boolean isAllColumns)
and have the current getGenericUDAFEvaluator and getGenericWindowingEvaluator call it. But I backed out, because I was not comfortable making this change and submitting the patch without running the entire test suite, and ended up just doing a cut and paste. Your solution is much better.
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:1467 Yes, meant to do this. Somehow forgot, sorry.
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:29 Yes, I was rushing this... should refactor it.
ql/src/test/queries/clientpositive/leadlag_queries.q:20 Yes, exactly.
ql/src/test/queries/clientpositive/leadlag_queries.q:35 Will add.

REVISION DETAIL https://reviews.facebook.net/D8961 BRANCH HIVE-4080 ARCANIST PROJECT hive To: JIRA, ashutoshc, hbutani
[jira] [Commented] (HIVE-4086) Cross Database Support for Indexes and Partitions (or all DDL statements)
[ https://issues.apache.org/jira/browse/HIVE-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588626#comment-13588626 ] Jarek Jarcec Cecho commented on HIVE-4086: -- Hi Todd, thank you very much for reporting this issue. We already have JIRA HIVE-4064 to track this requirement, so I'm closing this one as a duplicate to keep all the information in one place. To immediately unblock you, did you consider using the SQL statement {{USE dbname}} to change the working database from {{default}} to {{dbname}}? Jarcec
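The {{USE dbname}} workaround mentioned above can be sketched as follows. This is an illustrative example only: the database, table, and column names (sales_db, events, dt) are made up, and the CREATE INDEX syntax shown is the Hive 0.9-era form, so treat it as a sketch rather than a guaranteed recipe.

```sql
-- Switch the working database so unqualified statements resolve against it.
-- sales_db, events, and dt are hypothetical names, not from the report.
USE sales_db;

SHOW PARTITIONS events;   -- now inspects sales_db.events rather than default

-- Index DDL that rejects a db-qualified name may work once USE is in effect
-- (Hive 0.9-era CREATE INDEX form; assumed, not verified against this setup):
CREATE INDEX idx_events_dt
ON TABLE events (dt)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

USE default;              -- switch back when finished
```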
[jira] [Resolved] (HIVE-4086) Cross Database Support for Indexes and Partitions (or all DDL statements)
[ https://issues.apache.org/jira/browse/HIVE-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho resolved HIVE-4086. -- Resolution: Duplicate
[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588639#comment-13588639 ] Shreepadma Venugopalan commented on HIVE-4064: -- I believe there is a problem with a number of DDL statements, including ALTER TABLE and CREATE INDEX.

Handle db qualified names consistently across all HiveQL statements
Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan

Hive doesn't consistently handle db-qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support db-qualified names, others such as CREATE INDEX don't.
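The inconsistency described in HIVE-4064 can be illustrated with a short sketch; the object names (sales_db, events, dt) are hypothetical, and the exact set of statements that reject qualified names varies by Hive version:

```sql
-- Db-qualified names are accepted in queries:
SELECT * FROM sales_db.events LIMIT 10;            -- supported

-- ...but DDL of this era often resolves only against the current database,
-- so qualified forms like these would be rejected unless USE sales_db runs first:
SHOW PARTITIONS sales_db.events;                   -- not supported
CREATE INDEX idx_dt ON TABLE sales_db.events (dt)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;                             -- not supported
```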
Re: Issue involving non-sun java build
Hi Renata, I'm glad to see you interested in contributing to the Hive community! Please don't hesitate to follow the link provided by Brock in the previous email if you're interested. I just wanted to add that Hive is a Hadoop SQL engine and thus we have major dependencies on Hadoop. While extending Hive's ability to work on other JDKs is definitely a great thing to do, I feel the need to warn that you might run into issues, as Hadoop itself might not work on those JDKs. I know about the following wiki page [1] that describes the support of various JDKs in Hadoop, but it no longer seems to be maintained. Jarcec Links: 1: http://wiki.apache.org/hadoop/HadoopJavaVersions

On Wed, Feb 27, 2013 at 11:01:58AM -0600, Brock Noland wrote: Hi, I think we'd want Hive to work on as many JVMs as feasible. That said, since it's tested mostly on the Sun JVM, it's possible we'll introduce new issues in the future, so you'll need to keep testing. Here is a guide on how to contribute: https://cwiki.apache.org/confluence/display/Hive/HowToContribute Glad to have you interested! Brock

On Wed, Feb 27, 2013 at 10:06 AM, Renata Ghisloti Duarte de Souza rgdua...@linux.vnet.ibm.com wrote: Hello, I've been working with Hive for a couple of months now, and although it is a pretty strong framework, I can't help noticing that some test cases fail on non-Sun Java. Some examples are: TestCliDriver, TestParse and TestJdbcDriver. The issues involve HashMap ordering, where Sun Java has a different output order than non-Sun Java. It is a silly problem, but it does cause failures. I've been working on fixes for these problems and was planning to contribute them. Is this something the Hive community would be interested in? What are your thoughts? Thanks, Renata.
[jira] [Commented] (HIVE-4086) Cross Database Support for Indexes and Partitions (or all DDL statements)
[ https://issues.apache.org/jira/browse/HIVE-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588655#comment-13588655 ] Todd Wilson commented on HIVE-4086: --- Hello Jarcec: Thank you for the reply. I appreciate this. I figured you might have something like this, but I couldn't find it. It was my first time entering an issue, so I was assuming I'd do something wrong! As far as the USE command goes, I'll give that a try. I actually didn't realize this command was supported; that would help a lot in what I'm trying to do. I'm switching back and forth between a lot of data sources like Teradata, ParAccel, and Kognitio, so sometimes my brain gets scrambled. :p Thank you again. Best Regards, Todd Wilson Senior Technical Consultant Coffing Data Warehousing (513) 292-3158 www.CoffingDW.com The information contained in this communication is confidential, private, proprietary, or otherwise privileged and is intended only for the use of the addressee. Unauthorized use, disclosure, distribution or copying is strictly prohibited and may be unlawful. If you have received this communication in error, please notify the sender immediately at gene...@coffingdw.com.
[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job
[ https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HIVE-3952: -- Attachment: HIVE-3952-20130227.1.txt Thanks for trying this, Amareshwari! I've added your INSERT OVERWRITE DIRECTORY /dir Select case to the test. Here's an updated patch that should work for you; can you please try again? Tx.

merge map-job followed by map-reduce job
Key: HIVE-3952 URL: https://issues.apache.org/jira/browse/HIVE-3952 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Vinod Kumar Vavilapalli Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt

Consider a query like:
{noformat} select count(*) FROM ( select idOne, idTwo, value FROM bigTable JOIN smallTableOne on (bigTable.idOne = smallTableOne.idOne) ) firstjoin JOIN smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo); {noformat}
where smallTableOne and smallTableTwo are smaller than hive.auto.convert.join.noconditionaltask.size and hive.auto.convert.join.noconditionaltask is set to true. The joins are collapsed into mapjoins, and this leads to a map-only job (for the map-joins) followed by a map-reduce job (for the group by). Ideally, the map-only job should be merged with the following map-reduce job.
[jira] [Commented] (HIVE-4079) Altering a view partition fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588685#comment-13588685 ] Kevin Wilfong commented on HIVE-4079: - Tests pass.

Altering a view partition fails with NPE
Key: HIVE-4079 URL: https://issues.apache.org/jira/browse/HIVE-4079 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4079.1.patch.txt

Altering a view partition, e.g. to add partition parameters, fails with a null pointer exception in the ObjectStore class. Currently this is only possible using the metastore Thrift API, and there are no test cases for it.
[jira] [Created] (HIVE-4087) Annotate public interfaces (UD*F, storage handler, SerDe)
Gunther Hagleitner created HIVE-4087: Summary: Annotate public interfaces (UD*F, storage handler, SerDe) Key: HIVE-4087 URL: https://issues.apache.org/jira/browse/HIVE-4087 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Going forward it would be nice to clearly annotate public interfaces in the Hive codebase. The javadocs would be more useful that way. It might even make sense to produce documentation for just those interfaces.
[jira] [Created] (HIVE-4088) Landing page for previous versions of documentation
Gunther Hagleitner created HIVE-4088: Summary: Landing page for previous versions of documentation Key: HIVE-4088 URL: https://issues.apache.org/jira/browse/HIVE-4088 Project: Hive Issue Type: Improvement Reporter: Gunther Hagleitner If you go to http://hive.apache.org/releases.html and navigate to documentation for previous releases, you end up on a page like this: http://hive.apache.org/docs/r0.7.1/ It would be great to have an actual page there instead of a directory listing.
Re: jdo2-api dependency
https://issues.apache.org/jira/browse/HIVE-4089 On Feb 22, 2013, at 3:15 PM, Jarek Jarcec Cecho jar...@apache.org wrote: Hi Nitay, would you mind opening a JIRA for that? Jarcec On Fri, Feb 22, 2013 at 01:03:15PM -0500, Nitay Joffe wrote: Hey guys, The latest open source hive release (0.10.0) depends on javax.jdo artifact jdo2-api version 2.3-ec. This version is not actually in maven central, which means everyone who uses hive requires custom maven repository definitions which is discouraged by maven folks. I pinged the javax.jdo guys about it and they recommended we upgrade to 3.0. See http://mail-archives.apache.org/mod_mbox/db-jdo-dev/201302.mbox/%3CCAGZB7RguuEJnpVbtaqOgYEbsUNzP3aMSmM8SM8aOxcb-hLWwjg%40mail.gmail.com%3E for the conversation. Can you guys fix this? Thanks, - Nitay
[jira] [Created] (HIVE-4089) javax.jdo : jdo2-api dependency not in Maven Central
Nitay Joffe created HIVE-4089: - Summary: javax.jdo : jdo2-api dependency not in Maven Central Key: HIVE-4089 URL: https://issues.apache.org/jira/browse/HIVE-4089 Project: Hive Issue Type: Bug Reporter: Nitay Joffe Assignee: Jarek Jarcec Cecho The latest open source hive release (0.10.0) depends on javax.jdo artifact jdo2-api version 2.3-ec. This version is not actually in Maven Central, which means everyone who uses hive requires custom maven repository definitions, which is discouraged by maven folks. I pinged the javax.jdo guys about it and they recommended we upgrade to 3.0. See goo.gl/fAoRn for the conversation.
[jira] [Commented] (HIVE-4089) javax.jdo : jdo2-api dependency not in Maven Central
[ https://issues.apache.org/jira/browse/HIVE-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588728#comment-13588728 ] Nitay Joffe commented on HIVE-4089: --- Link: goo.gl/fAoRn
[jira] [Commented] (HIVE-4089) javax.jdo : jdo2-api dependency not in Maven Central
[ https://issues.apache.org/jira/browse/HIVE-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588729#comment-13588729 ] Nitay Joffe commented on HIVE-4089: --- http://mail-archives.apache.org/mod_mbox/db-jdo-dev/201302.mbox/%3ccagzb7rguuejnpvbtaqogyebsunzp3amsmm8sm8aoxcb-hlw...@mail.gmail.com%3E
[jira] [Updated] (HIVE-4089) javax.jdo : jdo2-api dependency not in Maven Central
[ https://issues.apache.org/jira/browse/HIVE-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nitay Joffe updated HIVE-4089: -- Description: The latest open source hive release (0.10.0) depends on javax.jdo artifact jdo2-api version 2.3-ec. This version is not actually in maven central, which means everyone who uses hive requires custom maven repository definitions which is discouraged by maven folks. I pinged the javax.jdo guys about it and they recommended we upgrade to 3.0. See http://goo.gl/fAoRn for the conversation. (was: The latest open source hive release (0.10.0) depends on javax.jdo artifact jdo2-api version 2.3-ec. This version is not actually in maven central, which means everyone who uses hive requires custom maven repository definitions which is discouraged by maven folks. I pinged the javax.jdo guys about it and they recommended we upgrade to 3.0. See goo.gl/fAoRn for the conversation.)
[jira] [Updated] (HIVE-4078) Remove the serialize-deserialize pair in CommonJoinResolver
[ https://issues.apache.org/jira/browse/HIVE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-4078: -- Status: Open (was: Patch Available) Updating patch to match review comments.

Remove the serialize-deserialize pair in CommonJoinResolver
Key: HIVE-4078 URL: https://issues.apache.org/jira/browse/HIVE-4078 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-4078.patch

CommonJoinProcessor tries to clone a MapredWork while attempting a conversion to a map-join:
{code}
// deep copy a new mapred work from xml
InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
MapredWork newWork = Utilities.deserializeMapRedWork(in, physicalContext.getConf());
{code}
which is a very heavy operation both memory-wise and CPU-wise. Instead of cloning via XMLEncoder, it is faster to use BeanUtils.cloneBean(), which follows the same data paths (get/set bean methods).
[jira] [Updated] (HIVE-4078) Remove the serialize-deserialize pair in CommonJoinResolver
[ https://issues.apache.org/jira/browse/HIVE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-4078: -- Attachment: HIVE-4078-20130227.patch Updated to throw a SemanticException wrapping the IllegalAccess or Invocation exceptions that are possible (but unlikely) from the cloner.
[jira] [Updated] (HIVE-4078) Remove the serialize-deserialize pair in CommonJoinResolver
[ https://issues.apache.org/jira/browse/HIVE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-4078: -- Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-3775) Unit test failures due to unspecified order of results in show grant command
[ https://issues.apache.org/jira/browse/HIVE-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588811#comment-13588811 ] Gunther Hagleitner commented on HIVE-3775: -- Updated: https://reviews.facebook.net/D8811

Unit test failures due to unspecified order of results in show grant command
Key: HIVE-3775 URL: https://issues.apache.org/jira/browse/HIVE-3775 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3775.1-r1417768.patch, HIVE-3775.2.patch

A number of unit tests using show grant (sometimes) fail when run on Windows, or when previous failures leave the database in an unexpected state. The reason is that the output of show grant is not specified to be in any particular order, but the golden files expect it to be. The unit test framework should be extended to handle cases like that.
[jira] [Commented] (HIVE-4086) Cross Database Support for Indexes and Partitions (or all DDL statements)
[ https://issues.apache.org/jira/browse/HIVE-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588824#comment-13588824 ] Todd Wilson commented on HIVE-4086: --- Hello Jarcec: Your suggestion for the USE command works when querying Hive directly, but I'm using a couple of ODBC drivers (MapR and HortonWorks) and it looks like this command doesn't work through them (which made me think the command wasn't working/supported in Hive :/). Anyway, I think this is an ODBC issue. Thanks again for your help. Best Regards, Todd Wilson Senior Technical Consultant Coffing Data Warehousing (513) 292-3158 www.CoffingDW.com
[jira] [Updated] (HIVE-4034) Should be able to specify windowing spec without needing Between
[ https://issues.apache.org/jira/browse/HIVE-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4034: --- Assignee: Ashutosh Chauhan

Should be able to specify windowing spec without needing Between
Key: HIVE-4034 URL: https://issues.apache.org/jira/browse/HIVE-4034 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan

Currently users need to do the following:
{noformat} select s, sum(b) over (distribute by i sort by si rows between unbounded preceding and current row) from over100k; {noformat}
but the SQL spec allows the following as well:
{noformat} select s, sum(b) over (distribute by i sort by si rows unbounded preceding) from over100k; {noformat}
In such cases {{current row}} should be assumed implicitly.
[jira] [Commented] (HIVE-4086) Cross Database Support for Indexes and Partitions (or all DDL statements)
[ https://issues.apache.org/jira/browse/HIVE-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588849#comment-13588849 ] Jarek Jarcec Cecho commented on HIVE-4086: -- The {{USE db}} statement definitely works in the Hive shell and the JDBC interface. I can't speak for the ODBC drivers, unfortunately.
[jira] [Updated] (HIVE-4034) Should be able to specify windowing spec without needing Between
[ https://issues.apache.org/jira/browse/HIVE-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4034: --- Attachment: HIVE-4034.patch Patch which fixes the grammar to handle these cases. I found a bug in range handling, so most changes are related to that. See the new positive test cases related to range in window specification.
Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #78
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/78/ -- [...truncated 41971 lines...] [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2013-02-27 14:51:02,720 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] Execution completed successfully [junit] Mapred Local Task Succeeded . Convert the Join into MapJoin [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/localscratchdir/hive_2013-02-27_14-50-59_570_5853974869671614312/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/tmp/hive_job_log_jenkins_201302271451_661347323.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Copying file: file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt [junit] PREHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] Table default.testhivedrivertable stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0] [junit] POSTHOOK: query: load data local inpath '/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/localscratchdir/hive_2013-02-27_14-51-04_036_7445463875010800454/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/localscratchdir/hive_2013-02-27_14-51-04_036_7445463875010800454/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/build/service/tmp/hive_job_log_jenkins_201302271451_879632411.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable
[jira] [Updated] (HIVE-4078) Remove the serialize-deserialize pair in CommonJoinResolver
[ https://issues.apache.org/jira/browse/HIVE-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-4078: -- Status: Open (was: Patch Available) cloneBean() only clones part of the data; it does not do a true deep copy. Remove the serialize-deserialize pair in CommonJoinResolver --- Key: HIVE-4078 URL: https://issues.apache.org/jira/browse/HIVE-4078 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-4078-20130227.patch, HIVE-4078.patch CommonJoinProcessor tries to clone a MapredWork while attempting a conversion to a map-join {code} // deep copy a new mapred work from xml InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8")); MapredWork newWork = Utilities.deserializeMapRedWork(in, physicalContext.getConf()); {code} which is a very heavy operation, both memory-wise and CPU-wise. Instead of cloning via XMLEncoder, it is faster to use BeanUtils.cloneBean(), which follows the same data paths (get/set bean methods) instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
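Gopal's reason for reverting — cloneBean() clones only part of the data — comes down to shallow versus deep copying. A minimal Python sketch (a hypothetical stand-in class, not the actual MapredWork) of why a shallow clone is unsafe for a plan object with nested mutable state:

```python
import copy

# Hypothetical stand-in for a plan object with nested mutable state;
# MapredWork itself is far more complex.
class Work:
    def __init__(self):
        self.aliases = {"a": ["t1"]}

original = Work()

shallow = copy.copy(original)      # analogous to BeanUtils.cloneBean()
deep = copy.deepcopy(original)     # analogous to the XML round-trip

# Mutating the shallow clone's nested map leaks into the original...
shallow.aliases["a"].append("t2")
assert original.aliases["a"] == ["t1", "t2"]

# ...while the deep clone stays independent.
assert deep.aliases["a"] == ["t1"]
```

This is why the slow serialize-deserialize pair gives a true deep copy while the bean-property copy does not: both objects keep references to the same nested collections.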
[jira] [Commented] (HIVE-4044) Add URL type
[ https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1357#comment-1357 ] Samuel Yuan commented on HIVE-4044: --- You're right, the idea is that it will enable better encoding of URLs. Kevin found that breaking up the URL into its components and storing them as separate columns results in significant space savings. The original plan was to implement this idea with RCFile, but with the new ORC file format I decided to wait for that instead, and to submit this part separately. However, it looks like the improvements of the ORC file have erased any gains we would have gotten by breaking up URLs into the individual components, so this won't be needed any more. Add URL type Key: HIVE-4044 URL: https://issues.apache.org/jira/browse/HIVE-4044 Project: Hive Issue Type: Improvement Reporter: Samuel Yuan Assignee: Samuel Yuan Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch Having a separate type for URLs would enable improvements in storage efficiency based on breaking up a URL into its components. The new type will be named URL and made a non-reserved keyword (see HIVE-701). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4090) Use of hive.exec.script.allow.partial.consumption can produce partial results
Kevin Wilfong created HIVE-4090: --- Summary: Use of hive.exec.script.allow.partial.consumption can produce partial results Key: HIVE-4090 URL: https://issues.apache.org/jira/browse/HIVE-4090 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Kevin Wilfong When users execute a transform script with the config hive.exec.script.allow.partial.consumption set to true, it may produce partial results. When this config is set, the script may close its input pipe before its parent operator has finished passing it rows. In the catch block for the resulting exception, the setDone method is called, marking the operator as done. However, there's a separate thread running to process rows passed from the script back to Hive via stdout. If this thread is not done processing rows, any rows it forwards after the setDone method is called will not be passed to its children. This leads to partial results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
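The failure mode described above can be sketched without real threads (illustrative Python with hypothetical names, not Hive's ScriptOperator): once setDone() marks the operator done, any rows the stdout-processing thread still has in flight are silently dropped.

```python
# Minimal sketch of the reported bug: rows forwarded after setDone()
# never reach the children, yielding partial results.

class Operator:
    def __init__(self):
        self.done = False
        self.forwarded = []

    def set_done(self):
        self.done = True

    def forward(self, row):
        # Mirrors the behavior described above: children never see rows
        # forwarded once the operator is marked done.
        if not self.done:
            self.forwarded.append(row)

op = Operator()
op.forward("row1")
op.set_done()          # script closed its input pipe early
op.forward("row2")     # stdout thread still had this row in flight

assert op.forwarded == ["row1"]   # partial results: row2 was lost
```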
[jira] [Commented] (HIVE-4090) Use of hive.exec.script.allow.partial.consumption can produce partial results
[ https://issues.apache.org/jira/browse/HIVE-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588891#comment-13588891 ] Kevin Wilfong commented on HIVE-4090: - https://reviews.facebook.net/D8979 Use of hive.exec.script.allow.partial.consumption can produce partial results - Key: HIVE-4090 URL: https://issues.apache.org/jira/browse/HIVE-4090 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Kevin Wilfong When users execute a transform script with the config hive.exec.script.allow.partial.consumption set to true, it may produce partial results. When this config is set, the script may close its input pipe before its parent operator has finished passing it rows. In the catch block for the resulting exception, the setDone method is called, marking the operator as done. However, there's a separate thread running to process rows passed from the script back to Hive via stdout. If this thread is not done processing rows, any rows it forwards after the setDone method is called will not be passed to its children. This leads to partial results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4090) Use of hive.exec.script.allow.partial.consumption can produce partial results
[ https://issues.apache.org/jira/browse/HIVE-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-4090: Attachment: HIVE-4090.1.patch.txt Use of hive.exec.script.allow.partial.consumption can produce partial results - Key: HIVE-4090 URL: https://issues.apache.org/jira/browse/HIVE-4090 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Kevin Wilfong Attachments: HIVE-4090.1.patch.txt When users execute a transform script with the config hive.exec.script.allow.partial.consumption set to true, it may produce partial results. When this config is set, the script may close its input pipe before its parent operator has finished passing it rows. In the catch block for the resulting exception, the setDone method is called, marking the operator as done. However, there's a separate thread running to process rows passed from the script back to Hive via stdout. If this thread is not done processing rows, any rows it forwards after the setDone method is called will not be passed to its children. This leads to partial results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4044) Add URL type
[ https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4044: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) Per [~sxyuan] this is not needed anymore. Resolving. Add URL type Key: HIVE-4044 URL: https://issues.apache.org/jira/browse/HIVE-4044 Project: Hive Issue Type: Improvement Reporter: Samuel Yuan Assignee: Samuel Yuan Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch Having a separate type for URLs would enable improvements in storage efficiency based on breaking up a URL into its components. The new type will be named URL and made a non-reserved keyword (see HIVE-701). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4080: -- Attachment: HIVE-4080.D8961.2.patch hbutani updated the revision HIVE-4080 [jira] Add Lead Lag UDAFs. - add Lead and Lag UDAFs, fix issues specified in review Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D8961 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D8961?vs=28749id=28803#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLead.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLeadLag.java ql/src/test/queries/clientpositive/leadlag_queries.q ql/src/test/results/clientpositive/leadlag_queries.q.out To: JIRA, ashutoshc, hbutani Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch, HIVE-4080.D8961.2.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard-compliant we need to support Lead/Lag UDAFs. We will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. We currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed, as it causes ambiguities when the query contains different partition clauses. More details will be provided with the associated JIRA to remove this feature. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588986#comment-13588986 ] Kevin Wilfong commented on HIVE-3874: - K, let me know when it's ready for review again. Create a new Optimized Row Columnar file format for Hive Key: HIVE-3874 URL: https://issues.apache.org/jira/browse/HIVE-3874 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, HIVE-3874.D8529.2.patch, HIVE-3874.D8529.3.patch, HIVE-3874.D8529.4.patch, HIVE-3874.D8871.1.patch, OrcFileIntro.pptx, orc.tgz There are several limitations of the current RC File format that I'd like to address by creating a new format: * each column value is stored as a binary blob, which means: ** the entire column value must be read, decompressed, and deserialized ** the file format can't use smarter type-specific compression ** push down filters can't be evaluated * the start of each row group needs to be found by scanning * user metadata can only be added to the file when the file is created * the file doesn't store the number of rows per file or row group * there is no mechanism for seeking to a particular row number, which is required for external indexes. * there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups. * the types of the rows aren't stored in the file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
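One of the gaps listed above — lightweight in-file indexes for push-down filters — can be illustrated with a short sketch (illustrative Python, not ORC's actual index layout): per-row-group min/max statistics let a reader skip every group whose value range cannot match the predicate.

```python
# Illustrative sketch of how per-row-group min/max statistics enable
# push-down filters to skip entire row groups without reading them.

def build_stats(row_groups):
    """Record the min and max of each row group's values."""
    return [(min(g), max(g)) for g in row_groups]

def groups_to_read(stats, predicate_value):
    """Keep only groups whose [min, max] range could contain the value."""
    return [i for i, (lo, hi) in enumerate(stats)
            if lo <= predicate_value <= hi]

row_groups = [[1, 4, 7], [10, 12, 15], [20, 25, 30]]
stats = build_stats(row_groups)

# A filter like "col = 12" only needs the middle group.
assert groups_to_read(stats, 12) == [1]
```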
[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive
[ https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-3874: Status: Open (was: Patch Available) Create a new Optimized Row Columnar file format for Hive Key: HIVE-3874 URL: https://issues.apache.org/jira/browse/HIVE-3874 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: hive.3874.2.patch, HIVE-3874.D8529.1.patch, HIVE-3874.D8529.2.patch, HIVE-3874.D8529.3.patch, HIVE-3874.D8529.4.patch, HIVE-3874.D8871.1.patch, OrcFileIntro.pptx, orc.tgz There are several limitations of the current RC File format that I'd like to address by creating a new format: * each column value is stored as a binary blob, which means: ** the entire column value must be read, decompressed, and deserialized ** the file format can't use smarter type-specific compression ** push down filters can't be evaluated * the start of each row group needs to be found by scanning * user metadata can only be added to the file when the file is created * the file doesn't store the number of rows per file or row group * there is no mechanism for seeking to a particular row number, which is required for external indexes. * there is no mechanism for storing lightweight indexes within the file to enable push-down filters to skip entire row groups. * the types of the rows aren't stored in the file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3849) Aliased column in where clause for multi-groupby single reducer cannot be resolved
[ https://issues.apache.org/jira/browse/HIVE-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3849: Status: Patch Available (was: Open) Aliased column in where clause for multi-groupby single reducer cannot be resolved -- Key: HIVE-3849 URL: https://issues.apache.org/jira/browse/HIVE-3849 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3849.D7713.1.patch, HIVE-3849.D7713.2.patch, HIVE-3849.D7713.3.patch, HIVE-3849.D7713.4.patch, HIVE-3849.D7713.5.patch, HIVE-3849.D7713.6.patch, HIVE-3849.D7713.7.patch While verifying HIVE-3847, I found that an exception is thrown before reaching the error situation described there. Something like: FAILED: SemanticException [Error 10025]: Line 40:6 Expression not in GROUP BY key 'crit5' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3849) Aliased column in where clause for multi-groupby single reducer cannot be resolved
[ https://issues.apache.org/jira/browse/HIVE-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589009#comment-13589009 ] Phabricator commented on HIVE-3849: --- navis has commented on the revision HIVE-3849 [jira] Columns are not extracted for multi-groupby single reducer case sometimes. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3537 done. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3548 ok. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3569 done. ql/src/test/queries/clientpositive/groupby_multi_single_reducer3.q:8 ah, got it. REVISION DETAIL https://reviews.facebook.net/D7713 To: JIRA, navis Cc: njain Aliased column in where clause for multi-groupby single reducer cannot be resolved -- Key: HIVE-3849 URL: https://issues.apache.org/jira/browse/HIVE-3849 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3849.D7713.1.patch, HIVE-3849.D7713.2.patch, HIVE-3849.D7713.3.patch, HIVE-3849.D7713.4.patch, HIVE-3849.D7713.5.patch, HIVE-3849.D7713.6.patch, HIVE-3849.D7713.7.patch, HIVE-3849.D7713.8.patch While verifying HIVE-3847, I found that an exception is thrown before reaching the error situation described there. Something like: FAILED: SemanticException [Error 10025]: Line 40:6 Expression not in GROUP BY key 'crit5' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3849) Aliased column in where clause for multi-groupby single reducer cannot be resolved
[ https://issues.apache.org/jira/browse/HIVE-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3849: -- Attachment: HIVE-3849.D7713.8.patch navis updated the revision HIVE-3849 [jira] Columns are not extracted for multi-groupby single reducer case sometimes. Addressed comments and rebased to trunk Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D7713 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D7713?vs=27243id=28815#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java ql/src/test/queries/clientpositive/groupby_multi_insert_common_distinct.q ql/src/test/queries/clientpositive/groupby_mutli_insert_common_distinct.q ql/src/test/queries/clientpositive/groupby_multi_single_reducer3.q ql/src/test/results/clientpositive/groupby_multi_insert_common_distinct.q.out ql/src/test/results/clientpositive/groupby_mutli_insert_common_distinct.q.out ql/src/test/results/clientpositive/groupby_multi_single_reducer3.q.out To: JIRA, navis Cc: njain Aliased column in where clause for multi-groupby single reducer cannot be resolved -- Key: HIVE-3849 URL: https://issues.apache.org/jira/browse/HIVE-3849 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3849.D7713.1.patch, HIVE-3849.D7713.2.patch, HIVE-3849.D7713.3.patch, HIVE-3849.D7713.4.patch, HIVE-3849.D7713.5.patch, HIVE-3849.D7713.6.patch, HIVE-3849.D7713.7.patch, HIVE-3849.D7713.8.patch While verifying HIVE-3847, I found that an exception is thrown before reaching the error situation described there. Something like: FAILED: SemanticException [Error 10025]: Line 40:6 Expression not in GROUP BY key 'crit5' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3904) Replace hashmaps in JoinOperators to array
[ https://issues.apache.org/jira/browse/HIVE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3904: Status: Patch Available (was: Open) Replace hashmaps in JoinOperators to array -- Key: HIVE-3904 URL: https://issues.apache.org/jira/browse/HIVE-3904 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3904.D7959.1.patch The join operators have many HashMaps that map a tag to some internal value (ExprEvals, OIs, etc.), and these are accessed 5 or more times per object, which seems to be unnecessary overhead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
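The optimization can be sketched in a few lines (illustrative Python with hypothetical names, not the actual JoinOperator fields): join tags are small dense integers, so a tag-indexed array can replace the per-row hash lookups the description complains about.

```python
# Sketch of the change described above: because tags are small dense
# integers, a list indexed by tag avoids hashing the key on every access.

NUM_TAGS = 2

# Map-based layout: every access hashes and compares the tag key.
evals_by_map = {0: "eval-left", 1: "eval-right"}

# Array-based layout: the tag itself is the index, so each access is a
# plain O(1) array read with no hashing.
evals_by_array = [None] * NUM_TAGS
for tag, ev in evals_by_map.items():
    evals_by_array[tag] = ev

# Both layouts yield the same values for every tag.
for tag in range(NUM_TAGS):
    assert evals_by_array[tag] == evals_by_map[tag]
```

In Java the win is larger still, since the HashMap path also boxes each int tag into an Integer key.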
[jira] [Updated] (HIVE-3904) Replace hashmaps in JoinOperators to array
[ https://issues.apache.org/jira/browse/HIVE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3904: -- Attachment: HIVE-3904.D7959.2.patch navis updated the revision HIVE-3904 [jira] Replace hashmaps in JoinOperators to array. Rebased to trunk Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D7959 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D7959?vs=25563id=28821#toc AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java To: JIRA, navis Cc: njain Replace hashmaps in JoinOperators to array -- Key: HIVE-3904 URL: https://issues.apache.org/jira/browse/HIVE-3904 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3904.D7959.1.patch, HIVE-3904.D7959.2.patch The join operators have many HashMaps that map a tag to some internal value (ExprEvals, OIs, etc.), and these are accessed 5 or more times per object, which seems to be unnecessary overhead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4091) [REGRESSION] HIVE-3571 does not run all tests sometimes
Navis created HIVE-4091: --- Summary: [REGRESSION] HIVE-3571 does not run all tests sometimes Key: HIVE-4091 URL: https://issues.apache.org/jira/browse/HIVE-4091 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Navis Assignee: Navis ant test sometimes does not run the whole test suite but only the tests in ql (the time difference is about 30 min) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2264) Hive server is SHUTTING DOWN when invalid queries are being executed.
[ https://issues.apache.org/jira/browse/HIVE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589058#comment-13589058 ] Navis commented on HIVE-2264: - I'm applying this to all internal hive releases and would like it to be reviewed/applied to apache hive. But sadly, no committer has seemed interested in it. It's already in patch-available status. Do you have more ideas to be merged with this? Then I'll happily assign it to you. Hive server is SHUTTING DOWN when invalid queries are being executed. -- Key: HIVE-2264 URL: https://issues.apache.org/jira/browse/HIVE-2264 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0 Environment: SuSE-Linux-11 Reporter: rohithsharma Assignee: Navis Priority: Critical Attachments: HIVE-2264.1.patch.txt, HIVE-2264-2.patch When an invalid query is being executed, the Hive server shuts down. {noformat} CREATE TABLE SAMPLETABLE(IP STRING , showtime BIGINT ) partitioned by (ds string,ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040' ALTER TABLE SAMPLETABLE add Partition(ds='sf') location '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse' {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2264) Hive server is SHUTTING DOWN when invalid queries are being executed.
[ https://issues.apache.org/jira/browse/HIVE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589065#comment-13589065 ] Brock Noland commented on HIVE-2264: [~navis] You have been running with this patch for quite a long time? In regard to getting it merged, I think the best we can do is update the review board item with the rebased patch. Another item that may bring it up in terms of visibility is linking it to HIVE-2935, as it's quite important for HS2. Hive server is SHUTTING DOWN when invalid queries are being executed. -- Key: HIVE-2264 URL: https://issues.apache.org/jira/browse/HIVE-2264 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0 Environment: SuSE-Linux-11 Reporter: rohithsharma Assignee: Navis Priority: Critical Attachments: HIVE-2264.1.patch.txt, HIVE-2264-2.patch When an invalid query is being executed, the Hive server shuts down. {noformat} CREATE TABLE SAMPLETABLE(IP STRING , showtime BIGINT ) partitioned by (ds string,ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040' ALTER TABLE SAMPLETABLE add Partition(ds='sf') location '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse' {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Hive-trunk-h0.21 - Build # 1990 - Failure
Changes for Build #1990 1 tests failed. REGRESSION: org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1 Error Message: Unexpected exception See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. Stack Trace: junit.framework.AssertionFailedError: Unexpected exception See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. at junit.framework.Assert.fail(Assert.java:47) at org.apache.hadoop.hive.cli.TestNegativeCliDriver.runTest(TestNegativeCliDriver.java:2381) at org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1(TestNegativeCliDriver.java:1867) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:232) at junit.framework.TestSuite.run(TestSuite.java:227) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906) The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1990) Status: Failure Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1990/ to view the results.
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589076#comment-13589076 ] Phabricator commented on HIVE-4080: --- ashutoshc has requested changes to the revision HIVE-4080 [jira] Add Lead Lag UDAFs. Mostly looks good. * Missing apache headers. * Request for a couple more tests. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLead.java:1 Apache headers are missing. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLeadLag.java:1 Apache headers missing. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:1 Apache headers missing. ql/src/test/queries/clientpositive/leadlag_queries.q:19 So, lead/lag can now take three params. The first one is the column, the second one is the offset, and the third is the default value. Correct? It will be good to have a test case with a constant default like lag (price,5,10) and one with implicit params like lag (price), which implies an offset of 1 and NULL as the default. REVISION DETAIL https://reviews.facebook.net/D8961 BRANCH HIVE-4080 ARCANIST PROJECT hive To: JIRA, ashutoshc, hbutani Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch, HIVE-4080.D8961.2.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard-compliant we need to support Lead/Lag UDAFs. We will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. We currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed, as it causes ambiguities when the query contains different partition clauses. More details will be provided with the associated JIRA to remove this feature. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
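To make the three-parameter lead/lag semantics discussed in the review concrete, here is a hypothetical Python sketch of lead/lag over one ordered partition (an illustration only, not Hive's UDAF implementation; the function names and data are invented):

```python
# Toy models of lag(col, offset, default) / lead(col, offset, default):
# the value `offset` rows before (after) the current row within the
# partition, or `default` (NULL when omitted) past the partition edge.
def lag(rows, offset=1, default=None):
    return [rows[i - offset] if i - offset >= 0 else default
            for i in range(len(rows))]

def lead(rows, offset=1, default=None):
    return [rows[i + offset] if i + offset < len(rows) else default
            for i in range(len(rows))]

prices = [10, 20, 30, 40]
print(lag(prices))        # [None, 10, 20, 30] -- implicit offset=1, NULL default
print(lag(prices, 2, 0))  # [0, 0, 10, 20]     -- constant default, like lag(price, 2, 0)
print(lead(prices))       # [20, 30, 40, None]
```

This is exactly the pair of cases the reviewer asks to cover: a constant default and fully implicit parameters.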
[jira] [Commented] (HIVE-4091) [REGRESSION] HIVE-3571 does not run all tests sometimes
[ https://issues.apache.org/jira/browse/HIVE-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589079#comment-13589079 ] Ashutosh Chauhan commented on HIVE-4091: One thing I noticed is that tests in the shims/ dir are not run. Are there others as well? [REGRESSION] HIVE-3571 does not run all tests sometimes --- Key: HIVE-4091 URL: https://issues.apache.org/jira/browse/HIVE-4091 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Navis Assignee: Navis ant test sometimes does not run the whole test suite but only the tests in ql (the time difference is about 30 min) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4073) Make partition by optional in over clause
[ https://issues.apache.org/jira/browse/HIVE-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589129#comment-13589129 ] Brock Noland commented on HIVE-4073: [~ashutoshc] When OVER() with no partition column is specified, we will partition by some constant. In that case, there is only a need for a single reducer. Should this change force the single reducer? Not sure how I would do that yet. Make partition by optional in over clause - Key: HIVE-4073 URL: https://issues.apache.org/jira/browse/HIVE-4073 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Ashutosh Chauhan Assignee: Brock Noland select s, sum( i ) over() from tt; should work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3785) Core hive changes for HiveServer2 implementation
[ https://issues.apache.org/jira/browse/HIVE-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589135#comment-13589135 ] Thejas M Nair commented on HIVE-3785: - Hi [~namit] The rebased patch has been uploaded by Prasad in HIVE-2935. Can you please review it using the phabricator link there? The phabricator upload has only the files that have changed. Core hive changes for HiveServer2 implementation Key: HIVE-3785 URL: https://issues.apache.org/jira/browse/HIVE-3785 Project: Hive Issue Type: Sub-task Components: Authentication, Build Infrastructure, Configuration, Thrift API Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HS2-changed-files-only.patch The subtask to track changes in the core hive components for the HiveServer2 implementation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anandha L Ranganathan updated HIVE-3850: Attachment: (was: hive-3850.patch) hour() function returns 12 hour clock value when using timestamp datatype - Key: HIVE-3850 URL: https://issues.apache.org/jira/browse/HIVE-3850 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: Pieterjan Vriends Fix For: 0.11.0 Attachments: HIVE-3850.patch.txt Apparently UDFHour.java has two evaluate() functions: one that accepts a Text object as a parameter and one that takes a TimeStampWritable object as a parameter. The first function returns the value of Calendar.HOUR_OF_DAY and the second that of Calendar.HOUR. In the documentation I couldn't find any information on the overloads of the evaluate() function. I spent quite some time finding out why my statement didn't return a 24 hour clock value. Shouldn't both functions return the same? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
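The root cause is that Calendar.HOUR_OF_DAY is a 24-hour value (0-23) while Calendar.HOUR is a 12-hour value (0-11), so the two overloads disagree for any afternoon timestamp. A minimal Python sketch of the two behaviours (an illustration of the Calendar field semantics, not the Hive code; the function names are hypothetical):

```python
from datetime import datetime

def hour_of_day(ts):
    # Analogue of Calendar.HOUR_OF_DAY: 24-hour clock value, 0-23.
    return ts.hour

def hour_12(ts):
    # Analogue of Calendar.HOUR: 12-hour clock value, 0-11.
    return ts.hour % 12

ts = datetime(2013, 2, 27, 16, 30)  # 4:30 PM
print(hour_of_day(ts))  # 16 -- what the Text overload effectively returns
print(hour_12(ts))      # 4  -- what the TimeStampWritable overload returns
```

A fix along the lines of the attached patch would presumably make the TimeStampWritable overload use the HOUR_OF_DAY-style value so both overloads agree.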
[jira] [Updated] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anandha L Ranganathan updated HIVE-3850: Attachment: hive-3850.patch Re-attaching the patch hour() function returns 12 hour clock value when using timestamp datatype - Key: HIVE-3850 URL: https://issues.apache.org/jira/browse/HIVE-3850 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: Pieterjan Vriends Fix For: 0.11.0 Attachments: hive-3850.patch, HIVE-3850.patch.txt Apparently UDFHour.java has two evaluate() functions: one that accepts a Text object as a parameter and one that takes a TimeStampWritable object as a parameter. The first function returns the value of Calendar.HOUR_OF_DAY and the second that of Calendar.HOUR. In the documentation I couldn't find any information on the overloads of the evaluate() function. I spent quite some time finding out why my statement didn't return a 24 hour clock value. Shouldn't both functions return the same? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anandha L Ranganathan updated HIVE-3850: Attachment: (was: hive-3850.patch) hour() function returns 12 hour clock value when using timestamp datatype - Key: HIVE-3850 URL: https://issues.apache.org/jira/browse/HIVE-3850 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: Pieterjan Vriends Fix For: 0.11.0 Attachments: hive-3850.patch_1.txt, HIVE-3850.patch.txt Apparently UDFHour.java has two evaluate() functions: one that accepts a Text object as a parameter and one that takes a TimeStampWritable object as a parameter. The first function returns the value of Calendar.HOUR_OF_DAY and the second that of Calendar.HOUR. In the documentation I couldn't find any information on the overloads of the evaluate() function. I spent quite some time finding out why my statement didn't return a 24 hour clock value. Shouldn't both functions return the same? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anandha L Ranganathan updated HIVE-3850: Attachment: hive-3850.patch_1.txt hour() function returns 12 hour clock value when using timestamp datatype - Key: HIVE-3850 URL: https://issues.apache.org/jira/browse/HIVE-3850 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: Pieterjan Vriends Fix For: 0.11.0 Attachments: hive-3850.patch_1.txt, HIVE-3850.patch.txt Apparently UDFHour.java has two evaluate() functions: one that accepts a Text object as a parameter and one that takes a TimeStampWritable object as a parameter. The first function returns the value of Calendar.HOUR_OF_DAY and the second that of Calendar.HOUR. In the documentation I couldn't find any information on the overloads of the evaluate() function. I spent quite some time finding out why my statement didn't return a 24 hour clock value. Shouldn't both functions return the same? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4073) Make partition by optional in over clause
[ https://issues.apache.org/jira/browse/HIVE-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4073: --- Attachment: HIVE-4073-0.patch The attached patch is not for commit, but it does work. I'll add some tests tomorrow and then think about forcing one reducer. When I manually set more than one reducer the patch still worked, since the partition key was the same for all records. Make partition by optional in over clause - Key: HIVE-4073 URL: https://issues.apache.org/jira/browse/HIVE-4073 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Ashutosh Chauhan Assignee: Brock Noland Attachments: HIVE-4073-0.patch select s, sum( i ) over() from tt; should work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
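The observation that the patch still works with several reducers follows from how hash partitioning behaves with a constant key: every row hashes to the same reducer, so extra reducers simply sit idle. A toy Python sketch of that reasoning (not Hive's actual partitioner; the names and constant are invented):

```python
def reducer_for(key, num_reducers):
    # Toy MapReduce-style partitioner: reducer index = hash(key) mod #reducers.
    return hash(key) % num_reducers

rows = [("a", 1), ("b", 2), ("c", 3)]
CONSTANT_KEY = 0  # invented stand-in for the constant used when OVER() has no PARTITION BY
targets = {reducer_for(CONSTANT_KEY, 4) for _ in rows}
print(len(targets))  # 1 -- every row lands on the same reducer
```

This is why forcing a single reducer would be an optimization (avoiding idle reducers) rather than a correctness requirement.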
[jira] [Updated] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype
[ https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anandha L Ranganathan updated HIVE-3850: Attachment: (was: hive-3850.patch_1.txt) hour() function returns 12 hour clock value when using timestamp datatype - Key: HIVE-3850 URL: https://issues.apache.org/jira/browse/HIVE-3850 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.9.0, 0.10.0 Reporter: Pieterjan Vriends Fix For: 0.11.0 Attachments: hive-3850_1.patch, HIVE-3850.patch.txt Apparently UDFHour.java has two evaluate() functions: one that accepts a Text object as a parameter and one that takes a TimeStampWritable object as a parameter. The first function returns the value of Calendar.HOUR_OF_DAY and the second that of Calendar.HOUR. In the documentation I couldn't find any information on the overloads of the evaluate() function. I spent quite some time finding out why my statement didn't return a 24 hour clock value. Shouldn't both functions return the same? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: Request to review the change
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9673/ --- Review request for hive. Description --- Patch for issue https://issues.apache.org/jira/browse/HIVE-3850. Please review. This addresses bug https://issues.apache.org/jira/browse/HIVE-3850. Diffs - /trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFHour.java 115 /trunk/ql/src/test/queries/clientpositive/udf_hour.q 115 /trunk/ql/src/test/results/clientpositive/udf_hour.q.out 115 Diff: https://reviews.apache.org/r/9673/diff/ Testing --- Attached test case with results. Includes .q and .q.out Thanks, Anandha L Ranganathan
[jira] [Updated] (HIVE-4090) Use of hive.exec.script.allow.partial.consumption can produce partial results
[ https://issues.apache.org/jira/browse/HIVE-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-4090: Assignee: Kevin Wilfong Use of hive.exec.script.allow.partial.consumption can produce partial results - Key: HIVE-4090 URL: https://issues.apache.org/jira/browse/HIVE-4090 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4090.1.patch.txt When users execute a transform script with the config hive.exec.script.allow.partial.consumption set to true, it may produce partial results. When this config is set, the script may close its input pipe before its parent operator has finished passing it rows. In the catch block for this exception, the setDone method is called, marking the operator as done. However, there is a separate thread running to process rows passed from the script back to Hive via stdout. If this thread is not done processing rows, any rows it forwards after the setDone method is called will not be passed to its children. This leads to partial results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4090) Use of hive.exec.script.allow.partial.consumption can produce partial results
[ https://issues.apache.org/jira/browse/HIVE-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wilfong updated HIVE-4090: Status: Patch Available (was: Open) Use of hive.exec.script.allow.partial.consumption can produce partial results - Key: HIVE-4090 URL: https://issues.apache.org/jira/browse/HIVE-4090 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Kevin Wilfong Attachments: HIVE-4090.1.patch.txt When users execute a transform script with the config hive.exec.script.allow.partial.consumption set to true, it may produce partial results. When this config is set, the script may close its input pipe before its parent operator has finished passing it rows. In the catch block for this exception, the setDone method is called, marking the operator as done. However, there is a separate thread running to process rows passed from the script back to Hive via stdout. If this thread is not done processing rows, any rows it forwards after the setDone method is called will not be passed to its children. This leads to partial results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
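The dropped-rows scenario described in HIVE-4090 can be sketched deterministically in Python (a toy model, not Hive's ScriptOperator; the class and method names are invented to mirror setDone/forward):

```python
class ToyOperator:
    """Toy stand-in for an operator whose children ignore rows once it is done."""
    def __init__(self):
        self.done = False
        self.forwarded = []

    def set_done(self):
        # Analogue of setDone(): called from the catch block when the
        # script closes its input pipe early.
        self.done = True

    def forward(self, row):
        # Rows forwarded after set_done() never reach the children.
        if not self.done:
            self.forwarded.append(row)

op = ToyOperator()
for row in range(5):
    op.forward(row)        # stdout-processing thread forwards the first rows
op.set_done()              # operator marked done while rows are still pending
for row in range(5, 10):
    op.forward(row)        # remaining rows are silently dropped
print(op.forwarded)        # [0, 1, 2, 3, 4] -- partial results
```

In Hive the two halves interleave on separate threads, so a fix must enforce that the stdout-processing thread finishes before the operator is marked done, rather than leaving the ordering to chance.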
[jira] [Commented] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary
[ https://issues.apache.org/jira/browse/HIVE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589226#comment-13589226 ] Lianhui Wang commented on HIVE-4014: Hi Tamas, thank you very much, you are right. I also think the RCFile reader is not very efficient: the column ids to be read are transferred to rcfile.reader. Hive+RCFile is not doing column pruning and reading much more data than necessary - Key: HIVE-4014 URL: https://issues.apache.org/jira/browse/HIVE-4014 Project: Hive Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli With even simple projection queries, I see that the HDFS bytes read counter doesn't show any reduction in the amount of data read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3980) Cleanup after HIVE-3403
[ https://issues.apache.org/jira/browse/HIVE-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589241#comment-13589241 ] Namit Jain commented on HIVE-3980: -- [~ashutoshc], ping Cleanup after HIVE-3403 --- Key: HIVE-3980 URL: https://issues.apache.org/jira/browse/HIVE-3980 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3980.1.patch, hive.3980.2.patch There have been a lot of comments on HIVE-3403, which involve changing variable names/function names/adding more comments/general cleanup etc. Since HIVE-3403 involves a lot of refactoring, it was fairly difficult to address the comments there, since refreshing becomes impossible. This jira is to track those cleanups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3891: - Attachment: hive.3891.7.patch physical optimizer changes for auto sort-merge join --- Key: HIVE-3891 URL: https://issues.apache.org/jira/browse/HIVE-3891 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3891.1.patch, hive.3891.2.patch, hive.3891.3.patch, hive.3891.4.patch, hive.3891.5.patch, hive.3891.6.patch, hive.3891.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4007) Create abstract classes for serializer and deserializer
[ https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4007: - Attachment: hive.4007.4.patch Create abstract classes for serializer and deserializer --- Key: HIVE-4007 URL: https://issues.apache.org/jira/browse/HIVE-4007 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.4007.1.patch, hive.4007.2.patch, hive.4007.3.patch, hive.4007.4.patch Currently, it is very difficult to change the Serializer/Deserializer interface, since all the SerDes directly implement the interface. Instead, we should have abstract classes for implementing these interfaces. In case of an interface change, only the abstract class and the relevant serde need to change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4042) ignore mapjoin hint
[ https://issues.apache.org/jira/browse/HIVE-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4042: - Attachment: hive.4042.7.patch ignore mapjoin hint --- Key: HIVE-4042 URL: https://issues.apache.org/jira/browse/HIVE-4042 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.4042.1.patch, hive.4042.2.patch, hive.4042.3.patch, hive.4042.4.patch, hive.4042.5.patch, hive.4042.6.patch, hive.4042.7.patch After HIVE-3784, in a production environment, it can become difficult to deploy since a lot of production queries can break. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3891) physical optimizer changes for auto sort-merge join
[ https://issues.apache.org/jira/browse/HIVE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589253#comment-13589253 ] Namit Jain commented on HIVE-3891: -- [~vikram.dixit], I am confused. Look at line 965 of auto_sortmerge_join_1.q.out, it is an SMB join. Going into more detail: (line 486-492) Stage-5 is a root stage , consists of Stage-6, Stage-7, Stage-1 Stage-6 has a backup stage: Stage-1 Stage-3 depends on stages: Stage-6 Stage-7 has a backup stage: Stage-1 Stage-4 depends on stages: Stage-7 Stage-1 Stage-0 is a root stage Stages 6 and 7 are mapjoin jobs, whereas Stage-1 is an SMB join. This is the purpose of this jira: if a mapjoin can be performed, it gets priority over the SMB join. physical optimizer changes for auto sort-merge join --- Key: HIVE-3891 URL: https://issues.apache.org/jira/browse/HIVE-3891 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3891.1.patch, hive.3891.2.patch, hive.3891.3.patch, hive.3891.4.patch, hive.3891.5.patch, hive.3891.6.patch, hive.3891.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4090) Use of hive.exec.script.allow.partial.consumption can produce partial results
[ https://issues.apache.org/jira/browse/HIVE-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589254#comment-13589254 ] Namit Jain commented on HIVE-4090: -- +1 Use of hive.exec.script.allow.partial.consumption can produce partial results - Key: HIVE-4090 URL: https://issues.apache.org/jira/browse/HIVE-4090 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4090.1.patch.txt When users execute a transform script with the config hive.exec.script.allow.partial.consumption set to true, it may produce partial results. When this config is set, the script may close its input pipe before its parent operator has finished passing it rows. In the catch block for this exception, the setDone method is called, marking the operator as done. However, there is a separate thread running to process rows passed from the script back to Hive via stdout. If this thread is not done processing rows, any rows it forwards after the setDone method is called will not be passed to its children. This leads to partial results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3952) merge map-job followed by map-reduce job
[ https://issues.apache.org/jira/browse/HIVE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3952: - Status: Open (was: Patch Available) Can you create a phabricator entry? merge map-job followed by map-reduce job Key: HIVE-3952 URL: https://issues.apache.org/jira/browse/HIVE-3952 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Vinod Kumar Vavilapalli Attachments: HIVE-3952-20130226.txt, HIVE-3952-20130227.1.txt Consider a query like: select count(*) FROM ( select idOne, idTwo, value FROM bigTable JOIN smallTableOne on (bigTable.idOne = smallTableOne.idOne) ) firstjoin JOIN smallTableTwo on (firstjoin.idTwo = smallTableTwo.idTwo); where smallTableOne and smallTableTwo are smaller than hive.auto.convert.join.noconditionaltask.size and hive.auto.convert.join.noconditionaltask is set to true. The joins are collapsed into mapjoins, and it leads to a map-only job (for the map-joins) followed by a map-reduce job (for the group by). Ideally, the map-only job should be merged with the following map-reduce job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-684) add UDF make_set
[ https://issues.apache.org/jira/browse/HIVE-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-684: Status: Open (was: Patch Available) Have you run all the tests? Some test outputs need to be updated. Also, can you create a phabricator entry? add UDF make_set Key: HIVE-684 URL: https://issues.apache.org/jira/browse/HIVE-684 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Assignee: PRETTY SITHARA Attachments: HIVE-684.1.patch.txt, HIVE-684.2.patch.txt, input.txt.txt, make_set.q, make_set.q.out Add UDF make_set. Look at http://dev.mysql.com/doc/refman/5.0/en/func-op-summary-ref.html for details -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4077) alterPartition and alterPartitions methods in ObjectStore swallow exceptions
[ https://issues.apache.org/jira/browse/HIVE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4077: - Status: Open (was: Patch Available) comments alterPartition and alterPartitions methods in ObjectStore swallow exceptions Key: HIVE-4077 URL: https://issues.apache.org/jira/browse/HIVE-4077 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.11.0 Reporter: Kevin Wilfong Assignee: Kevin Wilfong Attachments: HIVE-4077.1.patch.txt, HIVE-4077.2.patch.txt The alterPartition and alterPartitions methods in the ObjectStore class throw a MetaException in the case of a failure but do not include the cause, meaning that information is lost. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4056) Extend rcfilecat to support (un)compressed size and no. of row
[ https://issues.apache.org/jira/browse/HIVE-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-4056: - Resolution: Fixed Fix Version/s: 0.11.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed. Thanks Tim Extend rcfilecat to support (un)compressed size and no. of row -- Key: HIVE-4056 URL: https://issues.apache.org/jira/browse/HIVE-4056 Project: Hive Issue Type: Bug Components: Statistics Reporter: Gang Tim Liu Assignee: Gang Tim Liu Fix For: 0.11.0 Attachments: HIVE-4056.patch.1 rcfilecat supports data and metadata: https://cwiki.apache.org/Hive/rcfilecat.html In metadata, it supports column statistics. It will be natural to extend metadata support to 1. no. of rows 2. uncompressed size for the file 3. compressed size for the file -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4007) Create abstract classes for serializer and deserializer
[ https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589316#comment-13589316 ] Namit Jain commented on HIVE-4007: -- Refreshed, tests passed Create abstract classes for serializer and deserializer --- Key: HIVE-4007 URL: https://issues.apache.org/jira/browse/HIVE-4007 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.4007.1.patch, hive.4007.2.patch, hive.4007.3.patch, hive.4007.4.patch Currently, it is very difficult to change the Serializer/Deserializer interface, since all the SerDes directly implement the interface. Instead, we should have abstract classes for implementing these interfaces. In case of an interface change, only the abstract class and the relevant serde need to change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira