[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589612#comment-13589612 ] Phabricator commented on HIVE-4080: --- ashutoshc has accepted the revision HIVE-4080 [jira] Add Lead Lag UDAFs. +1 REVISION DETAIL https://reviews.facebook.net/D8961 BRANCH HIVE-4080 ARCANIST PROJECT hive To: JIRA, ashutoshc, hbutani Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch, HIVE-4080.D8961.2.patch, HIVE-4080.D8961.3.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588116#comment-13588116 ] Ashutosh Chauhan commented on HIVE-4080: bq. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Do you mean feature which this patch is introducing (ability to have lead function independent of UDAFs in select expr) will be removed? Consider following query: {noformat} select p_mfgr, p_retailprice, lead(p_retailprice,1) as l1 over (partition by p_mfgr order by p_name), lead(p_retailprice,1, p_retailprice) as l2 over (partition by p_size order by p_name), p_retailprice - lead(p_retailprice,1) from part; {noformat} My guess is ambiguity you are referring to is once we start supporting different partitioning in same query than last lead() in above query becomes ambiguous as to which partitioning function it is refering to. But my understanding is sql standard says that lead and lag function must always be associated with over clause. So, above query is illegal in standard sql. It must be written as: {noformat} select p_mfgr, p_retailprice, lead(p_retailprice,1) as l1 over (partition by p_mfgr order by p_name), lead(p_retailprice,1, p_retailprice) as l2 over (partition by p_size order by p_name), p_retailprice - lead(p_retailprice,1)as l3 over (partition by p_size order by p_name) from part; {noformat} Now we have this concept of default partitioning which would have made first query legal if partitioning scheme was identical for l1 and l2. I think long term: * We should keep functionality introduced in this patch to stay compliant. * Associate default partitioning with windowing function only if there is no ambiguity (i.e., there is only one partitioning clause in query). * Raise error if user doesn't specify partitioning and there are more than one partitioning scheme to choose from. Same argument stands for when lead/lag functions are used as arguments with UDAFs. Make sense ? Further, I think this concept of default partitioning is only extra convenience we are offering to hive users which is non-standard. If it turns out its burdensome to support this I am fine with removing it and always requiring user to specify over clause. Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588515#comment-13588515 ] Phabricator commented on HIVE-4080: --- ashutoshc has requested changes to the revision HIVE-4080 [jira] Add Lead Lag UDAFs. Some comments. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:533 Use LEAD_FUNC_NAME and LAG_FUNC_NAME here to be consistent. ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:871 This null check should be done outside of nesting if() block. ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:875 I don't see a case where fInfo is != null but udafResolver could be. If so, than this null check is redundant and should be removed. ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:890 We are already know that we are dealing with lead/lag udaf, it must be of type GenericUDAFResolver2.. no ? ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:866 I am wondering if this function can be rewritten as following: WindowFunctionInfo finfo = windowFunctions.get(name.toLowerCase()); if (finfo == null) { return null;} f ( !name.toLowerCase().equals(LEAD_FUNC_NAME) !name.toLowerCase().equals(LAG_FUNC_NAME) ) { return getGenericUDAFEvaluator(name, argumentOIs, isDistinct, isAllColumns); } // this must be lead/lag UDAF GenericUDAFResolver udafResolver = finfo.getfInfo().getGenericUDAFResolver(); GenericUDAFParameterInfo paramInfo = new SimpleGenericUDAFParameterInfo( argumentOIs.toArray(), isDistinct, isAllColumns); return ((GenericUDAFResolver2) udafResolver).getEvaluator(paramInfo); If not, than I have specific questions. See next few comments. ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:1467 It will be good to comments for this new boolean registerAsUDAF. Something like following There are certain UDAFs like lead/lag which we want as windowing functions, but don't want them to appear in mFunctions. Why? Because ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:29 This and GenericUDAFLead share lots of common code. It might be good to have an abstract class for these two, just the way you have it in GenericUDFLeadLag. ql/src/test/queries/clientpositive/leadlag_queries.q:20 I think currently we don't support over clause on expressions. Once we do, it will be good to add test like: select p_retailprice - lead (p_retail) over (partition by p_mfgr) from part; ql/src/test/queries/clientpositive/leadlag_queries.q:35 It will be good to add a test which has both lead and lag in same query. REVISION DETAIL https://reviews.facebook.net/D8961 BRANCH HIVE-4080 ARCANIST PROJECT hive To: JIRA, ashutoshc, hbutani Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588560#comment-13588560 ] Harish Butani commented on HIVE-4080: - Yes, there are several related issues: 1. Lead/Lag as UDAFs - this jira only addresses this - will work on your comments. 2. Support expressions with over clause - filed JIRA 4081 for this - will work on this next 3. Support for lead/lag UDFs. Based on our offline conversation and as you point out here the options are: - should we continue to support - should we completely remove support? - support lead/lag as UDFs, but only within argument expressions of other UDAFs. The consensus seems to be option 3 is nice to have; 1 is problematic. Will address this in a separate JIRA 4. The notion of default partitions - you have given more proof, why supporting lead/lag as UDFs generally (option 1) is problematic. - in general, should we continue to support this? - Your approach, makes sense Will address this in separate JIRA. Does this break down of issues make sense? Will address the first 3 asap; and then work on supporting multiple partitions(4041). The 4th one will have to wait a bit. Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588612#comment-13588612 ] Phabricator commented on HIVE-4080: --- hbutani has commented on the revision HIVE-4080 [jira] Add Lead Lag UDAFs. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:866 Yes, i wanted to introduce a new fn to take a GenericUDAFResolver: public static GenericUDAFEvaluator getGenericUDAFEvaluator(GenericUDAFResolver, ListObjectInspector argumentOIs, boolean isDistinct, boolean isAllColumns) and have the current getGenericUDAFEvaluator and getGenericWindowingEvaluator call it. But backed out, because was not comfortable making this change and submitting the patch w/o running the entire test suite. Ended up just doing a cut and paste. Your soln is much better ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:1467 Yes, meant to do this. Somehow forgot, sorry ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:29 yes, i was rushing this... should refactor it. ql/src/test/queries/clientpositive/leadlag_queries.q:20 yes exactly ql/src/test/queries/clientpositive/leadlag_queries.q:35 will add REVISION DETAIL https://reviews.facebook.net/D8961 BRANCH HIVE-4080 ARCANIST PROJECT hive To: JIRA, ashutoshc, hbutani Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589076#comment-13589076 ] Phabricator commented on HIVE-4080: --- ashutoshc has requested changes to the revision HIVE-4080 [jira] Add Lead Lag UDAFs. Mostly looks good. * Missing apache headers. * Request for couple more tests. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLead.java:1 Apache headers are missing. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLeadLag.java:1 Apache headers missing. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:1 Apache headers missing. ql/src/test/queries/clientpositive/leadlag_queries.q:19 So, lead/lag can now take three params. First one is column, second one is offset, third is default value. Correct? It will be good to have a test case with a constant default like lag (price,5,10) and one with implicit params like lag (price) which implies default of offset = 1 and NULL for default. REVISION DETAIL https://reviews.facebook.net/D8961 BRANCH HIVE-4080 ARCANIST PROJECT hive To: JIRA, ashutoshc, hbutani Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch, HIVE-4080.D8961.2.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs
[ https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588041#comment-13588041 ] Harish Butani commented on HIVE-4080: - patch attached Add Lead Lag UDAFs Key: HIVE-4080 URL: https://issues.apache.org/jira/browse/HIVE-4080 Project: Hive Issue Type: Bug Components: PTF-Windowing Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch Currently we support Lead/Lag as navigation UDFs usable with Windowing. To be standard compliant we need to support Lead Lag UDAFs. Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing is in play. Currently allow Lead/Lag expressions to appear in SelectLists even when they are not arguments to UDAFs. Support for this feature will probably be removed. Causes ambiguities when Query contains different partition clauses. Will provide more details with associated Jira to remove this feature. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira