[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs

2013-02-28 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589612#comment-13589612
 ] 

Phabricator commented on HIVE-4080:
---

ashutoshc has accepted the revision HIVE-4080 [jira] Add Lead  Lag UDAFs.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D8961

BRANCH
  HIVE-4080

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani


 Add Lead  Lag UDAFs
 

 Key: HIVE-4080
 URL: https://issues.apache.org/jira/browse/HIVE-4080
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch, 
 HIVE-4080.D8961.2.patch, HIVE-4080.D8961.3.patch


 Currently we support Lead/Lag as navigation UDFs usable with Windowing.
 To be standard compliant we need to support Lead  Lag UDAFs.
 Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing 
 is in play. 
 Currently allow Lead/Lag expressions to appear in SelectLists even when they 
 are not arguments to UDAFs. Support for this feature will probably be 
 removed. Causes ambiguities when Query contains different partition clauses. 
 Will provide more details with associated Jira to remove this feature.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs

2013-02-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588116#comment-13588116
 ] 

Ashutosh Chauhan commented on HIVE-4080:


bq.  Support for this feature will probably be removed. Causes ambiguities when 
Query contains different partition clauses. 
Do you mean feature which this patch is introducing (ability to have lead 
function independent of UDAFs in select expr) will be removed? Consider 
following query:
{noformat}
 select p_mfgr, p_retailprice,
lead(p_retailprice,1) as l1 over (partition by p_mfgr order by p_name),
lead(p_retailprice,1, p_retailprice) as l2 over (partition by p_size order by 
p_name),
p_retailprice - lead(p_retailprice,1) 
from part;
{noformat}
My guess is ambiguity you are referring to is once we start supporting 
different partitioning in same query than last lead() in above query becomes 
ambiguous as to which partitioning function it is refering to. But my 
understanding is sql standard says that lead and lag function must always be 
associated with over clause. So, above query is illegal in standard sql. It 
must be written as:
{noformat}
 select p_mfgr, p_retailprice,
lead(p_retailprice,1) as l1 over (partition by p_mfgr order by p_name),
lead(p_retailprice,1, p_retailprice) as l2 over (partition by p_size order by 
p_name),
p_retailprice - lead(p_retailprice,1)as l3 over (partition by p_size order by 
p_name) 
from part;
{noformat} 
Now we have this concept of default partitioning which would have made first 
query legal if partitioning scheme was identical for l1 and l2. I think long 
term:
* We should keep functionality introduced in this patch to stay compliant.
* Associate default partitioning with windowing function only if there is no 
ambiguity (i.e., there is only one partitioning clause in query).
* Raise error if user doesn't specify partitioning and there are more than one 
partitioning scheme to choose from.
Same argument stands for when lead/lag functions are used as arguments with 
UDAFs. Make sense ?

Further, I think this concept of default partitioning is only extra convenience 
we are offering to hive users which is non-standard. If it turns out its 
burdensome to support this I am fine with removing it and always requiring user 
to specify over clause.

 Add Lead  Lag UDAFs
 

 Key: HIVE-4080
 URL: https://issues.apache.org/jira/browse/HIVE-4080
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch


 Currently we support Lead/Lag as navigation UDFs usable with Windowing.
 To be standard compliant we need to support Lead  Lag UDAFs.
 Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing 
 is in play. 
 Currently allow Lead/Lag expressions to appear in SelectLists even when they 
 are not arguments to UDAFs. Support for this feature will probably be 
 removed. Causes ambiguities when Query contains different partition clauses. 
 Will provide more details with associated Jira to remove this feature.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs

2013-02-27 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588515#comment-13588515
 ] 

Phabricator commented on HIVE-4080:
---

ashutoshc has requested changes to the revision HIVE-4080 [jira] Add Lead  
Lag UDAFs.

  Some comments.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:533 Use 
LEAD_FUNC_NAME and LAG_FUNC_NAME here to be consistent.
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:871 This 
null check should be done outside of nesting if() block.
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:875 I don't 
see a case where fInfo is != null but udafResolver could be. If so, than this 
null check is redundant and should be removed.
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:890 We are 
already know that we are dealing with lead/lag udaf, it must be of type 
GenericUDAFResolver2.. no ?
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:866 I am 
wondering if this function can be rewritten as following:

  WindowFunctionInfo finfo = windowFunctions.get(name.toLowerCase());
  if (finfo == null) { return null;}
  f ( !name.toLowerCase().equals(LEAD_FUNC_NAME) 
  !name.toLowerCase().equals(LAG_FUNC_NAME) ) {
  return getGenericUDAFEvaluator(name, argumentOIs, isDistinct, isAllColumns);
  }

  // this must be lead/lag UDAF
  GenericUDAFResolver udafResolver = finfo.getfInfo().getGenericUDAFResolver();
  GenericUDAFParameterInfo paramInfo = new SimpleGenericUDAFParameterInfo(
   argumentOIs.toArray(), isDistinct, isAllColumns);
  return  ((GenericUDAFResolver2) udafResolver).getEvaluator(paramInfo);

  If not, than I have specific questions. See next few comments.
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:1467 It will 
be good to comments for this new boolean registerAsUDAF. Something like 
following
  There are certain UDAFs like lead/lag which we want as windowing functions, 
but don't want them to appear in mFunctions.
  Why?
  Because
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:29 This 
and GenericUDAFLead share lots of common code. It might be good to have an 
abstract class for these two, just the way you have it in GenericUDFLeadLag.
  ql/src/test/queries/clientpositive/leadlag_queries.q:20 I think currently we 
don't support over clause on expressions. Once we do, it will be good to add 
test like:
  select p_retailprice - lead (p_retail) over (partition by p_mfgr) from part;
  ql/src/test/queries/clientpositive/leadlag_queries.q:35 It will be good to 
add a test which has both lead and lag in same query.

REVISION DETAIL
  https://reviews.facebook.net/D8961

BRANCH
  HIVE-4080

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani


 Add Lead  Lag UDAFs
 

 Key: HIVE-4080
 URL: https://issues.apache.org/jira/browse/HIVE-4080
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch


 Currently we support Lead/Lag as navigation UDFs usable with Windowing.
 To be standard compliant we need to support Lead  Lag UDAFs.
 Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing 
 is in play. 
 Currently allow Lead/Lag expressions to appear in SelectLists even when they 
 are not arguments to UDAFs. Support for this feature will probably be 
 removed. Causes ambiguities when Query contains different partition clauses. 
 Will provide more details with associated Jira to remove this feature.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs

2013-02-27 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588560#comment-13588560
 ] 

Harish Butani commented on HIVE-4080:
-

Yes, there are several related issues:

1. Lead/Lag as UDAFs
- this jira only addresses this
- will work on your comments.

2. Support expressions with over clause
- filed JIRA 4081 for this
- will work on this next

3. Support for lead/lag UDFs. Based on our offline conversation and as you 
point out here the options are:
- should we continue to support
- should we completely remove support?
- support lead/lag as UDFs, but only within argument expressions of other UDAFs.
The consensus seems to be option 3 is nice to have; 1 is problematic. 
Will address this in a separate JIRA

4. The notion of default partitions
- you have given more proof, why supporting lead/lag as UDFs generally (option 
1) is problematic.
- in general, should we continue to support this?
- Your approach, makes sense
Will address this in separate JIRA.

Does this break down of issues make sense? Will address the first 3 asap; and 
then work on supporting multiple partitions(4041). The 4th one will have to 
wait a bit.

 Add Lead  Lag UDAFs
 

 Key: HIVE-4080
 URL: https://issues.apache.org/jira/browse/HIVE-4080
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch


 Currently we support Lead/Lag as navigation UDFs usable with Windowing.
 To be standard compliant we need to support Lead  Lag UDAFs.
 Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing 
 is in play. 
 Currently allow Lead/Lag expressions to appear in SelectLists even when they 
 are not arguments to UDAFs. Support for this feature will probably be 
 removed. Causes ambiguities when Query contains different partition clauses. 
 Will provide more details with associated Jira to remove this feature.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs

2013-02-27 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588612#comment-13588612
 ] 

Phabricator commented on HIVE-4080:
---

hbutani has commented on the revision HIVE-4080 [jira] Add Lead  Lag UDAFs.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:866 Yes, i 
wanted to introduce a new fn to take a GenericUDAFResolver: public static 
GenericUDAFEvaluator getGenericUDAFEvaluator(GenericUDAFResolver,
ListObjectInspector argumentOIs, boolean isDistinct,
boolean isAllColumns)

  and have the current getGenericUDAFEvaluator and getGenericWindowingEvaluator 
call it.

  But backed out, because was not comfortable making this change and submitting 
the patch w/o running the entire test suite.

  Ended up just doing a cut and paste. Your soln is much better
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:1467 Yes, 
meant to do this. Somehow forgot, sorry
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:29 yes, 
i was rushing this...
  should refactor it.
  ql/src/test/queries/clientpositive/leadlag_queries.q:20 yes exactly
  ql/src/test/queries/clientpositive/leadlag_queries.q:35 will add

REVISION DETAIL
  https://reviews.facebook.net/D8961

BRANCH
  HIVE-4080

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani


 Add Lead  Lag UDAFs
 

 Key: HIVE-4080
 URL: https://issues.apache.org/jira/browse/HIVE-4080
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch


 Currently we support Lead/Lag as navigation UDFs usable with Windowing.
 To be standard compliant we need to support Lead  Lag UDAFs.
 Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing 
 is in play. 
 Currently allow Lead/Lag expressions to appear in SelectLists even when they 
 are not arguments to UDAFs. Support for this feature will probably be 
 removed. Causes ambiguities when Query contains different partition clauses. 
 Will provide more details with associated Jira to remove this feature.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs

2013-02-27 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589076#comment-13589076
 ] 

Phabricator commented on HIVE-4080:
---

ashutoshc has requested changes to the revision HIVE-4080 [jira] Add Lead  
Lag UDAFs.

  Mostly looks good.
  * Missing apache headers.
  * Request for couple more tests.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLead.java:1 
Apache headers are missing.
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLeadLag.java:1 
Apache headers missing.
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLag.java:1 
Apache headers missing.
  ql/src/test/queries/clientpositive/leadlag_queries.q:19 So, lead/lag can now 
take three params. First one is column, second one is offset, third is default 
value. Correct?
  It will be good to have a test case with a constant default like lag 
(price,5,10) and one with implicit params like lag (price) which implies 
default of offset = 1 and NULL for default.

REVISION DETAIL
  https://reviews.facebook.net/D8961

BRANCH
  HIVE-4080

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani


 Add Lead  Lag UDAFs
 

 Key: HIVE-4080
 URL: https://issues.apache.org/jira/browse/HIVE-4080
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch, 
 HIVE-4080.D8961.2.patch


 Currently we support Lead/Lag as navigation UDFs usable with Windowing.
 To be standard compliant we need to support Lead  Lag UDAFs.
 Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing 
 is in play. 
 Currently allow Lead/Lag expressions to appear in SelectLists even when they 
 are not arguments to UDAFs. Support for this feature will probably be 
 removed. Causes ambiguities when Query contains different partition clauses. 
 Will provide more details with associated Jira to remove this feature.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4080) Add Lead Lag UDAFs

2013-02-26 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588041#comment-13588041
 ] 

Harish Butani commented on HIVE-4080:
-

patch attached

 Add Lead  Lag UDAFs
 

 Key: HIVE-4080
 URL: https://issues.apache.org/jira/browse/HIVE-4080
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4080.1.patch.txt, HIVE-4080.D8961.1.patch


 Currently we support Lead/Lag as navigation UDFs usable with Windowing.
 To be standard compliant we need to support Lead  Lag UDAFs.
 Will continue to support Lead/Lag UDFs as arguments to UDAFs when Windowing 
 is in play. 
 Currently allow Lead/Lag expressions to appear in SelectLists even when they 
 are not arguments to UDAFs. Support for this feature will probably be 
 removed. Causes ambiguities when Query contains different partition clauses. 
 Will provide more details with associated Jira to remove this feature.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira