Review Request 21168: HIVE-6999: Add streaming mode to PTFs

2014-05-15 Thread Harish Butani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/
---

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-6999
https://issues.apache.org/jira/browse/HIVE-6999


Repository: hive-git


Description
---

There are a set of use cases where the Table Function can operate on a 
Partition row by row or on a subset(window) of rows as it is being streamed to 
it.
Windowing has couple of use cases of this:processing of Rank functions, 
processing of Window Aggregations.
But this is a generic concept: any analysis that operates on an Ordered 
partition maybe able to operate in Streaming mode.
This patch introduces streaming mode in PTFs and provides the mechanics to 
handle PTF chains that contain both modes of PTFs.
Subsequent patches will introduce Streaming mode for Windowing.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7 
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 
1087bbf 
  ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION 
  ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/21168/diff/


Testing
---

added new tests


Thanks,

Harish Butani



Re: Review Request 21168: HIVE-6999: Add streaming mode to PTFs

2014-05-15 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/#review42578
---


Design looks good. Few implementation related comments.


ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java


Instead of adding noop functions statically, we should put these functions 
in test/ package and than do add jar for testing. Multiple reasons:
* Better to isolate test code from production code.
* It will also exercise add jar functionality for PTF functions for which I 
am not sure we have coverage.
* These functions also show up in default list of inbuilt functions. It may 
confuse user to wonder what good these functions are for. show_functions.q 
failed because of this.



ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java


Can you add a comment why we need to keep track for first row processed in 
Map?



ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java


Nice helpful comments!



ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java


Better name : outputPartRowsItr?



ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java


This doesnt feel right. Why do we need to special handle no-op functions? 
If there is a good reason for it, we should document in comment here.



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java


Better placed in test package? Also, ASF headers are missing.



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java


Better placed in test package? Also, ASF headers are missing.



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java


Nice helpful comments!



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java


Comment made sense. Since like those fields are not present in class any 
more. Shall we just get rid of this?



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java


Better name: canAcceptInputAsStream?


- Ashutosh Chauhan


On May 7, 2014, 6:11 p.m., Harish Butani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21168/
> ---
> 
> (Updated May 7, 2014, 6:11 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-6999
> https://issues.apache.org/jira/browse/HIVE-6999
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> There are a set of use cases where the Table Function can operate on a 
> Partition row by row or on a subset(window) of rows as it is being streamed 
> to it.
> Windowing has couple of use cases of this:processing of Rank functions, 
> processing of Window Aggregations.
> But this is a generic concept: any analysis that operates on an Ordered 
> partition maybe able to operate in Streaming mode.
> This patch introduces streaming mode in PTFs and provides the mechanics to 
> handle PTF chains that contain both modes of PTFs.
> Subsequent patches will introduce Streaming mode for Windowing.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 
> 1087bbf 
>   ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION 
>   ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21168/diff/
> 
> 
> Testing
> ---
> 
> added new tests
> 
> 
> Thanks,
> 
> Harish Butani
> 
>



Re: Review Request 21168: HIVE-6999: Add streaming mode to PTFs

2014-05-15 Thread Harish Butani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/
---

(Updated May 14, 2014, 9:21 p.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-6999
https://issues.apache.org/jira/browse/HIVE-6999


Repository: hive-git


Description
---

There are a set of use cases where the Table Function can operate on a 
Partition row by row or on a subset(window) of rows as it is being streamed to 
it.
Windowing has couple of use cases of this:processing of Rank functions, 
processing of Window Aggregations.
But this is a generic concept: any analysis that operates on an Ordered 
partition maybe able to operate in Streaming mode.
This patch introduces streaming mode in PTFs and provides the mechanics to 
handle PTF chains that contain both modes of PTFs.
Subsequent patches will introduce Streaming mode for Windowing.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7 
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 
1087bbf 
  ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION 
  ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/21168/diff/


Testing
---

added new tests


Thanks,

Harish Butani



Re: Review Request 21168: HIVE-6999: Add streaming mode to PTFs

2014-05-15 Thread Harish Butani


> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java, line 58
> > 
> >
> > Can you add a comment why we need to keep track for first row processed 
> > in Map?

done


> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java, line 319
> > 
> >
> > Better name : outputPartRowsItr?

done


> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java, 
> > line 96
> > 
> >
> > Comment made sense. Since like those fields are not present in class 
> > any more. Shall we just get rid of this?

this is needed; transient based on BeanInfo (get/set methods) in class


> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java, 
> > line 218
> > 
> >
> > Better name: canAcceptInputAsStream?

done


> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 449
> > 
> >
> > Instead of adding noop functions statically, we should put these 
> > functions in test/ package and than do add jar for testing. Multiple 
> > reasons:
> > * Better to isolate test code from production code.
> > * It will also exercise add jar functionality for PTF functions for 
> > which I am not sure we have coverage.
> > * These functions also show up in default list of inbuilt functions. It 
> > may confuse user to wonder what good these functions are for. 
> > show_functions.q failed because of this.

Agree with your comments on Noop. This was done because for testing we need a 
PTF and Noop has some special short circuit path for Partition handling.
But can we do this as a separate Jira; removing references to Noop in the 
translation code is non trivial work.


- Harish


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/#review42578
---


On May 14, 2014, 9:21 p.m., Harish Butani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21168/
> ---
> 
> (Updated May 14, 2014, 9:21 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-6999
> https://issues.apache.org/jira/browse/HIVE-6999
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> There are a set of use cases where the Table Function can operate on a 
> Partition row by row or on a subset(window) of rows as it is being streamed 
> to it.
> Windowing has couple of use cases of this:processing of Rank functions, 
> processing of Window Aggregations.
> But this is a generic concept: any analysis that operates on an Ordered 
> partition maybe able to operate in Streaming mode.
> This patch introduces streaming mode in PTFs and provides the mechanics to 
> handle PTF chains that contain both modes of PTFs.
> Subsequent patches will introduce Streaming mode for Windowing.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 
> 1087bbf 
>   ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION 
>   ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21168/diff/
> 
> 
> Testing
> ---
> 
> added new tests
> 
> 
> Thanks,
> 
> Harish Butani
> 
>