[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642864#comment-13642864
 ] 

Hudson commented on HIVE-4358:
--

Integrated in Hive-trunk-hadoop2 #175 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/175/])
HIVE-4358 : Check for Map side processing in PTFOp is no longer valid 
(Harish Butani via Ashutosh Chauhan) (Revision 1475880)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1475880
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/windowing.q
* /hive/trunk/ql/src/test/results/clientpositive/windowing.q.out


 Check for Map side processing in PTFOp is no longer valid
 -

 Key: HIVE-4358
 URL: https://issues.apache.org/jira/browse/HIVE-4358
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Fix For: 0.12.0

 Attachments: HIVE-4358.D10473.1.patch


 With the changes for ReduceSinkDedup it is no longer true that a non Map-side 
 PTF Operator is preceded by an ExtractOp. For example, the following query can 
 produce the issue:
 {noformat}
 create view IF NOT EXISTS mfgr_price_view as 
 select p_mfgr, p_brand, 
 sum(p_retailprice) as s 
 from part 
 group by p_mfgr, p_brand;
 
 select p_mfgr, p_brand, s, 
 sum(s) over w1  as s1
 from mfgr_price_view 
 window w1 as (distribute by p_mfgr sort by p_brand rows between 2 preceding 
 and current row);
 {noformat}
 The fix is to add an explicit flag to PTFDesc.
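
 As a rough illustration of the difference between inferring map-side
 processing from the preceding operator and reading an explicit flag, here is
 a minimal sketch in Python. It is not Hive's Java code: Op, PTFDescSketch,
 is_map_side and both helper functions are hypothetical names invented for
 this example, and the GROUPBY parent is only an assumed stand-in for
 "some operator other than an Extract".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    """Hypothetical stand-in for an operator node in a query plan."""
    name: str
    parent: Optional["Op"] = None


@dataclass
class PTFDescSketch:
    """Hypothetical stand-in for PTFDesc; is_map_side is the explicit flag."""
    is_map_side: bool = False


def inferred_map_side(ptf_op: Op) -> bool:
    # Inference-style check: treat the PTF as reduce-side only when its
    # parent is an Extract operator, and as map-side otherwise.
    return not (ptf_op.parent is not None and ptf_op.parent.name == "EXTRACT")


def flagged_map_side(desc: PTFDescSketch) -> bool:
    # Flag-style check: the planner records the answer in the descriptor,
    # so the shape of the plan around the operator no longer matters.
    return desc.is_map_side


# A reduce-side PTF whose Extract parent is gone after a dedup-like rewrite
# (the GROUPBY parent is an assumption made for this example).
ptf_after_dedup = Op("PTF", parent=Op("GROUPBY"))
reduce_side_desc = PTFDescSketch(is_map_side=False)

print(inferred_map_side(ptf_after_dedup))  # inference misclassifies the operator
print(flagged_map_side(reduce_side_desc))  # the flag stays correct
```

 In this sketch the inference-based check reports a reduce-side operator as
 map-side once its Extract parent disappears, while the flag-based check does
 not depend on the surrounding plan.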

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid

2013-04-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643367#comment-13643367
 ] 

Hudson commented on HIVE-4358:
--

Integrated in Hive-trunk-h0.21 #2079 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2079/])
HIVE-4358 : Check for Map side processing in PTFOp is no longer valid 
(Harish Butani via Ashutosh Chauhan) (Revision 1475880)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1475880
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/windowing.q
* /hive/trunk/ql/src/test/results/clientpositive/windowing.q.out


[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid

2013-04-25 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641500#comment-13641500
 ] 

Phabricator commented on HIVE-4358:
---

ashutoshc has accepted the revision HIVE-4358 [jira] Check for Map side 
processing in PTFOp is no longer valid.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D10473

BRANCH
  HIVE-4358

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani


[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid

2013-04-24 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641269#comment-13641269
 ] 

Harish Butani commented on HIVE-4358:
-

[~ashutoshc] can you please review this when you get a chance? 
We should try to get this into 0.11; the ReduceSinkDedup optimization can cause 
the PTFOp to throw an error.


[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid

2013-04-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641328#comment-13641328
 ] 

Ashutosh Chauhan commented on HIVE-4358:


Yes, I was thinking about it. The question I wanted answered is: what's the 
need for this flag? Why do we need to do different things depending on whether 
we are in the mapper or the reducer? Looking into it, the flag is used:
* in getting the serde for the partition.
* in determining outputOI.
* in close() => this I understand the reason for
* in processOp() => this I understand too

For the first two points, I don't understand how running in map or reduce 
makes a difference. No other operator does this. Is it an implementation 
artifact, or is there a fundamental reason for this?


[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid

2013-04-24 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641355#comment-13641355
 ] 

Harish Butani commented on HIVE-4358:
-

This is to allow a PTF to process the raw input before partitioning has 
happened.
A good example is how to perform a CandidateFrequentItemSet computation. 
The input is a Basket(basketId, productId) table; the output is a list of 
Itemsets that are frequently bought together, where frequent is defined by a 
threshold parameter.
The output has the form Itemset(Array<String> itemset), assuming ProductId is 
a String.

The way you compute this is to apply a FrequentItemSet algorithm on subsets of 
the input in parallel.
So in our prototype we implemented the DynamicItemCounting algorithm. This got 
executed in each mapper; the output was a Candidate 
Itemset(Array<String> itemset, count) from each mapper.
The reducer then summed counts across all mappers and checked for thresholds.

But from a calling perspective it still looks like a PTF invocation:

select itemset
from candidateFreqItemSets(on basket partition by itemset)

Behind the scenes we create a Plan with a PTFOp on the Map-side where the 
DynamicItemCounting is done, and a PTFOp on the reduce side where the 
aggregation is done. 

Hope this makes sense; I realize it is very brief, and I can go over it in 
detail with you.
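
The two-phase shape described above (per-mapper candidate counts, then a 
reduce-side sum and threshold check) can be sketched roughly as follows. This 
is a Python illustration, not the prototype's code: it swaps in plain counting 
of 2-item sets in place of DynamicItemCounting, the function names map_side 
and reduce_side are hypothetical, and it assumes all rows of a basket land in 
the same mapper split.

```python
from collections import Counter
from itertools import combinations


def map_side(rows):
    """Count candidate 2-item sets within one split of (basketId, productId) rows."""
    baskets = {}
    for basket_id, product_id in rows:
        baskets.setdefault(basket_id, set()).add(product_id)
    counts = Counter()
    for items in baskets.values():
        # Sort so that each itemset has a single canonical tuple form.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


def reduce_side(per_mapper_counts, threshold):
    """Sum candidate counts across mappers and keep itemsets meeting the threshold."""
    total = Counter()
    for counts in per_mapper_counts:
        total.update(counts)
    return sorted(itemset for itemset, n in total.items() if n >= threshold)


# Two hypothetical mapper splits of the Basket table.
split_1 = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "milk"), (2, "eggs")]
split_2 = [(3, "bread"), (3, "milk"), (4, "eggs")]

frequent = reduce_side([map_side(split_1), map_side(split_2)], threshold=3)
print(frequent)
```

The map-side function only sees its own split, which is why the threshold 
check has to wait until the reduce side has the summed counts.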
