[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid
[ https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642864#comment-13642864 ]

Hudson commented on HIVE-4358:
------------------------------

Integrated in Hive-trunk-hadoop2 #175 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/175/])
HIVE-4358 : Check for Map side processing in PTFOp is no longer valid (Harish Butani via Ashutosh Chauhan) (Revision 1475880)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1475880
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/windowing.q
* /hive/trunk/ql/src/test/results/clientpositive/windowing.q.out

Check for Map side processing in PTFOp is no longer valid
---------------------------------------------------------

Key: HIVE-4358
URL: https://issues.apache.org/jira/browse/HIVE-4358
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
Fix For: 0.12.0
Attachments: HIVE-4358.D10473.1.patch

With the changes for ReduceSinkDedup, it is no longer true that a non-Map-side PTF Operator is preceded by an ExtractOp. For example, the following query can reproduce the issue:
{noformat}
create view IF NOT EXISTS mfgr_price_view as
select p_mfgr, p_brand, sum(p_retailprice) as s
from part
group by p_mfgr, p_brand;

select p_mfgr, p_brand, s, sum(s) over w1 as s1
from mfgr_price_view
window w1 as (distribute by p_mfgr sort by p_brand rows between 2 preceding and current row);
{noformat}
The fix is to add an explicit flag to PTFDesc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
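The issue description contrasts two ways of deciding whether a PTF operator runs on the map side: inferring it from the preceding operator, or reading an explicit flag from the descriptor. The following is a minimal sketch of that contrast only. The classes `Op` and `PtfDescSketch`, the operator names, and both methods are hypothetical stand-ins and are not the actual Hive classes or the contents of the patch.

```java
public class PtfMapSideSketch {
    /** Hypothetical stand-in for an operator node in a query plan. */
    static class Op {
        final String name;
        final Op parent; // operator feeding this one, or null at the root

        Op(String name, Op parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Hypothetical descriptor that records the decision at plan time. */
    static class PtfDescSketch {
        final boolean mapSide;

        PtfDescSketch(boolean mapSide) {
            this.mapSide = mapSide;
        }
    }

    /** Inference: assume "reduce side" only when the parent is an extract operator. */
    static boolean inferMapSide(Op ptf) {
        return ptf.parent == null || !"EXTRACT".equals(ptf.parent.name);
    }

    /** Explicit flag: trust what the planner recorded in the descriptor. */
    static boolean isMapSide(PtfDescSketch desc) {
        return desc.mapSide;
    }

    public static void main(String[] args) {
        // A reduce-side PTF whose parent is a group-by rather than an extract
        // operator, as could happen once two reduce stages are merged.
        Op groupBy = new Op("GROUPBY", null);
        Op ptf = new Op("PTF", groupBy);
        PtfDescSketch desc = new PtfDescSketch(false);

        System.out.println("inferred map side: " + inferMapSide(ptf));
        System.out.println("flagged map side:  " + isMapSide(desc));
    }
}
```

In this toy plan the inference reports the operator as map side even though it was planned for the reduce side, while the flag stays correct regardless of how the surrounding plan is rewritten.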
[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid
[ https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643367#comment-13643367 ]

Hudson commented on HIVE-4358:
------------------------------

Integrated in Hive-trunk-h0.21 #2079 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2079/])
HIVE-4358 : Check for Map side processing in PTFOp is no longer valid (Harish Butani via Ashutosh Chauhan) (Revision 1475880)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1475880
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/windowing.q
* /hive/trunk/ql/src/test/results/clientpositive/windowing.q.out
[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid
[ https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641500#comment-13641500 ]

Phabricator commented on HIVE-4358:
-----------------------------------

ashutoshc has accepted the revision HIVE-4358 [jira] Check for Map side processing in PTFOp is no longer valid.

+1

REVISION DETAIL
  https://reviews.facebook.net/D10473

BRANCH
  HIVE-4358

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani
[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid
[ https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641269#comment-13641269 ]

Harish Butani commented on HIVE-4358:
-------------------------------------

[~ashutoshc], can you please review this when you get a chance? We should try to get this into 0.11; the ReduceSinkDedup optimization can cause the PTFOp to throw an error.
[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid
[ https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641328#comment-13641328 ]

Ashutosh Chauhan commented on HIVE-4358:
----------------------------------------

Yes, I was thinking about it. The question I wanted answered is: what is the need for this flag? Why do we need to do different things depending on whether we are in the mapper or the reducer? Looking into it, the flag seems to be used:
* in getting the serde for the partition.
* in determining outputOI.
* in close(). => I understand the reason for this one.
* in processOp(). => I understand this one too.

For the first two points, I don't understand how running in map or reduce makes a difference. No other operator does this. Is it an implementation artifact, or is there a fundamental reason for it?
[jira] [Commented] (HIVE-4358) Check for Map side processing in PTFOp is no longer valid
[ https://issues.apache.org/jira/browse/HIVE-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641355#comment-13641355 ]

Harish Butani commented on HIVE-4358:
-------------------------------------

This is to allow a PTF to process the raw input before partitioning has happened. A good example is the CandidateFrequentItemSet computation. The input is a Basket(basketId, productId) table; the output is a list of Itemsets that are frequently bought together, where "frequent" is defined by a threshold parameter. The output has the form Itemset(Array<String> itemset), assuming ProductId is a String.

The way you compute this is to apply a FrequentItemSet algorithm on subsets of the input in parallel. In our prototype we implemented the DynamicItemCounting algorithm. It was executed in each mapper; the output from each mapper was a Candidate Itemset(Array<String> itemset, count). The reducer then summed the counts across all mappers and checked them against the threshold.

But from a calling perspective it still looks like a PTF invocation to the caller:

select itemset from candidateFreqItemSets(on basket partition by itemset)

Behind the scenes we create a Plan with a PTFOp on the Map side, where the DynamicItemCounting is done, and a PTFOp on the reduce side, where the aggregation is done.

Hope this makes sense; I realize it is very brief, and I can go over it in detail with you.
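The two-phase shape described in this comment (partial candidate counts per mapper, then a sum and threshold check in the reducer) can be sketched as follows. This is only an illustration under stated assumptions: it uses plain pair counting in place of the DynamicItemCounting algorithm, runs in a single JVM rather than as Hive operators, assumes all rows of a basket land in the same split, and assumes product ids contain no commas. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class CandidateItemsetSketch {
    /** One row of the Basket(basketId, productId) input. */
    static class Row {
        final String basketId;
        final String productId;

        Row(String basketId, String productId) {
            this.basketId = basketId;
            this.productId = productId;
        }
    }

    /** Map side: count candidate pairs within one split of the raw input. */
    static Map<String, Integer> mapSide(List<Row> split) {
        // Collect the distinct products of each basket, sorted for stable pair keys.
        Map<String, TreeSet<String>> baskets = new TreeMap<>();
        for (Row r : split) {
            baskets.computeIfAbsent(r.basketId, k -> new TreeSet<>()).add(r.productId);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (TreeSet<String> items : baskets.values()) {
            List<String> sorted = new ArrayList<>(items);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    counts.merge(sorted.get(i) + "," + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Reduce side: sum the partial counts and keep itemsets that meet the threshold. */
    static List<String> reduceSide(List<Map<String, Integer>> partials, int threshold) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((itemset, count) -> totals.merge(itemset, count, Integer::sum));
        }
        List<String> frequent = new ArrayList<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (e.getValue() >= threshold) {
                frequent.add(e.getKey());
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<Row> split1 = Arrays.asList(
                new Row("b1", "milk"), new Row("b1", "bread"),
                new Row("b2", "milk"), new Row("b2", "bread"), new Row("b2", "eggs"));
        List<Row> split2 = Arrays.asList(
                new Row("b3", "bread"), new Row("b3", "milk"));

        List<String> frequent =
                reduceSide(Arrays.asList(mapSide(split1), mapSide(split2)), 3);
        System.out.println(frequent);
    }
}
```

The point of the sketch is that `mapSide` sees raw, unpartitioned rows, which is why a map-side PTF needs to behave differently from the reduce-side one that only aggregates.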