[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299962#comment-14299962 ]

Arun Gurumurthi commented on HIVE-3552:
---------------------------------------

Hi Namit,

This functionality is great. Will there be further enhancements to allow the following?

a) Rollup/Cube to allow formats such as:

Current format: group by a, b, c with rollup

New formats:
a) group by rollup(a, b, c) -- this will give the same output as the current format
b) group by rollup((a, b), c) -- this will give output as (a, b, c) / (a, b) / total
c) group by a rollup((b, c), d) -- this will give output as (a, b, c, d) / (a, b, c), a similar functionality with CUBE

These allow us to use rollup instead of specifying too many combinations with grouping sets, for cases where we do not want all combinations but only selected ones, and the selection is still too large to spell out in grouping sets. This functionality is available in RDBMSs.

Another request: if we were to use cube and filter only the sets we need instead of all combinations, I was trying to use grouping__ID to filter only specific sets, and it does not return any output.

select a, b, c, grouping__ID, count(*) from tableA group by a, b, c with cube having grouping__ID = 2;

Thanks,
Arun

HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
-------------------------------------------------------------------------------------------------------------

Key: HIVE-3552
URL: https://issues.apache.org/jira/browse/HIVE-3552
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Fix For: 0.11.0
Attachments: hive.3552.1.patch, hive.3552.10.patch, hive.3552.11.patch, hive.3552.12.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch

This is a follow-up to HIVE-3433.
Had an offline discussion with Sambavi - she pointed out a scenario where the implementation in HIVE-3433 will not scale.
Assume that the user is performing a cube on many columns, say 8 columns. Each input row would then generate 256 (2^8) rows for the hash table, which may kill the current group by implementation.
A better implementation would be to add an additional MR job: in the first MR job, perform the group by as if there were no cube; in the second MR job, perform the cube.
The assumption is that the group by would have decreased the output data significantly, and the rows would appear in the order of the grouping keys, which gives a higher probability of hitting the hash table.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
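The row explosion described in the issue, and the effect of grouping before cubing, can be modelled in a few lines of plain Python. This is only a sketch of the idea, not Hive's GroupByOperator: the function names, the use of None for NULL, and the sample rows are all assumptions made for illustration, and count(*) is the only aggregate considered.

```python
from collections import Counter
from itertools import combinations


def cube_grouping_sets(keys):
    """All subsets of the grouping keys: 2**len(keys) grouping sets for a cube."""
    return [frozenset(c)
            for size in range(len(keys), -1, -1)
            for c in combinations(keys, size)]


def mask(row, keys, grouping_set):
    """Keep the keys in the grouping set; None stands in for NULL elsewhere."""
    return tuple(row[k] if k in grouping_set else None for k in keys)


def cube_single_job(rows, keys):
    """Expand every input row into every grouping set, then count."""
    sets = cube_grouping_sets(keys)
    counts = Counter()
    expanded = 0
    for row in rows:
        for i, s in enumerate(sets):
            counts[(i, mask(row, keys, s))] += 1
            expanded += 1
    return counts, expanded


def cube_two_jobs(rows, keys):
    """Job 1: plain group by on all keys. Job 2: cube over the reduced output."""
    sets = cube_grouping_sets(keys)
    # Job 1: one partial count per distinct full key, no cube expansion yet.
    partial = Counter(tuple(row[k] for k in keys) for row in rows)
    # Job 2: expand each already-aggregated group into every grouping set.
    counts = Counter()
    expanded = 0
    for full_key, n in partial.items():
        row = dict(zip(keys, full_key))
        for i, s in enumerate(sets):
            counts[(i, mask(row, keys, s))] += n
            expanded += 1
    return counts, expanded


# Hypothetical sample data: 300 rows but only 2 distinct (a, b) combinations.
keys = ["a", "b"]
rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "x"}, {"a": 2, "b": "y"}] * 100
single_counts, single_expanded = cube_single_job(rows, keys)
two_counts, two_expanded = cube_two_jobs(rows, keys)
```

Both functions produce the same counts, but the single-job version expands every input row while the two-job version expands only the distinct groups left after the first aggregation. The benefit therefore depends on how much the plain group by shrinks the data, which is the assumption stated in the issue description.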
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878542#comment-13878542 ]

Lefty Leverenz commented on HIVE-3552:
--------------------------------------

This adds hive.new.job.grouping.set.cardinality to HiveConf.java and hive-default.xml.template. Also documented in the wiki, with a link to this JIRA ticket: [Query Execution |https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryExecution] (search for "grouping").
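As a rough illustration of how a cardinality threshold such as hive.new.job.grouping.set.cardinality could decide between the one-job and two-job plans, here is a small Python sketch. The helper names are hypothetical, and both the default value of 30 and the "strictly greater than" comparison are assumptions that should be checked against the Configuration Properties wiki page linked above.

```python
# Assumed default for hive.new.job.grouping.set.cardinality; verify in the wiki.
DEFAULT_THRESHOLD = 30


def grouping_set_cardinality(kind, num_keys=0, explicit_sets=()):
    """Number of output rows generated per input row for a grouping construct."""
    if kind == "cube":
        return 2 ** num_keys          # every subset of the keys
    if kind == "rollup":
        return num_keys + 1           # every prefix of the keys, plus the total
    if kind == "grouping sets":
        return len(explicit_sets)     # exactly the sets the user listed
    raise ValueError(f"unknown grouping construct: {kind}")


def adds_extra_job(cardinality, threshold=DEFAULT_THRESHOLD):
    """Assumed rule: plan an additional job when cardinality exceeds the threshold."""
    return cardinality > threshold
```

Under these assumptions, a cube on 8 columns (cardinality 256) would get the additional job, while a rollup on 3 columns (cardinality 4) would not.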
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878581#comment-13878581 ]

Lefty Leverenz commented on HIVE-3552:
--------------------------------------

The main wikidoc for this is here:

* [Enhanced Aggregation, Cube, Grouping and Rollup |https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup]
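For readers comparing the rollup forms requested in Arun Gurumurthi's comment, the following Python sketch expands a rollup with composite columns into the grouping sets it stands for. It assumes conventional SQL ROLLUP semantics (each prefix of the rollup list, with a parenthesised group treated as one unit, and plain group-by columns kept in every set). It says nothing about which of these forms Hive accepts, and the function names are hypothetical.

```python
def rollup_sets(items):
    """Grouping sets for rollup(items): every prefix, longest first.

    Each item is a column name or a tuple of names (a composite column),
    which is kept or dropped as a single unit.
    """
    def flatten(prefix):
        columns = []
        for item in prefix:
            columns.extend(item if isinstance(item, tuple) else (item,))
        return tuple(columns)

    return [flatten(items[:n]) for n in range(len(items), -1, -1)]


def group_by_with_rollup(plain_columns, rollup_items):
    """Plain group-by columns appear in every grouping set of the rollup."""
    return [tuple(plain_columns) + s for s in rollup_sets(rollup_items)]


# The three forms from the comment, using its column names.
form_a = rollup_sets(["a", "b", "c"])
form_b = rollup_sets([("a", "b"), "c"])
form_c = group_by_with_rollup(["a"], [("b", "c"), "d"])
```

Here form_a yields four sets, (a, b, c), (a, b), (a) and the empty set (the total). form_b skips (a) because (a, b) is treated as one unit. Under the assumed semantics, form_c keeps a in every set.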
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694416#comment-13694416 ]

Irwin commented on HIVE-3552:
-----------------------------

I tested cubes and rollups, but the queries failed.

My table is t1, formatted as follows:

The error message is:

I have tried both hive-0.10.0 and hive-0.11.0, and the error is the same. Why can't I use Enhanced Aggregation, Cube, Grouping and Rollup? Can anyone help? Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549423#comment-13549423 ]

Hudson commented on HIVE-3552:
------------------------------

Integrated in Hive-trunk-h0.21 #1904 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1904/])
HIVE-3552. performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys. (Revision 1430979)

Result = SUCCESS
kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430979
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/files/grouping_sets1.txt
* /hive/trunk/data/files/grouping_sets2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets6.q
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets7.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets2.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets3.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets4.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets5.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549459#comment-13549459 ]

Hudson commented on HIVE-3552:
------------------------------

Integrated in Hive-trunk-hadoop2 #56 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/56/])
HIVE-3552. performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys. (Revision 1430979)

Result = FAILURE
kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430979
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/files/grouping_sets1.txt
* /hive/trunk/data/files/grouping_sets2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets6.q
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets7.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets2.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets3.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets4.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets5.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548842#comment-13548842 ]

Hudson commented on HIVE-3552:
------------------------------

Integrated in hive-trunk-hadoop1 #4 (See [https://builds.apache.org/job/hive-trunk-hadoop1/4/])
HIVE-3552. performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys. (Revision 1430979)

Result = ABORTED
kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430979
Files :
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/files/grouping_sets1.txt
* /hive/trunk/data/files/grouping_sets2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets6.q
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets7.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets2.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets3.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets4.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets5.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547083#comment-13547083 ]

Kevin Wilfong commented on HIVE-3552:
-------------------------------------

+1
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542382#comment-13542382 ]

Kevin Wilfong commented on HIVE-3552:
-------------------------------------

The patch needs to be updated; it's not applying cleanly. Some minor style comments on Phabricator.
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537462#comment-13537462 ]

Kevin Wilfong commented on HIVE-3552:
-------------------------------------

A few more comments on Phabricator.
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536762#comment-13536762 ]

Namit Jain commented on HIVE-3552:
----------------------------------

Refreshed and attached the latest patch.
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534273#comment-13534273 ]

Kevin Wilfong commented on HIVE-3552:
-------------------------------------

Added a couple of comments on Phabricator.
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530021#comment-13530021 ]

Namit Jain commented on HIVE-3552:
----------------------------------

Comments addressed and tests passed.
[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys
[ https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508575#comment-13508575 ]

Namit Jain commented on HIVE-3552:
----------------------------------

Tests passed.