[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3552:
---

Fix Version/s: 0.11.0

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.11.0
>
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
> hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-09 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3552:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed, thanks Namit.

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
> hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Status: Patch Available  (was: Open)

comments addressed

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
> hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.12.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
> hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.12.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, 
> hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, 
> hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-03 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: (was: hive.3552.12.patch)

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, 
> hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, 
> hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-02 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3552:


Status: Open  (was: Patch Available)

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, 
> hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, 
> hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.11.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, 
> hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, 
> hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Status: Patch Available  (was: Open)

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.11.patch, 
> hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, 
> hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, 
> hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-20 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3552:


Status: Open  (was: Patch Available)

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.1.patch, 
> hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch, 
> hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-19 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.10.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.10.patch, hive.3552.1.patch, 
> hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch, 
> hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-19 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.9.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
> hive.3552.8.patch, hive.3552.9.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.8.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
> hive.3552.8.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.7.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.6.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Status: Patch Available  (was: Open)

comments addressed

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.5.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch, hive.3552.5.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-17 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3552:


Status: Open  (was: Patch Available)

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.4.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
> hive.3552.4.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.3.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-05 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.2.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch, hive.3552.2.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-11-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Description: 
This is a follow up for HIVE-3433.

Had a offline discussion with Sambavi - she pointed out a scenario where the
implementation in HIVE-3433 will not scale. Assume that the user is performing
a cube on many columns, say '8' columns. So, each row would generate 256 rows
for the hash table, which may kill the current group by implementation.

A better implementation would be to add an additional mr job - in the first 
mr job perform the group by assuming there was no cube. Add another mr job, 
where
you would perform the cube. The assumption is that the group by would have 
decreased the output data significantly, and the rows would appear in the order 
of
grouping keys which has a higher probability of hitting the hash table.

  was:
This is a follow up for HIVE-3433.

Had a offline discussion with Sambavi - she pointed out a scenario where the
implementation in HIVE-3433 will not scale. Assume that the user is performing
a cube on many columns, say '8' columns. So, each row would generate 256 rows
for the hash table, which may kill the current group by implementation.

A better implementation would be to add an additional stage - in the first 
stage perform the group by assuming there was no cube. Ad another stage, where
you would perform the cube. The assumption is that the group by would have 
decreased the output data significantly.


> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional mr job - in the first 
> mr job perform the group by assuming there was no cube. Add another mr job, 
> where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly, and the rows would appear in the 
> order of
> grouping keys which has a higher probability of hitting the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-11-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.1.patch

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional stage - in the first 
> stage perform the group by assuming there was no cube. Ad another stage, where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-11-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Status: Patch Available  (was: Open)

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3552.1.patch
>
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional stage - in the first 
> stage perform the group by assuming there was no cube. Ad another stage, where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-11-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Summary: HIVE-3552 performant manner for performing cubes/rollups/grouping 
sets for a high number of grouping set keys  (was: performant manner for 
performing cubes and rollups in case of less aggregation)

> HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
> high number of grouping set keys
> -
>
> Key: HIVE-3552
> URL: https://issues.apache.org/jira/browse/HIVE-3552
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>
> This is a follow up for HIVE-3433.
> Had a offline discussion with Sambavi - she pointed out a scenario where the
> implementation in HIVE-3433 will not scale. Assume that the user is performing
> a cube on many columns, say '8' columns. So, each row would generate 256 rows
> for the hash table, which may kill the current group by implementation.
> A better implementation would be to add an additional stage - in the first 
> stage perform the group by assuming there was no cube. Ad another stage, where
> you would perform the cube. The assumption is that the group by would have 
> decreased the output data significantly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira