[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-7158:
    Labels: (was: TODOC14)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    URL: https://issues.apache.org/jira/browse/HIVE-7158
    Project: Hive
    Issue Type: New Feature
    Reporter: Gunther Hagleitner
    Assignee: Gunther Hagleitner
    Fix For: 0.14.0
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch, HIVE-7158.5.patch

Tez can optionally sample data from a fraction of the tasks of a vertex and use that information to choose the number of downstream tasks for any given scatter-gather edge.

Hive estimates the number of reducers by looking at stats and estimates for each operator in the operator pipeline leading up to the reducer. However, if this estimate turns out to be too large, Tez can rein in the resources used to compute the reducer. It does so by combining partitions of the upstream vertex. It cannot, however, add reducers at this stage.

I'm proposing to let users specify whether they want to use auto-parallelism or not. If they do, there will be scaling factors to determine the max and min number of reducers Tez can choose from. We will then partition by max reducers, letting Tez sample and rein in the count down to the specified min.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
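The mechanism described above (partition by the max, then let Tez combine partitions down to no fewer than the min) can be sketched roughly as follows. This is a simplified illustration, not the Tez or Hive implementation: the class `PartitionCombiner`, both method names, and the `targetBytesPerReducer` parameter are hypothetical, and grouping consecutive partitions is only one plausible way to combine them.

```java
import java.util.Arrays;

// Hypothetical sketch: not taken from the HIVE-7158 patches or from Tez.
final class PartitionCombiner {
    private PartitionCombiner() {}

    /**
     * Picks a reducer count from a sampled size estimate.
     * The result never exceeds maxReducers (reducers cannot be added at
     * runtime) and never drops below minReducers.
     */
    static int chooseReducerCount(long estimatedTotalBytes, long targetBytesPerReducer,
                                  int minReducers, int maxReducers) {
        // Ceiling division: reducers needed to keep each one near the target size.
        long wanted = (estimatedTotalBytes + targetBytesPerReducer - 1) / targetBytesPerReducer;
        return (int) Math.max(minReducers, Math.min(maxReducers, wanted));
    }

    /**
     * Maps each upstream partition to a reducer by grouping consecutive
     * partitions. Because the group size is rounded up, the mapping may use
     * fewer reducers than requested, but never more.
     */
    static int[] combine(int partitions, int reducers) {
        int perReducer = (partitions + reducers - 1) / reducers;
        int[] owner = new int[partitions];
        for (int p = 0; p < partitions; p++) {
            owner[p] = p / perReducer;
        }
        return owner;
    }

    public static void main(String[] args) {
        // 1000 sampled bytes at 100 bytes per reducer, bounded to [5, 80].
        System.out.println(chooseReducerCount(1_000L, 100L, 5, 80)); // prints 10
        // Eight partitions folded onto four reducers.
        System.out.println(Arrays.toString(combine(8, 4))); // prints [0, 0, 1, 1, 2, 2, 3, 3]
    }
}
```

Partitioning by the max up front is what makes the later reduction cheap in this sketch: shrinking the reducer count only changes which reducer reads which existing partition, so no upstream data has to be repartitioned.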
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Issue Type: New Feature (was: Bug)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: New Feature
    Labels: TODOC14
    Fix For: 0.14.0
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch, HIVE-7158.5.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-7158:
    Labels: TODOC14 (was: )

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Labels: TODOC14
    Fix For: 0.14.0
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch, HIVE-7158.5.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Resolution: Fixed
    Fix Version/s: 0.14.0
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks [~vikram.dixit], [~gopalv], [~sseth], [~leftylev], and [~bikassaha]!

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Fix For: 0.14.0
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch, HIVE-7158.5.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Status: Open (was: Patch Available)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Attachment: HIVE-7158.4.patch

.4 sets the lower bound to Math.max(1, estimate * min_factor)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch
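The lower bound mentioned in the .4 note can be illustrated with a small sketch. Only the expression Math.max(1, estimate * min_factor) comes from the message; the class `ReducerBounds`, the method names, the symmetric upper bound, and the truncation to an integer are assumptions made for illustration and may not match the patch.

```java
// Hypothetical sketch: names and rounding behaviour are assumptions.
final class ReducerBounds {
    private ReducerBounds() {}

    /** Lower bound from the .4 note: at least one reducer, however small the estimate. */
    static int minReducers(int estimate, float minFactor) {
        return Math.max(1, (int) (estimate * minFactor));
    }

    /** Assumed upper bound: the estimate scaled up by a max factor. */
    static int maxReducers(int estimate, float maxFactor) {
        return Math.max(1, (int) (estimate * maxFactor));
    }

    public static void main(String[] args) {
        int estimate = 40; // Hive's compile-time reducer estimate (example value)
        System.out.println(minReducers(estimate, 0.25f)); // prints 10
        System.out.println(maxReducers(estimate, 2.0f));  // prints 80
        // Without the Math.max guard, a small estimate would truncate to zero reducers.
        System.out.println(minReducers(3, 0.25f));        // prints 1
    }
}
```

The guard matters for small estimates: with an estimate of 3 and a min factor of 0.25, the scaled value truncates to 0, which would otherwise allow the reducer count to be driven down to nothing.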
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Status: Patch Available (was: Open)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-7158:
    Attachment: HIVE-7158.5.patch

Incorporated [~leftylev]'s comments from RB: added punctuation, changing "When enabled Hive will" to "When enabled, Hive will".

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, HIVE-7158.4.patch, HIVE-7158.5.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Status: Open (was: Patch Available)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Attachment: HIVE-7158.3.patch

.3 addresses review comments

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Status: Patch Available (was: Open)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Status: Patch Available (was: Open)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Attachment: HIVE-7158.1.patch

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Status: Open (was: Patch Available)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Attachment: HIVE-7158.2.patch

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch
[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive
[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-7158:
    Status: Patch Available (was: Open)

Use Tez auto-parallelism in Hive

    Key: HIVE-7158
    Issue Type: Bug
    Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch