[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2015-01-27 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7158:
-
Labels:   (was: TODOC14)

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: New Feature
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.14.0

 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
 HIVE-7158.4.patch, HIVE-7158.5.patch


 Tez can optionally sample data from a fraction of the tasks of a vertex and 
 use that information to choose the number of downstream tasks for any given 
 scatter-gather edge.
 Hive estimates the count of reducers by looking at stats and estimates for 
 each operator in the operator pipeline leading up to the reducer. However, if 
 this estimate turns out to be too large, Tez can rein in the resources used 
 to compute the reducer.
 It does so by combining partitions of the upstream vertex. It cannot, 
 however, add reducers at this stage.
 I'm proposing to let users specify whether they want to use auto-parallelism 
 or not. If they do, there will be scaling factors to determine the max and min 
 number of reducers Tez can choose from. We will then partition by max 
 reducers, letting Tez sample and rein in the count down to the specified min.
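
 The shrinking step described above can be sketched roughly as follows. This 
 is only an illustration, not the code from the patch: the class name, the 
 method names, and the bytesPerReducer target are assumptions made for the 
 example. It shows the two properties the description relies on: the count 
 is clamped between the min and max, and partitions are only ever combined, 
 never split.

```java
// Hypothetical sketch; none of these names come from Hive or Tez.
final class PartitionCoalescer {

    // Pick a reducer count from the sampled data size.
    // The result never exceeds maxReducers, because the upstream vertex
    // has already partitioned its output into maxReducers partitions and
    // reducers cannot be added at this stage.
    static int chooseReducers(long sampledBytes, long bytesPerReducer,
                              int minReducers, int maxReducers) {
        // Ceiling division: reducers needed to stay under the target size.
        long wanted = (sampledBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(minReducers, Math.min(maxReducers, wanted));
    }

    // Combine consecutive upstream partitions into at most `reducers`
    // groups; owner[p] is the reducer that reads partition p.
    static int[] assign(int partitions, int reducers) {
        int perReducer = (partitions + reducers - 1) / reducers;
        int[] owner = new int[partitions];
        for (int p = 0; p < partitions; p++) {
            owner[p] = p / perReducer;
        }
        return owner;
    }

    public static void main(String[] args) {
        int reducers = chooseReducers(1000L, 300L, 2, 20);
        int[] owner = assign(20, reducers);
        System.out.println(reducers + " reducers; partition 19 goes to reducer " + owner[19]);
    }
}
```

 With 20 upstream partitions and a sample that calls for 4 reducers, each 
 reducer reads 5 consecutive partitions. A sample that would call for more 
 than 20 reducers still yields 20.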



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-11-08 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-
Issue Type: New Feature  (was: Bug)

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: New Feature
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
 HIVE-7158.4.patch, HIVE-7158.5.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-13 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7158:
-

Labels: TODOC14  (was: )

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
 HIVE-7158.4.patch, HIVE-7158.5.patch





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks [~vikram.dixit], [~gopalv], [~sseth], [~leftylev], 
and [~bikassaha]!

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.14.0

 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
 HIVE-7158.4.patch, HIVE-7158.5.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Status: Open  (was: Patch Available)

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Attachment: HIVE-7158.4.patch

.4 sets the lower bound to Math.max(1, estimate * min_factor)
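
A minimal sketch of that lower bound, assuming the product is truncated to an 
int before clamping (the patch itself may round differently). The class and 
method names are hypothetical; only the Math.max(1, estimate * min_factor) 
shape comes from the comment above.

```java
// Hypothetical helper; not the code from HIVE-7158.4.patch.
final class ReducerBounds {

    // Smallest reducer count Tez may shrink to. Clamping to 1 keeps a
    // small estimate times a fractional factor from truncating to zero.
    static int lowerBound(int estimate, float minFactor) {
        return Math.max(1, (int) (estimate * minFactor));
    }

    public static void main(String[] args) {
        System.out.println(lowerBound(10, 0.25f)); // 2.5 truncates to 2
        System.out.println(lowerBound(3, 0.25f));  // 0.75 truncates to 0, clamped to 1
    }
}
```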

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
 HIVE-7158.4.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Status: Patch Available  (was: Open)

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
 HIVE-7158.4.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-10 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7158:
--

Attachment: HIVE-7158.5.patch

Incorporated [~leftylev]'s comments from RB and added punctuation, changing
"When enabled Hive will" to "When enabled, Hive will".

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
 HIVE-7158.4.patch, HIVE-7158.5.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-09 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Status: Open  (was: Patch Available)

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-09 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Attachment: HIVE-7158.3.patch

.3 addresses review comments

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-06-09 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Status: Patch Available  (was: Open)

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-05-31 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Status: Patch Available  (was: Open)

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-05-31 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Attachment: HIVE-7158.1.patch

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-05-31 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Status: Open  (was: Patch Available)

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-05-31 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Attachment: HIVE-7158.2.patch

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch




[jira] [Updated] (HIVE-7158) Use Tez auto-parallelism in Hive

2014-05-31 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7158:
-

Status: Patch Available  (was: Open)

 Use Tez auto-parallelism in Hive
 

 Key: HIVE-7158
 URL: https://issues.apache.org/jira/browse/HIVE-7158
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch

