[jira] [Updated] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max
[ https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-15881:
---
    Attachment: HIVE-15881.6.patch

Patch rebased.

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
>                 Key: HIVE-15881
>                 URL: https://issues.apache.org/jira/browse/HIVE-15881
>             Project: Hive
>          Issue Type: Task
>          Components: Query Planning
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Minor
>         Attachments: HIVE-15881.1.patch, HIVE-15881.2.patch,
> HIVE-15881.3.patch, HIVE-15881.4.patch, HIVE-15881.5.patch, HIVE-15881.6.patch
>
>
> The Utilities class has two methods, {{getInputSummary}} and
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}}
> to get the summary of a list of input locations in parallel. These methods
> are Hive related, but the variable name does not look like it is specific to Hive.
> Also, the above variable is not in HiveConf and is not used anywhere else. I only
> found a reference to it in the Hadoop MR1 code.
> I'd like to propose deprecating {{mapred.dfsclient.parallelism.max}}
> and using a different variable name, such as
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the
> variable. The old variable might be removed in Hive 3.x.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
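The deprecation proposed in the issue description can be pictured as a lookup that prefers the new name and falls back to the old one. The following is only a minimal sketch under stated assumptions, not the code from any of the attached patches: the class `ListingThreadConfig` and the method `listingThreads` are hypothetical, `java.util.Properties` stands in for the real Hive/Hadoop configuration object, and the new key is the name the description suggests as an example.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not taken from the HIVE-15881 patches. */
final class ListingThreadConfig {
    // Name suggested in the issue description ("such as ...").
    static final String NEW_KEY = "hive.get.input.listing.num.threads";
    // Name the issue proposes to deprecate.
    static final String DEPRECATED_KEY = "mapred.dfsclient.parallelism.max";

    /** Reads the new key first, then the deprecated key, else returns defaultValue. */
    static int listingThreads(Properties conf, int defaultValue) {
        String value = conf.getProperty(NEW_KEY);
        if (value == null) {
            value = conf.getProperty(DEPRECATED_KEY);
            if (value != null) {
                // Warn so that users can migrate before the old name is removed.
                System.err.println("WARN: " + DEPRECATED_KEY
                        + " is deprecated; use " + NEW_KEY + " instead");
            }
        }
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DEPRECATED_KEY, "8");
        System.out.println(listingThreads(conf, 0)); // old name still honoured
        conf.setProperty(NEW_KEY, "16");
        System.out.println(listingThreads(conf, 0)); // new name takes precedence
    }
}
```

Giving the new name precedence means a configuration that sets both keys behaves the same before and after the old key is eventually removed.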
[jira] [Updated] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max
[ https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-15881:
---
    Attachment: HIVE-15881.5.patch

New patch that adds new unit tests and uses SizeValidator on the HiveConf variable to limit the number of threads (min: 0, max: 1024).
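To illustrate what bounding the thread count to the range 0 to 1024 might look like around a parallel listing, here is a rough sketch. It does not use Hive's actual validator or the Utilities methods: `BoundedListing`, `validateThreads`, and `totalSize` are hypothetical names, the `sizeOf` function stands in for a file system call, and treating 0 as "run in the calling thread without a pool" is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

/** Hypothetical sketch; not the implementation in the HIVE-15881 patches. */
final class BoundedListing {
    static final int MIN_THREADS = 0;    // bounds quoted in the patch comment
    static final int MAX_THREADS = 1024;

    /** Rejects thread counts outside [MIN_THREADS, MAX_THREADS]. */
    static int validateThreads(int requested) {
        if (requested < MIN_THREADS || requested > MAX_THREADS) {
            throw new IllegalArgumentException("thread count " + requested
                    + " is outside [" + MIN_THREADS + ", " + MAX_THREADS + "]");
        }
        return requested;
    }

    /** Sums sizeOf(location) over all locations, using a pool when threads > 0. */
    static long totalSize(List<String> locations, ToLongFunction<String> sizeOf, int threads) {
        validateThreads(threads);
        long total = 0;
        if (threads == 0) { // assumption: 0 means "no pool, run serially"
            for (String location : locations) {
                total += sizeOf.applyAsLong(location);
            }
            return total;
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String location : locations) {
                Callable<Long> task = () -> sizeOf.applyAsLong(location);
                futures.add(pool.submit(task));
            }
            for (Future<Long> future : futures) {
                total += future.get(); // blocks until that location is summarized
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("/warehouse/a", "/warehouse/b");
        // String::length is a stand-in for a real "content summary" lookup.
        System.out.println(totalSize(locations, String::length, 4));
    }
}
```

Validating once at the entry point keeps an out-of-range setting from silently creating an oversized pool.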
[jira] [Updated] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max
[ https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-15881:
---
    Attachment: HIVE-15881.4.patch
[jira] [Updated] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max
[ https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-15881:
---
    Attachment: HIVE-15881.3.patch
[jira] [Updated] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max
[ https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-15881:
---
    Attachment: HIVE-15881.2.patch

[~poeppt] I attached a new patch here and on RB addressing your comments.
[jira] [Updated] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max
[ https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-15881:
---
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max
[ https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-15881:
---
    Attachment: HIVE-15881.1.patch