[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
[ https://issues.apache.org/jira/browse/HIVE-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated HIVE-6613:
    Fix Version/s: 0.13.0

Control when specific Inputs / Outputs are started

Key: HIVE-6613
URL: https://issues.apache.org/jira/browse/HIVE-6613
Project: Hive
Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Fix For: 0.13.0
Attachments: HIVE-6613.2.txt, HIVE-6613.3.patch, TEZ-6613.1.txt

When running with Tez, a couple of enhancements are possible:
1) Avoid re-fetching data in case of MapJoins, since the data is likely to be cached after the first run (container re-use for the same query).
2) Start Outputs only after required Inputs are ready. This is specifically useful in case of Reduce, where shuffle requires a large amount of memory, and the Output (if it's a sorted output) also requires a fair amount of memory.

--
This message was sent by Atlassian JIRA (v6.2#6252)
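The second enhancement in the description is an ordering constraint, which can be illustrated with a minimal Java sketch. This is not the code from the attached patches: `Endpoint`, `Recording`, and `runTask` are hypothetical names invented here, and no Tez or Hive classes are used. It only shows the idea that a memory-hungry output is started after the inputs have reported ready, so the two do not hold their buffers at the same time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these types are Tez or Hive classes. */
public class DeferredStartSketch {

    /** Something a task must start before using it (an input or an output). */
    public interface Endpoint {
        void start();
    }

    /** Test double that records when it was started. */
    public static final class Recording implements Endpoint {
        private final String name;
        private final List<String> log;

        public Recording(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void start() {
            log.add("start:" + name);
        }
    }

    /**
     * Starts all inputs, runs the caller-supplied wait step, and only then
     * starts the outputs. The wait step stands in for "shuffle has finished".
     */
    public static void runTask(List<Endpoint> inputs, Runnable waitForInputs,
                               List<Endpoint> outputs) {
        for (Endpoint in : inputs) {
            in.start();
        }
        waitForInputs.run();
        for (Endpoint out : outputs) {
            out.start();
        }
    }

    /** Runs one reduce-like task and returns the recorded event order. */
    public static List<String> demo() {
        List<String> log = new ArrayList<>();
        List<Endpoint> inputs = Arrays.<Endpoint>asList(new Recording("shuffle-input", log));
        List<Endpoint> outputs = Arrays.<Endpoint>asList(new Recording("sorted-output", log));
        runTask(inputs, () -> log.add("inputs-ready"), outputs);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the sorted output is never started while the shuffle input is still being consumed, which is the memory argument made in the description.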
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
Gunther Hagleitner updated HIVE-6613:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Committed to branch-13 and trunk. Thanks [~sseth]!
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
Gunther Hagleitner updated HIVE-6613:
    Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
Siddharth Seth updated HIVE-6613:
    Attachment: HIVE-6613.3.patch

Updated patch to include the missing file - and renamed to .patch for the pre-commit build.
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
Siddharth Seth updated HIVE-6613:
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
Siddharth Seth updated HIVE-6613:
    Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
Siddharth Seth updated HIVE-6613:
    Attachment: HIVE-6613.2.txt

Updated patch. Changed cacheAccess to accept a configuration. Haven't changed the way Inputs are cached - since this gives a way to iterate over cached inputs, which may be useful at some point. Removed the LocalWork check. I'm not sure if a special check is required in case of a Bucketed Map Join.
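As a rough illustration of the caching idea discussed in this comment (not the patch itself), the Java sketch below keeps fetched inputs in a JVM-wide map so that a later task in a re-used container can skip the fetch, and it exposes the cached keys for iteration. Everything here is an assumption made for the example: the class `InputCacheSketch`, the flag name `sketch.input.cache.enabled`, and the use of a plain `Map` in place of a real configuration object.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch; not the cache used by Hive or Tez. */
public class InputCacheSketch {

    /** Static, so entries survive across tasks that run in the same JVM (container re-use). */
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    /** Hypothetical flag name; a plain Map stands in for a configuration object. */
    public static final String ENABLED_KEY = "sketch.input.cache.enabled";

    private final boolean enabled;

    public InputCacheSketch(Map<String, String> conf) {
        this.enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "true"));
    }

    /** Returns the cached value for key, calling fetch only if nothing is cached yet. */
    @SuppressWarnings("unchecked")
    public <T> T retrieve(String key, Supplier<T> fetch) {
        if (!enabled) {
            return fetch.get();
        }
        return (T) CACHE.computeIfAbsent(key, k -> fetch.get());
    }

    /** Read-only view of the cached keys, so callers can iterate over cached inputs. */
    public Set<String> cachedKeys() {
        return Collections.unmodifiableSet(CACHE.keySet());
    }

    /** Retrieves the same input twice and returns how many real fetches happened. */
    public static int demoFetchCount() {
        CACHE.remove("demo-mapjoin-input"); // start from a clean entry for the demo
        AtomicInteger fetches = new AtomicInteger();
        Supplier<String> fetch = () -> {
            fetches.incrementAndGet();
            return "small-table-rows";
        };
        InputCacheSketch cache = new InputCacheSketch(Collections.<String, String>emptyMap());
        cache.retrieve("demo-mapjoin-input", fetch);
        cache.retrieve("demo-mapjoin-input", fetch); // served from the cache
        return fetches.get();
    }

    public static void main(String[] args) {
        System.out.println("fetches: " + demoFetchCount());
    }
}
```

Keeping the entries in a map, rather than behind opaque per-key lookups, is what makes iteration over cached inputs possible in this sketch.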
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
Siddharth Seth updated HIVE-6613:
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
Siddharth Seth updated HIVE-6613:
    Attachment: TEZ-6613.1.txt

Patch to make the changes mentioned in the description.
[jira] [Updated] (HIVE-6613) Control when specific Inputs / Outputs are started
Siddharth Seth updated HIVE-6613:
    Status: Patch Available (was: Open)