[jira] [Updated] (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2051:

    Fix Version/s: 0.8.0

getInputSummary() to call FileSystem.getContentSummary() in parallel

    Key: HIVE-2051
    URL: https://issues.apache.org/jira/browse/HIVE-2051
    Project: Hive
    Issue Type: Improvement
    Reporter: Siying Dong
    Assignee: Siying Dong
    Priority: Minor
    Fix For: 0.8.0
    Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, HIVE-2051.4.patch, HIVE-2051.5.patch

getInputSummary() currently calls FileSystem.getContentSummary() for each input path one by one, which can be extremely slow when the number of input paths is huge. By calling those functions in parallel, we can cut latency in most cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2051:

    Resolution: Fixed
    Status: Resolved (was: Patch Available)
[jira] [Updated] (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2051:

    Attachment: HIVE-2051.5.patch

For InterruptedException, call Thread.currentThread().interrupt() and continue waiting.
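The interrupt handling described for HIVE-2051.5.patch can be sketched roughly as follows. This is not the code from the patch: the class and method names are hypothetical, and the sketch uses a common variant of the idiom that records the interrupt and calls Thread.currentThread().interrupt() only after the wait has finished, because a blocking call such as Future.get() throws InterruptedException again immediately if the interrupt flag is already set.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical illustration; names are not taken from the Hive patch.
class InterruptTolerantWait {

    // Waits for the future to complete even if the waiting thread is interrupted,
    // then restores the thread's interrupt status before returning.
    static <T> T getUninterruptibly(Future<T> future) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return future.get();
                } catch (InterruptedException e) {
                    // Remember the interrupt and continue waiting.
                    interrupted = true;
                } catch (ExecutionException e) {
                    // Assumption: surface task failures as unchecked exceptions.
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            if (interrupted) {
                // Restore the flag so callers can still observe the interrupt.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The caller still sees the interrupt through Thread.currentThread().isInterrupted() once the result is available, so the interruption is deferred rather than swallowed.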
[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2051:

    Attachment: HIVE-2051.4.patch
[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2051:

    Attachment: HIVE-2051.2.patch

Updates:
1. Use ConcurrentHashMap.
2. Wait for the Future objects too.
3. Share the jobConf among threads.
4. If the user sets mapred.dfsclient.parallelism.max to 0 or 1, don't start a new thread to execute it.
5. Use Map.Entry<K,V> when iterating.
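A minimal sketch of the approach in the updates above, assuming nothing about the actual patch beyond what the list states. To stay self-contained it does not use the Hadoop API: the `summarize` function is a hypothetical stand-in for FileSystem.getContentSummary(path).getLength(), `maxThreads` stands in for the value of mapred.dfsclient.parallelism.max, and the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical illustration; names are not taken from the Hive patch.
class ParallelSummarySketch {

    // Sums the per-path sizes. Runs in the calling thread when maxThreads is 0 or 1,
    // otherwise fans the lookups out to a fixed-size thread pool.
    static long totalLength(List<String> paths, ToLongFunction<String> summarize, int maxThreads) {
        // Thread-safe map, since worker threads write results concurrently.
        Map<String, Long> results = new ConcurrentHashMap<>();

        if (maxThreads <= 1) {
            // No extra threads: call the summary function one path at a time.
            for (String path : paths) {
                results.put(path, summarize.applyAsLong(path));
            }
        } else {
            ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
            try {
                List<Future<?>> futures = new ArrayList<>();
                for (String path : paths) {
                    futures.add(pool.submit(() -> {
                        results.put(path, summarize.applyAsLong(path));
                    }));
                }
                // Wait for every Future so that all results are in the map.
                for (Future<?> future : futures) {
                    future.get();
                }
            } catch (InterruptedException e) {
                // Simplification for this sketch: restore the flag and give up.
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } finally {
                pool.shutdown();
            }
        }

        // Iterate over entries rather than keys to avoid a second lookup per path.
        long total = 0;
        for (Map.Entry<String, Long> entry : results.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }
}
```

For example, `ParallelSummarySketch.totalLength(paths, String::length, 4)` uses four worker threads with the string length as a dummy size, while passing 1 as the last argument takes the sequential path with no pool at all.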
[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2051:

    Attachment: HIVE-2051.3.patch

Minor: typo in comments.
[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2051:

    Attachment: HIVE-2051.1.patch
[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel
[ https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2051:

    Status: Patch Available (was: Open)