[jira] [Updated] (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-07-26 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2051:
-

Fix Version/s: 0.8.0

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch, HIVE-2051.5.patch


 getInputSummary() currently calls FileSystem.getContentSummary() for each input
 path one at a time, which can be extremely slow when the number of input paths
 is large. By making those calls in parallel, we can cut latency in most cases.
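
 The idea can be sketched roughly as follows. This is a minimal illustration, not
 the code from the attached patches: the class name, the totalLength() helper and
 the lengthOf callback are hypothetical, and lengthOf merely stands in for a call
 such as FileSystem.getContentSummary(path).getLength() so that the example runs
 without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

class ParallelSummarySketch {

    // Sums the length of every path, issuing the per-path lookups on a thread pool.
    // lengthOf is a hypothetical stand-in for the per-path content summary lookup.
    static long totalLength(List<String> paths, ToLongFunction<String> lengthOf, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (String path : paths) {
                tasks.add(() -> lengthOf.applyAsLong(path));
            }
            long total = 0;
            // invokeAll blocks until every submitted task has completed.
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summarizing inputs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("summary lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/warehouse/t/p=1", "/warehouse/t/p=2", "/warehouse/t/p=3");
        // Fake lookup for the demo: pretend every path holds 100 bytes.
        System.out.println(totalLength(paths, p -> 100L, 2));
    }
}
```

 With N paths and a pool of T threads, the wall-clock cost is roughly N/T lookups
 instead of N, provided the NameNode (or other backing store) can serve the
 requests concurrently.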

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-04-19 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2051:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch, HIVE-2051.5.patch


 getInputSummary() currently calls FileSystem.getContentSummary() for each input
 path one at a time, which can be extremely slow when the number of input paths
 is large. By making those calls in parallel, we can cut latency in most cases.



[jira] [Updated] (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-20 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2051:
--

Attachment: HIVE-2051.5.patch

For InterruptedException, call Thread.currentThread().interrupt() and continue
waiting.
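
One common way to express "note the interrupt but keep waiting" is sketched below.
It is an assumption about the general idiom, not the code in HIVE-2051.5.patch:
the class and the waitFor() helper are hypothetical. The sketch records the
interrupt in a local flag while it waits and calls
Thread.currentThread().interrupt() once the result is in, because re-asserting the
flag before retrying would make the next Future.get() throw again immediately.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class UninterruptibleWaitSketch {

    // Waits for the future to finish even if the calling thread is interrupted,
    // then restores the interrupt flag so callers can still observe it.
    static <T> T waitFor(Future<T> future) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return future.get();
                } catch (InterruptedException e) {
                    // Remember the interrupt and go back to waiting for the result.
                    interrupted = true;
                } catch (ExecutionException e) {
                    throw new IllegalStateException("task failed", e.getCause());
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Long> pending = pool.submit(() -> {
                Thread.sleep(50);
                return 42L;
            });
            // Simulate an interrupt that arrives before the wait starts.
            Thread.currentThread().interrupt();
            System.out.println(waitFor(pending));
            // Reads and clears the restored interrupt flag.
            System.out.println(Thread.interrupted());
        } finally {
            pool.shutdown();
        }
    }
}
```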

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch, HIVE-2051.5.patch


 getInputSummary() currently calls FileSystem.getContentSummary() for each input
 path one at a time, which can be extremely slow when the number of input paths
 is large. By making those calls in parallel, we can cut latency in most cases.



[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-17 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2051:
--

Attachment: HIVE-2051.4.patch

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch, 
 HIVE-2051.4.patch


 getInputSummary() currently calls FileSystem.getContentSummary() for each input
 path one at a time, which can be extremely slow when the number of input paths
 is large. By making those calls in parallel, we can cut latency in most cases.



[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-14 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2051:
--

Attachment: HIVE-2051.2.patch

Updates:
1. Use ConcurrentHashMap.
2. Wait for the Future object too.
3. Share jobConf among threads.
4. If the user sets mapred.dfsclient.parallelism.max to 0 or 1, don't start a new
thread to execute it.
5. Use Map.Entry<K,V> when iterating.
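
A rough sketch of how several of these points could fit together is shown below.
It is illustrative only and not taken from HIVE-2051.2.patch: the class, the
collect() and total() helpers and the lengthOf callback are hypothetical, the
maxThreads argument plays the role of mapred.dfsclient.parallelism.max, and the
shared jobConf from point 3 is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

class SummaryCollectorSketch {

    // Collects one length per path into a thread-safe map.
    // lengthOf is a hypothetical stand-in for the per-path content summary lookup.
    static Map<String, Long> collect(List<String> paths, ToLongFunction<String> lengthOf,
                                     int maxThreads) {
        Map<String, Long> lengths = new ConcurrentHashMap<>();
        if (maxThreads <= 1) {
            // Parallelism of 0 or 1: do the work on the calling thread, no pool.
            for (String path : paths) {
                lengths.put(path, lengthOf.applyAsLong(path));
            }
            return lengths;
        }
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String path : paths) {
                pending.add(pool.submit(() -> {
                    lengths.put(path, lengthOf.applyAsLong(path));
                }));
            }
            // Wait on each Future so failures surface and the map is complete.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while collecting summaries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("summary lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return lengths;
    }

    // Iterates entries directly instead of looking up each key again.
    static long total(Map<String, Long> lengths) {
        long sum = 0;
        for (Map.Entry<String, Long> entry : lengths.entrySet()) {
            sum += entry.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/a", "/b", "/c");
        System.out.println(total(collect(paths, p -> 10L, 4)));
    }
}
```

Waiting on the Future objects matters here because a task that throws would
otherwise fail silently and leave its path missing from the map.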

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch


 getInputSummary() currently calls FileSystem.getContentSummary() for each input
 path one at a time, which can be extremely slow when the number of input paths
 is large. By making those calls in parallel, we can cut latency in most cases.



[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-14 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2051:
--

Attachment: HIVE-2051.3.patch

Minor: fixed a typo in comments.

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch


 getInputSummary() currently calls FileSystem.getContentSummary() for each input
 path one at a time, which can be extremely slow when the number of input paths
 is large. By making those calls in parallel, we can cut latency in most cases.



[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-11 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2051:
--

Attachment: HIVE-2051.1.patch

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch


 getInputSummary() currently calls FileSystem.getContentSummary() for each input
 path one at a time, which can be extremely slow when the number of input paths
 is large. By making those calls in parallel, we can cut latency in most cases.



[jira] Updated: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-11 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2051:
--

Status: Patch Available  (was: Open)

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch


 getInputSummary() currently calls FileSystem.getContentSummary() for each input
 path one at a time, which can be extremely slow when the number of input paths
 is large. By making those calls in parallel, we can cut latency in most cases.
