[jira] [Commented] (MAPREDUCE-6659) Mapreduce App master waits long to kill containers on lost nodes.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210196#comment-15210196 ]

Laxman commented on MAPREDUCE-6659:
-----------------------------------

Please note that this issue happens with lost nodes (i.e., unreachable hosts). An NM crash on a reachable host exhibits a totally different, expected retry behavior: there the liveness configurations (yarn.resourcemanager.container.liveness-monitor.interval-ms, yarn.nm.liveness-monitor.expiry-interval-ms, yarn.am.liveness-monitor.expiry-interval-ms) come into play as expected.

> Mapreduce App master waits long to kill containers on lost nodes.
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6659
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6659
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>
> The MR Application Master waits a very long time to clean up and relaunch the tasks on lost nodes. The wait time is actually 2.5 hours (ipc.client.connect.max.retries * ipc.client.connect.max.retries.on.timeouts * ipc.client.connect.timeout = 10 * 45 * 20 = 9000 seconds = 2.5 hours).
> A similar issue in the RM-AM RPC protocol was fixed in YARN-3809. As in YARN-3809, we may need to introduce new configurations to control this RPC retry behavior.
> Also, I feel this total retry time should honor, and be capped at, the global task timeout (mapreduce.task.timeout = 60 default).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
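The 2.5-hour stall described in the issue follows directly from the IPC client settings cited there. A minimal sketch of the arithmetic (values taken from the report; this is illustrative, not Hadoop code):

```java
public class RetryWaitEstimate {
    public static void main(String[] args) {
        // Values as cited in the report:
        int maxRetries = 10;            // ipc.client.connect.max.retries
        int maxRetriesOnTimeouts = 45;  // ipc.client.connect.max.retries.on.timeouts
        int connectTimeoutSec = 20;     // ipc.client.connect.timeout, in seconds

        // Against an unreachable host, the client can burn the entire retry
        // budget before the AM gives up on the container.
        int totalSeconds = maxRetries * maxRetriesOnTimeouts * connectTimeoutSec;
        System.out.println(totalSeconds + " seconds = " + (totalSeconds / 3600.0) + " hours");
    }
}
```

Running this prints "9000 seconds = 2.5 hours", matching the figure in the report.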
[jira] [Created] (MAPREDUCE-6659) Mapreduce App master waits long to kill containers on lost nodes.
Laxman created MAPREDUCE-6659:
---------------------------------

             Summary: Mapreduce App master waits long to kill containers on lost nodes.
                 Key: MAPREDUCE-6659
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6659
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mr-am
    Affects Versions: 2.6.0
            Reporter: Laxman

The MR Application Master waits a very long time to clean up and relaunch the tasks on lost nodes. The wait time is actually 2.5 hours (ipc.client.connect.max.retries * ipc.client.connect.max.retries.on.timeouts * ipc.client.connect.timeout = 10 * 45 * 20 = 9000 seconds = 2.5 hours).

A similar issue in the RM-AM RPC protocol was fixed in YARN-3809. As in YARN-3809, we may need to introduce new configurations to control this RPC retry behavior.

Also, I feel this total retry time should honor, and be capped at, the global task timeout (mapreduce.task.timeout = 60 default).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
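The capping idea in the report (bound the total RPC retry time by the global task timeout) can be sketched as plain arithmetic. The helper below is hypothetical, not existing Hadoop code, and it assumes a mapreduce.task.timeout of 600000 ms (a commonly documented default; an assumption here, since the report's value is truncated):

```java
public class CappedRetryBudget {
    /**
     * Hypothetical helper (not part of Hadoop): how many connect attempts
     * of the given timeout fit inside the global task timeout.
     */
    static int retriesWithinTaskTimeout(long taskTimeoutMs, long connectTimeoutMs) {
        return (int) Math.max(1, taskTimeoutMs / connectTimeoutMs);
    }

    public static void main(String[] args) {
        long taskTimeoutMs = 600_000L;    // assumed mapreduce.task.timeout default
        long connectTimeoutMs = 20_000L;  // ipc.client.connect.timeout (20 s)
        // 30 attempts of 20 s each keep the retry budget within the
        // 10-minute task timeout instead of the 2.5-hour default budget.
        System.out.println(retriesWithinTaskTimeout(taskTimeoutMs, connectTimeoutMs));
    }
}
```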
[jira] [Resolved] (MAPREDUCE-6351) Reducer hung in copy phase.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman resolved MAPREDUCE-6351.
-------------------------------
    Resolution: Duplicate
      Assignee: Laxman

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>            Assignee: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip, thread-dumps.out
>
> *Problem*
> Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.
> *Observations*
> - Verified gc logs. Found no memory related issues. Attached the logs.
> - Verified thread dumps. Found no thread related problems.
> - On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
> - Merge thread is alive and in wait state.
> *Analysis*
> On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.
> *Thread #1* Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized (pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
> *Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while (pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6351) Reducer hung in copy phase.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528154#comment-14528154 ]

Laxman commented on MAPREDUCE-6351:
-----------------------------------

Thanks a lot, Jason, for the details. We are hitting exactly the same scenario (bad disk) as explained in MAPREDUCE-6334. We will try the patch and update the details in this jira.

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip, thread-dumps.out
>
> *Problem*
> Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.
> *Observations*
> - Verified gc logs. Found no memory related issues. Attached the logs.
> - Verified thread dumps. Found no thread related problems.
> - On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
> - Merge thread is alive and in wait state.
> *Analysis*
> On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.
> *Thread #1* Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized (pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
> *Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while (pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated MAPREDUCE-6351:
------------------------------
    Description: 
*Problem*
Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems.
- On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
- Merge thread is alive and in wait state.

{deleted}
*Analysis*
On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.

*Thread #1* Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
synchronized (pendingToBeMerged) {
  pendingToBeMerged.addLast(toMergeInputs);
  pendingToBeMerged.notifyAll();
}
{code}

*Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while (pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}
{deleted}

  was:
*Problem*
Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems.
- On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
- Merge thread is alive and in wait state.

*Analysis*
On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.

*Thread #1* Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
synchronized (pendingToBeMerged) {
  pendingToBeMerged.addLast(toMergeInputs);
  pendingToBeMerged.notifyAll();
}
{code}

*Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while (pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip, thread-dumps.out
>
> *Problem*
> Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.
> *Observations*
> - Verified gc logs. Found no memory related issues. Attached the logs.
> - Verified thread dumps. Found no thread related problems.
> - On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
> - Merge thread is alive and in wait state.
> {deleted}
> *Analysis*
> On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.
> *Thread #1* Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized (pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
> *Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while (pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}
> {deleted}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated MAPREDUCE-6351:
------------------------------
    Description: 
*Problem*
Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems.
- On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
- Merge thread is alive and in wait state.

- *Analysis*
On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.

*Thread #1* Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
synchronized (pendingToBeMerged) {
  pendingToBeMerged.addLast(toMergeInputs);
  pendingToBeMerged.notifyAll();
}
{code}

*Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while (pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}-

  was:
*Problem*
Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems.
- On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
- Merge thread is alive and in wait state.

{deleted}
*Analysis*
On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.

*Thread #1* Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
synchronized (pendingToBeMerged) {
  pendingToBeMerged.addLast(toMergeInputs);
  pendingToBeMerged.notifyAll();
}
{code}

*Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while (pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}
{deleted}

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip, thread-dumps.out
>
> *Problem*
> Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.
> *Observations*
> - Verified gc logs. Found no memory related issues. Attached the logs.
> - Verified thread dumps. Found no thread related problems.
> - On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
> - Merge thread is alive and in wait state.
> - *Analysis*
> On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.
> *Thread #1* Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized (pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
> *Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while (pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}-

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated MAPREDUCE-6351:
------------------------------
    Description: 
*Problem*
Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems.
- On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
- Merge thread is alive and in wait state.

*Analysis*
On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.

*Thread #1* Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
synchronized (pendingToBeMerged) {
  pendingToBeMerged.addLast(toMergeInputs);
  pendingToBeMerged.notifyAll();
}
{code}

*Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while (pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}

  was:
*Problem*
Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems.
- On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
- Merge thread is alive and in wait state.

- *Analysis*
On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.

*Thread #1* Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
synchronized (pendingToBeMerged) {
  pendingToBeMerged.addLast(toMergeInputs);
  pendingToBeMerged.notifyAll();
}
{code}

*Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while (pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}-

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip, thread-dumps.out
>
> *Problem*
> Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.
> *Observations*
> - Verified gc logs. Found no memory related issues. Attached the logs.
> - Verified thread dumps. Found no thread related problems.
> - On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
> - Merge thread is alive and in wait state.
> *Analysis*
> On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.
> *Thread #1* Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized (pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
> *Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while (pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6351) Reducer hung in copy phase.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526438#comment-14526438 ]

Laxman commented on MAPREDUCE-6351:
-----------------------------------

The thread analysis in the description above turned out to be incorrect when I retraced the code flow. Pre-notification is not a problem, since the merger's wait is guarded by the size check. However, the problem still exists: the fetchers are not proceeding, waiting for the merger to free some memory, while the merger is doing nothing.

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip, thread-dumps.out
>
> *Problem*
> Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.
> *Observations*
> - Verified gc logs. Found no memory related issues. Attached the logs.
> - Verified thread dumps. Found no thread related problems.
> - On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
> - Merge thread is alive and in wait state.
> *Analysis*
> On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.
> *Thread #1* Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized (pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
> *Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while (pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
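The correction in the comment above — a notifyAll() arriving before the wait() is not lost as long as the wait is guarded by a predicate re-checked in a loop — can be demonstrated in isolation. This is a standalone sketch, not Hadoop code; the queue name mirrors the snippets quoted in the issue:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class GuardedWaitDemo {
    private static final Deque<String> pendingToBeMerged = new ArrayDeque<>();

    public static void main(String[] args) throws InterruptedException {
        // Producer notifies FIRST, before any consumer is waiting.
        synchronized (pendingToBeMerged) {
            pendingToBeMerged.addLast("map-output-segment");
            pendingToBeMerged.notifyAll(); // nobody is waiting yet
        }

        Thread merger = new Thread(() -> {
            synchronized (pendingToBeMerged) {
                try {
                    // Guarded wait: the size predicate is evaluated before
                    // ever calling wait(), so the early notification cannot
                    // strand this thread.
                    while (pendingToBeMerged.size() <= 0) {
                        pendingToBeMerged.wait();
                    }
                    System.out.println("merging " + pendingToBeMerged.removeFirst());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        merger.start();
        merger.join(); // terminates: the guard sees a non-empty queue
    }
}
```

The merger never blocks because the guard condition is already satisfied when it enters the loop, which is exactly why the original "notification goes unconsumed" theory did not hold.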
[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated MAPREDUCE-6351:
------------------------------
    Attachment: thread-dumps.out
                reducer-container-partial.log.zip
                jstat-gc.log

Attached the logs (container log, thread dumps, jstat output) for reference. Please note that my thoughts on the threading issue may be premature and incorrect. Irrespective of this analysis, the problem exists.

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip, thread-dumps.out
>
> *Problem*
> Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.
> *Analysis*
> - Verified gc logs. Found no memory related issues. Attache
> - Verified thread dumps. Found no thread related problems.
> - On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
> - Merge thread is alive and in wait state.
> On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.
> *Thread #1* Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized (pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
> *Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while (pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laxman updated MAPREDUCE-6351:
------------------------------
    Description: 
*Problem*
Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems.
- On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
- Merge thread is alive and in wait state.

*Analysis*
On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.

*Thread #1* Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
synchronized (pendingToBeMerged) {
  pendingToBeMerged.addLast(toMergeInputs);
  pendingToBeMerged.notifyAll();
}
{code}

*Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while (pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}

  was:
*Problem*
Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

*Analysis*
- Verified gc logs. Found no memory related issues. Attache
- Verified thread dumps. Found no thread related problems.
- On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
- Merge thread is alive and in wait state.

On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.

*Thread #1* Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
synchronized (pendingToBeMerged) {
  pendingToBeMerged.addLast(toMergeInputs);
  pendingToBeMerged.notifyAll();
}
{code}

*Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while (pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}

> Reducer hung in copy phase.
> ---------------------------
>
>                 Key: MAPREDUCE-6351
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>            Reporter: Laxman
>         Attachments: jstat-gc.log, reducer-container-partial.log.zip, thread-dumps.out
>
> *Problem*
> Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.
> *Observations*
> - Verified gc logs. Found no memory related issues. Attached the logs.
> - Verified thread dumps. Found no thread related problems.
> - On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
> - Merge thread is alive and in wait state.
> *Analysis*
> On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.
> *Thread #1* Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
> synchronized (pendingToBeMerged) {
>   pendingToBeMerged.addLast(toMergeInputs);
>   pendingToBeMerged.notifyAll();
> }
> {code}
> *Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while (pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6351) Reducer hung in copy phase.
Laxman created MAPREDUCE-6351:
---------------------------------

             Summary: Reducer hung in copy phase.
                 Key: MAPREDUCE-6351
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.6.0
            Reporter: Laxman

*Problem*
Reducer gets stuck in the copy phase and doesn't make progress for a very long time. After killing this task manually a couple of times, it gets completed.

*Analysis*
- Verified gc logs. Found no memory related issues. Attache
- Verified thread dumps. Found no thread related problems.
- On verification of logs, fetcher threads are not copying the map outputs and are just waiting for the merge to happen.
- Merge thread is alive and in wait state.

On careful observation of logs, thread dumps and code, this looks to me like a classic case of a multi-threading issue: the thread goes to wait state after it has been notified. Here is the suspect code flow.

*Thread #1* Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
synchronized (pendingToBeMerged) {
  pendingToBeMerged.addLast(toMergeInputs);
  pendingToBeMerged.notifyAll();
}
{code}

*Thread #2* Merge Thread - goes to wait state (notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while (pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528914#comment-13528914 ]

Laxman commented on MAPREDUCE-4049:
-----------------------------------

With this patch, we are able to meet our goals (plugging in a custom shuffle algorithm, Network Levitated Merge) without any issues. Thanks, Avner, for keeping the patch small and crisp.

> plugin for generic shuffle service
> ----------------------------------
>
>                 Key: MAPREDUCE-4049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: performance, task, tasktracker
>    Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>            Reporter: Avner BenHanoch
>            Assignee: Avner BenHanoch
>              Labels: merge, plugin, rdma, shuffle
>             Fix For: 3.0.0
>         Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch
>
> Support a generic shuffle service as a set of two plugins: ShuffleProvider and ShuffleConsumer. This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, hence getting much better performance.
> # Satisfy MAPREDUCE-3060 - a generic shuffle service for avoiding a hidden dependency of the NodeManager on a specific version of the mapreduce shuffle (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with a suggested Top Level Design for both plugins (currently based on the 1.0 branch)
> # I am providing a link for downloading UDA - Mellanox's open source plugin that implements the generic shuffle service using RDMA and levitated merge. Note: at this phase, the code is in C++ through JNI and you should consider it as beta only. Still, it can serve anyone who wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks.) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504558#comment-13504558 ] Laxman commented on MAPREDUCE-4049: --- bq. You are warmly welcomed to contribute to push the algorithms of this plugin to the core of vanilla Hadoop Thank you Avner. I wish to see this as part of Hadoop. I'm not able to build the UDA you have provided as per the BUILD.README in the downloaded bundle. The SVN repository provided is not accessible/resolvable: https://sirius.voltaire.com/repos/enterprise/uda/trunk bq. as well as to help accepting my straight forward patch in this JIRA issue. I will personally request a few of my friends (Hadoop contributors) to review this jira. plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: performance, task, tasktracker Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Fix For: trunk Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, or InfiniBand) instead of the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, hence getting much better performance. # Satisfy MAPREDUCE-3060 - a generic shuffle service for avoiding a hidden dependency of the NodeManager on a specific version of mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. 
Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with a suggested Top Level Design for both plugins (currently based on the 1.0 branch) # I am providing a link for downloading UDA - Mellanox's open-source plugin that implements the generic shuffle service using RDMA and levitated merge. Note: At this phase, the code is in C++ through JNI and you should consider it beta only. Still, it can serve anyone that wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504560#comment-13504560 ] Laxman commented on MAPREDUCE-4049: --- I'm trying to build as per the README available here (http://mellanox.com/downloads/UDA/UDA3.0_Release.tar.gz). plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: performance, task, tasktracker Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Fix For: trunk Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, or InfiniBand) instead of the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, hence getting much better performance. # Satisfy MAPREDUCE-3060 - a generic shuffle service for avoiding a hidden dependency of the NodeManager on a specific version of mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with a suggested Top Level Design for both plugins (currently based on the 1.0 branch) # I am providing a link for downloading UDA - Mellanox's open-source plugin that implements the generic shuffle service using RDMA and levitated merge. 
Note: At this phase, the code is in C++ through JNI and you should consider it beta only. Still, it can serve anyone that wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503966#comment-13503966 ] Laxman commented on MAPREDUCE-4807: --- Sorry for the repeated noise. It's my mistake. Allow MapOutputBuffer to be pluggable - Key: MAPREDUCE-4807 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 2.0.2-alpha Reporter: Arun C Murthy Assignee: Mariappan Asokan Fix For: 2.0.3-alpha Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch Allow MapOutputBuffer to be pluggable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503494#comment-13503494 ] Laxman commented on MAPREDUCE-4807: --- {code} - private class MapOutputBuffer<K extends Object, V extends Object> + @InterfaceAudience.LimitedPrivate({"MapReduce"}) + @InterfaceStability.Unstable + public static class MapOutputBuffer<K extends Object, V extends Object> {code} I applied this patch. There are compilation issues due to the above snippet (changing a non-static class to static). {color:red}Cannot make a static reference to the non-static method getTaskID() from the type Task{color} Wondering how this patch got through the QA bot. Allow MapOutputBuffer to be pluggable - Key: MAPREDUCE-4807 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 2.0.2-alpha Reporter: Arun C Murthy Assignee: Mariappan Asokan Fix For: 2.0.3-alpha Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch Allow MapOutputBuffer to be pluggable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503529#comment-13503529 ] Laxman commented on MAPREDUCE-4807: --- Asokan, I applied the patches in the correct order, as you mentioned in MAPREDUCE-4808. Even in the attached combo patch, I can see the above-mentioned snippet (non-static to static) as the problem. Allow MapOutputBuffer to be pluggable - Key: MAPREDUCE-4807 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 2.0.2-alpha Reporter: Arun C Murthy Assignee: Mariappan Asokan Fix For: 2.0.3-alpha Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch Allow MapOutputBuffer to be pluggable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503620#comment-13503620 ] Laxman commented on MAPREDUCE-4807: --- I manually merged your patch to branch-2. But that should not be the cause of this problem. I have just gone through the trunk code also. {code} - private class MapOutputBuffer<K extends Object, V extends Object> + @InterfaceAudience.LimitedPrivate({"MapReduce"}) + @InterfaceStability.Unstable + public static class MapOutputBuffer<K extends Object, V extends Object> {code} Please consider the following, leaving aside the problems w.r.t. applying the patch or a wrong branch: * MapOutputBuffer is a static inner class. * getTaskID() is an inherited (non-static) method from Task. * In the patch, we are referring to a non-static method (getTaskID()) from a static context (public static class MapOutputBuffer). This is a trivial problem, isn't it? Allow MapOutputBuffer to be pluggable - Key: MAPREDUCE-4807 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 2.0.2-alpha Reporter: Arun C Murthy Assignee: Mariappan Asokan Fix For: 2.0.3-alpha Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch Allow MapOutputBuffer to be pluggable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
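[Editor's note] The compile error discussed above can be reproduced in miniature. A static nested class has no enclosing instance, so it cannot call an instance method of the outer class directly; the usual fix is to pass the instance (or the needed value) in explicitly. Task and MapOutputBuffer below are simplified stand-ins, not the actual Hadoop classes.

```java
// Minimal reproduction of the static-context problem, plus the common fix.
class Task {
    String getTaskID() { return "attempt_001"; }   // instance (non-static) method

    // A *static* nested class has no implicit Task instance. Calling
    // getTaskID() directly here would fail with:
    //   "Cannot make a static reference to the non-static method getTaskID()
    //    from the type Task"
    static class MapOutputBuffer<K, V> {
        private final Task task;                   // fix: hold an explicit reference

        MapOutputBuffer(Task task) {
            this.task = task;
        }

        String describe() {
            return "buffer for " + task.getTaskID();  // OK: explicit instance
        }
    }
}

public class StaticNestedDemo {
    public static void main(String[] args) {
        Task t = new Task();
        Task.MapOutputBuffer<String, String> buf = new Task.MapOutputBuffer<>(t);
        System.out.println(buf.describe());
    }
}
```

Before the patch, the non-static inner class received the enclosing Task instance implicitly; making the class static removes that implicit reference, which is exactly why the patched code no longer compiles.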
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503362#comment-13503362 ] Laxman commented on MAPREDUCE-4049: --- Hi Avner, we are also impressed by the Network Levitated Merge algorithm and we are in the process of implementing it. With your consent, we would like to collaborate and contribute to this issue. plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: performance, task, tasktracker Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Fix For: trunk Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, or InfiniBand) instead of the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, hence getting much better performance. # Satisfy MAPREDUCE-3060 - a generic shuffle service for avoiding a hidden dependency of the NodeManager on a specific version of mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with a suggested Top Level Design for both plugins (currently based on the 1.0 branch) -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503373#comment-13503373 ] Laxman commented on MAPREDUCE-2454: --- Asokan & Alejandro, IIUC the objective/goal of this feature is to make the Shuffler & Merger pluggable, so that a mapreduce user can plug in better algorithms. We are in the process of implementing the Network Levitated Merge algorithm [ http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf ]. With this, we wanted to avoid disk usage on the reducer side completely and directly shuffle the map outputs to the reducer. IMO, even with MAPREDUCE-2454 and the other sub-tasks, we may *not be able to plug in* the above algorithm. Request you to provide your suggestions on this. Allow external sorter plugin for MR --- Key: MAPREDUCE-2454 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha Reporter: Mariappan Asokan Assignee: Mariappan Asokan Priority: Minor Labels: features, performance, plugin, sort Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch, mapreduce-2454-new-test.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454-protection-change.patch, mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java Define interfaces and some abstract classes in the Hadoop framework 
to facilitate external sorter plugins both on the Map and Reduce sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
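[Editor's note] The pluggable-sorter idea discussed in MAPREDUCE-2454 can be sketched as follows. The interface and class names below are hypothetical illustrations (the issue's attachments define the real ones, e.g. MapOutputSorter.java); the point is only that the framework codes against an interface while a plugin supplies the sort implementation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sorter-plugin seam: the framework depends only on this
// interface; a plugin could sort in memory, externally on disk, or feed a
// network-levitated merge instead.
interface MapOutputSorter<K extends Comparable<K>, V> {
    List<Map.Entry<K, V>> sort(List<Map.Entry<K, V>> records);
}

// Default plugin: plain in-memory sort by key.
class InMemorySorter<K extends Comparable<K>, V> implements MapOutputSorter<K, V> {
    @Override
    public List<Map.Entry<K, V>> sort(List<Map.Entry<K, V>> records) {
        List<Map.Entry<K, V>> out = new ArrayList<>(records);
        out.sort(Map.Entry.comparingByKey());
        return out;
    }
}

public class SorterPluginDemo {
    public static void main(String[] args) {
        MapOutputSorter<String, Integer> sorter = new InMemorySorter<>();
        List<Map.Entry<String, Integer>> recs = new ArrayList<>();
        recs.add(new AbstractMap.SimpleEntry<>("b", 2));
        recs.add(new AbstractMap.SimpleEntry<>("a", 1));
        for (Map.Entry<String, Integer> e : sorter.sort(recs)) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Laxman's concern above is precisely about where this seam sits: if the interface only covers the sort, an algorithm that also replaces the shuffle and skips reducer-side disk entirely may not fit behind it.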
[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13503416#comment-13503416 ] Laxman commented on MAPREDUCE-2454: --- Asokan, thank you very much for your quick response and detailed clarification. I will give a try with the patches available here. I will get back to you soon on this. Allow external sorter plugin for MR --- Key: MAPREDUCE-2454 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha Reporter: Mariappan Asokan Assignee: Mariappan Asokan Priority: Minor Labels: features, performance, plugin, sort Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch, mapreduce-2454-new-test.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454-protection-change.patch, mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-2243) Close all the file streams propely in a finally block to avoid their leakage.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982466#action_12982466 ] Laxman commented on MAPREDUCE-2243: --- @Owen bq. In most cases, the exceptions outside of IOException don't matter much because they will bring down. bq. this leaves the nominal case simple. Note that this is the worst case, if we get an Error every system in Hadoop should shutdown. bq. There is no point in continuing and worrying about lost file handles at that point is too extreme. Yes, I agree with your point in *Error* scenarios. But how about a runtime exception which need not be handled in the positive flow? Handling unexpected generic exceptions and errors will result in a catch-and-rethrow pattern. So, I prefer to handle the stream closure in the try block as well as in the finally block. As per your initial comments, Kamesh has corrected it to close the streams in the try block as well as in the finally block. Do you still see some issue with this approach? How is handling stream close in a catch block better than handling it in the try and finally blocks? My opinion on this issue: handling stream closures in the try and finally blocks is foolproof, and it avoids some code duplication. Close all the file streams propely in a finally block to avoid their leakage. - Key: MAPREDUCE-2243 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2243 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker, tasktracker Affects Versions: 0.20.1, 0.22.0 Environment: NA Reporter: Bhallamudi Venkata Siva Kamesh Priority: Minor Original Estimate: 72h Remaining Estimate: 72h In the following classes, streams should be closed in a finally block to avoid their leakage in exceptional cases. 
CompletedJobStatusStore.java
{code}
dataOut.writeInt(events.length);
for (TaskCompletionEvent event : events) {
  event.write(dataOut);
}
dataOut.close();
{code}
EventWriter.java
{code}
encoder.flush();
out.close();
{code}
MapTask.java
{code}
splitMetaInfo.write(out);
out.close();
{code}
TaskLog.java
1)
{code}
str = fis.readLine();
fis.close();
{code}
2)
{code}
dos.writeBytes(Long.toString(new File(logLocation, LogName.SYSLOG.toString()).length() - prevLogLength) + "\n");
dos.close();
{code}
TotalOrderPartitioner.java
{code}
while (reader.next(key, value)) {
  parts.add(key);
  key = ReflectionUtils.newInstance(keyClass, conf);
}
reader.close();
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
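[Editor's note] The pattern argued for in the comment above (close in the try block for the normal path, and guard in finally so the handle is never leaked on an exception) can be sketched like this. This is an illustrative stand-in for the CompletedJobStatusStore case, not the actual Hadoop code.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SafeClose {
    static void writeEvents(File f, int[] events) throws IOException {
        DataOutputStream out = null;
        try {
            out = new DataOutputStream(new FileOutputStream(f));
            out.writeInt(events.length);
            for (int e : events) {
                out.writeInt(e);
            }
            out.close();       // normal-path close: flushes and surfaces write errors
            out = null;        // signal to finally that nothing is left to clean up
        } finally {
            if (out != null) { // reached only when an exception occurred above
                try {
                    out.close();
                } catch (IOException ignored) {
                    // best-effort close; the original exception still propagates
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("events", ".bin");
        f.deleteOnExit();
        writeEvents(f, new int[]{1, 2, 3});
        // 4 ints written (length + 3 events) at 4 bytes each = 16 bytes
        System.out.println("wrote " + f.length() + " bytes");
    }
}
```

Closing in the try block and nulling the reference keeps the finally block a pure safety net, so a close failure on the error path cannot mask the exception that actually caused it; on Java 7+ the same guarantee comes from try-with-resources.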