[jira] [Commented] (MAPREDUCE-6659) Mapreduce App master waits long to kill containers on lost nodes.

2016-03-24 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210196#comment-15210196
 ] 

Laxman commented on MAPREDUCE-6659:
---

Please note that this issue happens with lost nodes (i.e., unreachable hosts). 
An NM crash on a reachable host exhibits a totally different, expected retry 
behavior: there, the liveness configurations 
(yarn.resourcemanager.container.liveness-monitor.interval-ms, 
yarn.nm.liveness-monitor.expiry-interval-ms, 
yarn.am.liveness-monitor.expiry-interval-ms) come into play as expected.
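
For reference, these settings live in yarn-site.xml; a minimal sketch with 
illustrative values (600000 ms, i.e. 10 minutes, is the usual default for all 
three, but verify against your version's yarn-default.xml):

```xml
<!-- yarn-site.xml: illustrative values only -->
<property>
  <name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name>
  <value>600000</value>
</property>
<property>
  <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
  <value>600000</value>
</property>
<property>
  <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
  <value>600000</value>
</property>
```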

> Mapreduce App master waits long to kill containers on lost nodes.
> -
>
> Key: MAPREDUCE-6659
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6659
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.6.0
>Reporter: Laxman
>
> MR Application Master waits for a very long time to clean up and relaunch the 
> tasks on lost nodes. The wait time is actually 2.5 hours 
> (ipc.client.connect.max.retries * ipc.client.connect.max.retries.on.timeouts 
> * ipc.client.connect.timeout = 10 * 45 * 20 = 9000 seconds = 2.5 hours).
> A similar issue in the RM-AM RPC protocol was fixed in YARN-3809.
> As was done in YARN-3809, we may need to introduce new configurations to 
> control this RPC retry behavior.
> Also, I feel this total retry time should honor, and be capped at, the global 
> task timeout (mapreduce.task.timeout = 60 default).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6659) Mapreduce App master waits long to kill containers on lost nodes.

2016-03-24 Thread Laxman (JIRA)
Laxman created MAPREDUCE-6659:
-

 Summary: Mapreduce App master waits long to kill containers on 
lost nodes.
 Key: MAPREDUCE-6659
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6659
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.6.0
Reporter: Laxman


MR Application Master waits for a very long time to clean up and relaunch the 
tasks on lost nodes. The wait time is actually 2.5 hours 
(ipc.client.connect.max.retries * ipc.client.connect.max.retries.on.timeouts * 
ipc.client.connect.timeout = 10 * 45 * 20 = 9000 seconds = 2.5 hours).

A similar issue in the RM-AM RPC protocol was fixed in YARN-3809.
As was done in YARN-3809, we may need to introduce new configurations to control 
this RPC retry behavior.

Also, I feel this total retry time should honor, and be capped at, the global 
task timeout (mapreduce.task.timeout = 60 default).
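
The 2.5-hour figure above is just the product of the three IPC client settings; 
a minimal sketch of the arithmetic (class and method names are hypothetical, 
values are the defaults cited in the report):

```java
public class RpcRetryBudget {
    // Values cited in the report; comments name the real IPC client properties.
    static final int CONNECT_RETRIES = 10;          // ipc.client.connect.max.retries
    static final int RETRIES_ON_TIMEOUTS = 45;      // ipc.client.connect.max.retries.on.timeouts
    static final int CONNECT_TIMEOUT_SECONDS = 20;  // ipc.client.connect.timeout (20 s; the property itself is in ms)

    // Worst-case time the AM spends retrying a connection to a lost node.
    static int worstCaseWaitSeconds() {
        return CONNECT_RETRIES * RETRIES_ON_TIMEOUTS * CONNECT_TIMEOUT_SECONDS;
    }

    public static void main(String[] args) {
        int seconds = worstCaseWaitSeconds();
        // 10 * 45 * 20 = 9000 seconds = 2.5 hours
        System.out.println(seconds + " s = " + (seconds / 3600.0) + " h");
    }
}
```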





[jira] [Resolved] (MAPREDUCE-6351) Reducer hung in copy phase.

2015-09-16 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman resolved MAPREDUCE-6351.
---
Resolution: Duplicate
  Assignee: Laxman

> Reducer hung in copy phase.
> ---
>
> Key: MAPREDUCE-6351
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.6.0
>Reporter: Laxman
>Assignee: Laxman
> Attachments: jstat-gc.log, reducer-container-partial.log.zip, 
> thread-dumps.out
>
>
> *Problem*
> Reducer gets stuck in copy phase and doesn't make progress for very long 
> time. After killing this task for couple of times manually, it gets 
> completed. 
> *Observations*
> - Verified gc logs. Found no memory related issues. Attached the logs.
> - Verified thread dumps. Found no thread related problems. 
> - On verification of logs, fetcher threads are not copying the map outputs 
> and they are just waiting for merge to happen.
> - Merge thread is alive and in wait state.
> *Analysis* 
> On careful observation of logs, thread dumps and code, this looks to me like 
> a classic case of multi-threading issue. Thread goes to wait state after it 
> has been notified. 
> Here is the suspect code flow.
> *Thread #1*
> Fetcher thread - notification comes first
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
> {code}
>   synchronized(pendingToBeMerged) {
>     pendingToBeMerged.addLast(toMergeInputs);
>     pendingToBeMerged.notifyAll();
>   }
> {code}
> *Thread #2*
> Merge Thread - goes to wait state (Notification goes unconsumed)
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
> {code}
> synchronized (pendingToBeMerged) {
>   while(pendingToBeMerged.size() <= 0) {
>     pendingToBeMerged.wait();
>   }
>   // Pickup the inputs to merge.
>   inputs = pendingToBeMerged.removeFirst();
> }
> {code}





[jira] [Commented] (MAPREDUCE-6351) Reducer hung in copy phase.

2015-05-05 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528154#comment-14528154
 ] 

Laxman commented on MAPREDUCE-6351:
---

Thanks a lot, Jason, for the details. We are hitting exactly the same scenario 
(bad disk) as explained in MAPREDUCE-6334.
We will try the patch and update the details in this jira.



 Reducer hung in copy phase.
 ---

 Key: MAPREDUCE-6351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Laxman
 Attachments: jstat-gc.log, reducer-container-partial.log.zip, 
 thread-dumps.out


 *Problem*
 Reducer gets stuck in copy phase and doesn't make progress for very long 
 time. After killing this task for couple of times manually, it gets 
 completed. 
 *Observations*
 - Verified gc logs. Found no memory related issues. Attached the logs.
 - Verified thread dumps. Found no thread related problems. 
 - On verification of logs, fetcher threads are not copying the map outputs 
 and they are just waiting for merge to happen.
 - Merge thread is alive and in wait state.
 *Analysis* 
 On careful observation of logs, thread dumps and code, this looks to me like 
 a classic case of multi-threading issue. Thread goes to wait state after it 
 has been notified. 
 Here is the suspect code flow.
 *Thread #1*
 Fetcher thread - notification comes first
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
 {code}
   synchronized(pendingToBeMerged) {
     pendingToBeMerged.addLast(toMergeInputs);
     pendingToBeMerged.notifyAll();
   }
 {code}
 *Thread #2*
 Merge Thread - goes to wait state (Notification goes unconsumed)
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
 {code}
 synchronized (pendingToBeMerged) {
   while(pendingToBeMerged.size() <= 0) {
     pendingToBeMerged.wait();
   }
   // Pickup the inputs to merge.
   inputs = pendingToBeMerged.removeFirst();
 }
 {code}





[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.

2015-05-04 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated MAPREDUCE-6351:
--
Description: 
*Problem*
Reducer gets stuck in copy phase and doesn't make progress for very long time. 
After killing this task for couple of times manually, it gets completed. 

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems. 
- On verification of logs, fetcher threads are not copying the map outputs and 
they are just waiting for merge to happen.
- Merge thread is alive and in wait state.
{deleted}
*Analysis* 
On careful observation of logs, thread dumps and code, this looks to me like a 
classic case of multi-threading issue. Thread goes to wait state after it has 
been notified. 

Here is the suspect code flow.
*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
  synchronized(pendingToBeMerged) {
    pendingToBeMerged.addLast(toMergeInputs);
    pendingToBeMerged.notifyAll();
  }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while(pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}
{deleted}

  was:
*Problem*
Reducer gets stuck in copy phase and doesn't make progress for very long time. 
After killing this task for couple of times manually, it gets completed. 

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems. 
- On verification of logs, fetcher threads are not copying the map outputs and 
they are just waiting for merge to happen.
- Merge thread is alive and in wait state.

*Analysis* 
On careful observation of logs, thread dumps and code, this looks to me like a 
classic case of multi-threading issue. Thread goes to wait state after it has 
been notified. 

Here is the suspect code flow.
*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
  synchronized(pendingToBeMerged) {
    pendingToBeMerged.addLast(toMergeInputs);
    pendingToBeMerged.notifyAll();
  }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while(pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}



 Reducer hung in copy phase.
 ---

 Key: MAPREDUCE-6351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Laxman
 Attachments: jstat-gc.log, reducer-container-partial.log.zip, 
 thread-dumps.out


 *Problem*
 Reducer gets stuck in copy phase and doesn't make progress for very long 
 time. After killing this task for couple of times manually, it gets 
 completed. 
 *Observations*
 - Verified gc logs. Found no memory related issues. Attached the logs.
 - Verified thread dumps. Found no thread related problems. 
 - On verification of logs, fetcher threads are not copying the map outputs 
 and they are just waiting for merge to happen.
 - Merge thread is alive and in wait state.
 {deleted}
 *Analysis* 
 On careful observation of logs, thread dumps and code, this looks to me like 
 a classic case of multi-threading issue. Thread goes to wait state after it 
 has been notified. 
 Here is the suspect code flow.
 *Thread #1*
 Fetcher thread - notification comes first
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
 {code}
   synchronized(pendingToBeMerged) {
     pendingToBeMerged.addLast(toMergeInputs);
     pendingToBeMerged.notifyAll();
   }
 {code}
 *Thread #2*
 Merge Thread - goes to wait state (Notification goes unconsumed)
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
 {code}
 synchronized (pendingToBeMerged) {
   while(pendingToBeMerged.size() <= 0) {
     pendingToBeMerged.wait();
   }
   // Pickup the inputs to merge.
   inputs = pendingToBeMerged.removeFirst();
 }
 {code}
 {deleted}





[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.

2015-05-04 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated MAPREDUCE-6351:
--
Description: 
*Problem*
Reducer gets stuck in copy phase and doesn't make progress for very long time. 
After killing this task for couple of times manually, it gets completed. 

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems. 
- On verification of logs, fetcher threads are not copying the map outputs and 
they are just waiting for merge to happen.
- Merge thread is alive and in wait state.

-
*Analysis* 
On careful observation of logs, thread dumps and code, this looks to me like a 
classic case of multi-threading issue. Thread goes to wait state after it has 
been notified. 

Here is the suspect code flow.
*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
  synchronized(pendingToBeMerged) {
    pendingToBeMerged.addLast(toMergeInputs);
    pendingToBeMerged.notifyAll();
  }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while(pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}-

  was:
*Problem*
Reducer gets stuck in copy phase and doesn't make progress for very long time. 
After killing this task for couple of times manually, it gets completed. 

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems. 
- On verification of logs, fetcher threads are not copying the map outputs and 
they are just waiting for merge to happen.
- Merge thread is alive and in wait state.
{deleted}
*Analysis* 
On careful observation of logs, thread dumps and code, this looks to me like a 
classic case of multi-threading issue. Thread goes to wait state after it has 
been notified. 

Here is the suspect code flow.
*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
  synchronized(pendingToBeMerged) {
    pendingToBeMerged.addLast(toMergeInputs);
    pendingToBeMerged.notifyAll();
  }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while(pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}
{deleted}


 Reducer hung in copy phase.
 ---

 Key: MAPREDUCE-6351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Laxman
 Attachments: jstat-gc.log, reducer-container-partial.log.zip, 
 thread-dumps.out


 *Problem*
 Reducer gets stuck in copy phase and doesn't make progress for very long 
 time. After killing this task for couple of times manually, it gets 
 completed. 
 *Observations*
 - Verified gc logs. Found no memory related issues. Attached the logs.
 - Verified thread dumps. Found no thread related problems. 
 - On verification of logs, fetcher threads are not copying the map outputs 
 and they are just waiting for merge to happen.
 - Merge thread is alive and in wait state.
 -
 *Analysis* 
 On careful observation of logs, thread dumps and code, this looks to me like 
 a classic case of multi-threading issue. Thread goes to wait state after it 
 has been notified. 
 Here is the suspect code flow.
 *Thread #1*
 Fetcher thread - notification comes first
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
 {code}
   synchronized(pendingToBeMerged) {
     pendingToBeMerged.addLast(toMergeInputs);
     pendingToBeMerged.notifyAll();
   }
 {code}
 *Thread #2*
 Merge Thread - goes to wait state (Notification goes unconsumed)
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
 {code}
 synchronized (pendingToBeMerged) {
   while(pendingToBeMerged.size() <= 0) {
     pendingToBeMerged.wait();
   }
   // Pickup the inputs to merge.
   inputs = pendingToBeMerged.removeFirst();
 }
 {code}-





[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.

2015-05-04 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated MAPREDUCE-6351:
--
Description: 
*Problem*
Reducer gets stuck in copy phase and doesn't make progress for very long time. 
After killing this task for couple of times manually, it gets completed. 

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems. 
- On verification of logs, fetcher threads are not copying the map outputs and 
they are just waiting for merge to happen.
- Merge thread is alive and in wait state.


*Analysis* 
On careful observation of logs, thread dumps and code, this looks to me like a 
classic case of multi-threading issue. Thread goes to wait state after it has 
been notified. 

Here is the suspect code flow.
*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
  synchronized(pendingToBeMerged) {
    pendingToBeMerged.addLast(toMergeInputs);
    pendingToBeMerged.notifyAll();
  }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while(pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}

  was:
*Problem*
Reducer gets stuck in copy phase and doesn't make progress for very long time. 
After killing this task for couple of times manually, it gets completed. 

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems. 
- On verification of logs, fetcher threads are not copying the map outputs and 
they are just waiting for merge to happen.
- Merge thread is alive and in wait state.

-
*Analysis* 
On careful observation of logs, thread dumps and code, this looks to me like a 
classic case of multi-threading issue. Thread goes to wait state after it has 
been notified. 

Here is the suspect code flow.
*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
  synchronized(pendingToBeMerged) {
    pendingToBeMerged.addLast(toMergeInputs);
    pendingToBeMerged.notifyAll();
  }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while(pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}-


 Reducer hung in copy phase.
 ---

 Key: MAPREDUCE-6351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Laxman
 Attachments: jstat-gc.log, reducer-container-partial.log.zip, 
 thread-dumps.out


 *Problem*
 Reducer gets stuck in copy phase and doesn't make progress for very long 
 time. After killing this task for couple of times manually, it gets 
 completed. 
 *Observations*
 - Verified gc logs. Found no memory related issues. Attached the logs.
 - Verified thread dumps. Found no thread related problems. 
 - On verification of logs, fetcher threads are not copying the map outputs 
 and they are just waiting for merge to happen.
 - Merge thread is alive and in wait state.
 *Analysis* 
 On careful observation of logs, thread dumps and code, this looks to me like 
 a classic case of multi-threading issue. Thread goes to wait state after it 
 has been notified. 
 Here is the suspect code flow.
 *Thread #1*
 Fetcher thread - notification comes first
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
 {code}
   synchronized(pendingToBeMerged) {
     pendingToBeMerged.addLast(toMergeInputs);
     pendingToBeMerged.notifyAll();
   }
 {code}
 *Thread #2*
 Merge Thread - goes to wait state (Notification goes unconsumed)
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
 {code}
 synchronized (pendingToBeMerged) {
   while(pendingToBeMerged.size() <= 0) {
     pendingToBeMerged.wait();
   }
   // Pickup the inputs to merge.
   inputs = pendingToBeMerged.removeFirst();
 }
 {code}





[jira] [Commented] (MAPREDUCE-6351) Reducer hung in copy phase.

2015-05-04 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526438#comment-14526438
 ] 

Laxman commented on MAPREDUCE-6351:
---

The thread analysis mentioned in the description above was found to be incorrect 
when I retraced the code flow. Pre-notification is not a problem, as the 
merger's wait is guarded by a size check.

However, the problem still exists: the fetchers are not proceeding, waiting for 
the merger to free some memory, while the merger is doing nothing.
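
For completeness, that guard can be demonstrated in isolation. A minimal, 
self-contained sketch (class and names here are hypothetical, modeled on 
MergeThread) showing that a notifyAll() which fires before the consumer ever 
waits is not lost, because the while-loop guard re-checks the queue first:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class GuardedWaitDemo {
    private final Deque<String> pendingToBeMerged = new ArrayDeque<>();

    // Mimics the Fetcher side of startMerge(): enqueue inputs, then notify.
    void produce(String inputs) {
        synchronized (pendingToBeMerged) {
            pendingToBeMerged.addLast(inputs);
            pendingToBeMerged.notifyAll();
        }
    }

    // Mimics MergeThread.run(): the while-loop guard re-checks the queue size,
    // so a notification that arrived before wait() was called is never lost.
    String consume() throws InterruptedException {
        synchronized (pendingToBeMerged) {
            while (pendingToBeMerged.isEmpty()) {
                pendingToBeMerged.wait();
            }
            return pendingToBeMerged.removeFirst();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedWaitDemo demo = new GuardedWaitDemo();
        demo.produce("segment-1");   // notification fires before anyone waits...
        String got = demo.consume(); // ...yet the consumer returns immediately
        if (!"segment-1".equals(got)) throw new AssertionError(got);
        System.out.println("consumed " + got);
    }
}
```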

 Reducer hung in copy phase.
 ---

 Key: MAPREDUCE-6351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Laxman
 Attachments: jstat-gc.log, reducer-container-partial.log.zip, 
 thread-dumps.out


 *Problem*
 Reducer gets stuck in copy phase and doesn't make progress for very long 
 time. After killing this task for couple of times manually, it gets 
 completed. 
 *Observations*
 - Verified gc logs. Found no memory related issues. Attached the logs.
 - Verified thread dumps. Found no thread related problems. 
 - On verification of logs, fetcher threads are not copying the map outputs 
 and they are just waiting for merge to happen.
 - Merge thread is alive and in wait state.
 *Analysis* 
 On careful observation of logs, thread dumps and code, this looks to me like 
 a classic case of multi-threading issue. Thread goes to wait state after it 
 has been notified. 
 Here is the suspect code flow.
 *Thread #1*
 Fetcher thread - notification comes first
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
 {code}
   synchronized(pendingToBeMerged) {
     pendingToBeMerged.addLast(toMergeInputs);
     pendingToBeMerged.notifyAll();
   }
 {code}
 *Thread #2*
 Merge Thread - goes to wait state (Notification goes unconsumed)
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
 {code}
 synchronized (pendingToBeMerged) {
   while(pendingToBeMerged.size() <= 0) {
     pendingToBeMerged.wait();
   }
   // Pickup the inputs to merge.
   inputs = pendingToBeMerged.removeFirst();
 }
 {code}





[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.

2015-05-02 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated MAPREDUCE-6351:
--
Attachment: thread-dumps.out
reducer-container-partial.log.zip
jstat-gc.log

Attached the logs (container log, thread dumps, jstat output) for reference.

Please note that my thoughts on the threading issue may be premature and 
incorrect. Irrespective of this analysis, the problem exists.

 Reducer hung in copy phase.
 ---

 Key: MAPREDUCE-6351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Laxman
 Attachments: jstat-gc.log, reducer-container-partial.log.zip, 
 thread-dumps.out


 *Problem*
 Reducer gets stuck in copy phase and doesn't make progress for very long 
 time. After killing this task for couple of times manually, it gets 
 completed. 
 *Analysis*
 - Verified gc logs. Found no memory related issues. Attached the logs.
 - Verified thread dumps. Found no thread related problems. 
 - On verification of logs, fetcher threads are not copying the map outputs 
 and they are just waiting for merge to happen.
 - Merge thread is alive and in wait state.
 On careful observation of logs, thread dumps and code, this looks to me like 
 a classic case of multi-threading issue. Thread goes to wait state after it 
 has been notified. 
 Here is the suspect code flow.
 *Thread #1*
 Fetcher thread - notification comes first
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
 {code}
   synchronized(pendingToBeMerged) {
     pendingToBeMerged.addLast(toMergeInputs);
     pendingToBeMerged.notifyAll();
   }
 {code}
 *Thread #2*
 Merge Thread - goes to wait state (Notification goes unconsumed)
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
 {code}
 synchronized (pendingToBeMerged) {
   while(pendingToBeMerged.size() <= 0) {
     pendingToBeMerged.wait();
   }
   // Pickup the inputs to merge.
   inputs = pendingToBeMerged.removeFirst();
 }
 {code}





[jira] [Updated] (MAPREDUCE-6351) Reducer hung in copy phase.

2015-05-02 Thread Laxman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laxman updated MAPREDUCE-6351:
--
Description: 
*Problem*
Reducer gets stuck in copy phase and doesn't make progress for very long time. 
After killing this task for couple of times manually, it gets completed. 

*Observations*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems. 
- On verification of logs, fetcher threads are not copying the map outputs and 
they are just waiting for merge to happen.
- Merge thread is alive and in wait state.

*Analysis* 
On careful observation of logs, thread dumps and code, this looks to me like a 
classic case of multi-threading issue. Thread goes to wait state after it has 
been notified. 

Here is the suspect code flow.
*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
  synchronized(pendingToBeMerged) {
    pendingToBeMerged.addLast(toMergeInputs);
    pendingToBeMerged.notifyAll();
  }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while(pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}


  was:
*Problem*
Reducer gets stuck in copy phase and doesn't make progress for very long time. 
After killing this task for couple of times manually, it gets completed. 

*Analysis*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems. 
- On verification of logs, fetcher threads are not copying the map outputs and 
they are just waiting for merge to happen.
- Merge thread is alive and in wait state.

On careful observation of logs, thread dumps and code, this looks to me like a 
classic case of multi-threading issue. Thread goes to wait state after it has 
been notified. 

Here is the suspect code flow.

*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
  synchronized(pendingToBeMerged) {
    pendingToBeMerged.addLast(toMergeInputs);
    pendingToBeMerged.notifyAll();
  }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while(pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}



 Reducer hung in copy phase.
 ---

 Key: MAPREDUCE-6351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Laxman
 Attachments: jstat-gc.log, reducer-container-partial.log.zip, 
 thread-dumps.out


 *Problem*
 Reducer gets stuck in copy phase and doesn't make progress for very long 
 time. After killing this task for couple of times manually, it gets 
 completed. 
 *Observations*
 - Verified gc logs. Found no memory related issues. Attached the logs.
 - Verified thread dumps. Found no thread related problems. 
 - On verification of logs, fetcher threads are not copying the map outputs 
 and they are just waiting for merge to happen.
 - Merge thread is alive and in wait state.
 *Analysis* 
 On careful observation of logs, thread dumps and code, this looks to me like 
 a classic case of multi-threading issue. Thread goes to wait state after it 
 has been notified. 
 Here is the suspect code flow.
 *Thread #1*
 Fetcher thread - notification comes first
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
 {code}
   synchronized(pendingToBeMerged) {
     pendingToBeMerged.addLast(toMergeInputs);
     pendingToBeMerged.notifyAll();
   }
 {code}
 *Thread #2*
 Merge Thread - goes to wait state (Notification goes unconsumed)
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
 {code}
 synchronized (pendingToBeMerged) {
   while(pendingToBeMerged.size() <= 0) {
     pendingToBeMerged.wait();
   }
   // Pickup the inputs to merge.
   inputs = pendingToBeMerged.removeFirst();
 }
 {code}





[jira] [Created] (MAPREDUCE-6351) Reducer hung in copy phase.

2015-05-02 Thread Laxman (JIRA)
Laxman created MAPREDUCE-6351:
-

 Summary: Reducer hung in copy phase.
 Key: MAPREDUCE-6351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Laxman


*Problem*
Reducer gets stuck in copy phase and doesn't make progress for very long time. 
After killing this task for couple of times manually, it gets completed. 

*Analysis*
- Verified gc logs. Found no memory related issues. Attached the logs.
- Verified thread dumps. Found no thread related problems. 
- On verification of logs, fetcher threads are not copying the map outputs and 
they are just waiting for merge to happen.
- Merge thread is alive and in wait state.

On careful observation of logs, thread dumps and code, this looks to me like a 
classic case of multi-threading issue. Thread goes to wait state after it has 
been notified. 

Here is the suspect code flow.

*Thread #1*
Fetcher thread - notification comes first
org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)
{code}
  synchronized(pendingToBeMerged) {
    pendingToBeMerged.addLast(toMergeInputs);
    pendingToBeMerged.notifyAll();
  }
{code}

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()
{code}
synchronized (pendingToBeMerged) {
  while(pendingToBeMerged.size() <= 0) {
    pendingToBeMerged.wait();
  }
  // Pickup the inputs to merge.
  inputs = pendingToBeMerged.removeFirst();
}
{code}






[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-12-11 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13528914#comment-13528914
 ] 

Laxman commented on MAPREDUCE-4049:
---

With this patch, we are able to meet our goals (plugging in a custom shuffle 
algorithm, Network Levitated Merge) without any issues. Thanks, Avner, for 
keeping the patch small and crisp.


 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
Assignee: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: 3.0.0

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch


 Support generic shuffle service as set of two plugins: ShuffleProvider & 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges. Hence, getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)
 # I am providing link for downloading UDA - Mellanox's open source plugin 
 that implements generic shuffle service using RDMA and levitated merge.  
 Note: At this phase, the code is in C++ through JNI and you should consider 
 it as beta only.  Still, it can serve anyone that wants to implement or 
 contribute to levitated merge. (Please be advised that levitated merge is 
 mostly suit in very fast networks) - 
 [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504558#comment-13504558
 ] 

Laxman commented on MAPREDUCE-4049:
---

bq. You are warmly welcomed to contribute to push the algorithms of this plugin 
to the core of vanilla Hadoop

Thank you Avner. I wish to see this as part of hadoop.
I'm not able to build the UDA you have provided as per the BUILD.README in the 
downloaded bundle. The SVN repository provided is not accessible/resolvable.

https://sirius.voltaire.com/repos/enterprise/uda/trunk

bq. as well as to help accepting my straight forward patch in this JIRA issue.
I will personally request a few of my friends (Hadoop contributors) to review 
this jira.

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch


 Support generic shuffle service as set of two plugins: ShuffleProvider & 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges. Hence, getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)
 # I am providing link for downloading UDA - Mellanox's open source plugin 
 that implements generic shuffle service using RDMA and levitated merge.  
 Note: At this phase, the code is in C++ through JNI and you should consider 
 it as beta only.  Still, it can serve anyone that wants to implement or 
 contribute to levitated merge. (Please be advised that levitated merge is 
 mostly suit in very fast networks) - 
 [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-27 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504560#comment-13504560
 ] 

Laxman commented on MAPREDUCE-4049:
---

I'm trying to build as per the README available here 
(http://mellanox.com/downloads/UDA/UDA3.0_Release.tar.gz).

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch


 Support generic shuffle service as set of two plugins: ShuffleProvider & 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges. Hence, getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)
 # I am providing link for downloading UDA - Mellanox's open source plugin 
 that implements generic shuffle service using RDMA and levitated merge.  
 Note: At this phase, the code is in C++ through JNI and you should consider 
 it as beta only.  Still, it can serve anyone that wants to implement or 
 contribute to levitated merge. (Please be advised that levitated merge is 
 mostly suit in very fast networks) - 
 [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]



[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable

2012-11-26 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503966#comment-13503966
 ] 

Laxman commented on MAPREDUCE-4807:
---

Sorry for the repeated noise. It's my mistake.

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable



[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable

2012-11-25 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503494#comment-13503494
 ] 

Laxman commented on MAPREDUCE-4807:
---

{code}
-  private class MapOutputBuffer<K extends Object, V extends Object>
+  @InterfaceAudience.LimitedPrivate({"MapReduce"})
+  @InterfaceStability.Unstable
+  public static class MapOutputBuffer<K extends Object, V extends Object>
{code}

I applied this patch. There are some compilation issues due to the above snippet 
(the class changing from non-static to static).

{color:red}Cannot make a static reference to the non-static method getTaskID() 
from the type Task{color}

I'm wondering how this patch got through the QA bot.

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable



[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable

2012-11-25 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503529#comment-13503529
 ] 

Laxman commented on MAPREDUCE-4807:
---

Asokan, I applied the patches in the correct order as you mentioned in 
MAPREDUCE-4808.
Even in the attached combo patch, I can see the above-mentioned snippet 
(non-static to static) as a problem.


 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable



[jira] [Commented] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable

2012-11-25 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503620#comment-13503620
 ] 

Laxman commented on MAPREDUCE-4807:
---

I manually merged your patch to branch-2, but that should not be the cause of 
this problem.
I have also just gone through the trunk code.

{code}
-  private class MapOutputBuffer<K extends Object, V extends Object>
+  @InterfaceAudience.LimitedPrivate({"MapReduce"})
+  @InterfaceStability.Unstable
+  public static class MapOutputBuffer<K extends Object, V extends Object>
{code}

Please consider the following, leaving aside the problems w.r.t. applying the 
patch or a wrong branch.

* MapOutputBuffer is a static inner class.
* getTaskID() is an inherited method from Task (non-static).
* In the patch, we are referring to the non-static method (getTaskID()) from a 
static context (public static class MapOutputBuffer).

This is a trivial problem. Isn't it?
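The compile error can be reproduced in isolation. A minimal sketch (hypothetical simplified classes, not the real Hadoop sources): a static nested class has no implicit reference to an enclosing instance, so it cannot call a non-static method like getTaskID() directly; it needs an explicit reference passed in.

```java
// Minimal sketch (hypothetical names, not the actual Hadoop classes) of the
// error under discussion. Once a nested class is made static, a bare
// getTaskID() call inside it fails with "Cannot make a static reference to
// the non-static method getTaskID() from the type Task"; the fix shown here
// is to hold an explicit Task reference.
public class Task {
    private final String taskId;

    public Task(String taskId) {
        this.taskId = taskId;
    }

    public String getTaskID() {  // non-static instance method
        return taskId;
    }

    public static class MapOutputBuffer<K, V> {
        private final Task task;  // explicit reference replaces the implicit one

        public MapOutputBuffer(Task task) {
            this.task = task;
        }

        public String describe() {
            // task.getTaskID() compiles; a bare getTaskID() would not
            return "buffer for " + task.getTaskID();
        }
    }

    public static void main(String[] args) {
        Task t = new Task("attempt_001");
        MapOutputBuffer<String, String> buf =
                new MapOutputBuffer<String, String>(t);
        System.out.println(buf.describe());  // prints: buffer for attempt_001
    }
}
```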

 Allow MapOutputBuffer to be pluggable
 -

 Key: MAPREDUCE-4807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4807.patch, mapreduce-4807.patch, 
 mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch


 Allow MapOutputBuffer to be pluggable



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2012-11-24 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503362#comment-13503362
 ] 

Laxman commented on MAPREDUCE-4049:
---

Hi Avner, we are also impressed by the Network Levitated Merge algorithm and we 
are in the process of implementing it. With your consent, we would like to 
collaborate and contribute to this issue.

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: performance, task, tasktracker
Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Fix For: trunk

 Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
 mapreduce-4049.patch, mapreduce-4049.patch, mapreduce-4049.patch, 
 mapreduce-4049.patch


 Support generic shuffle service as set of two plugins: ShuffleProvider & 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or Infiniband) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges. Hence, getting much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden 
 dependency of NodeManager with a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)



[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-24 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503373#comment-13503373
 ] 

Laxman commented on MAPREDUCE-2454:
---

Asokan & Alejandro, IIUC the objective/goal of this feature is to make the 
Shuffler & Merger pluggable, so that a mapreduce user can plug in better 
algorithms.

We are in the process of implementing the Network Levitated Merge algorithm [ 
http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf ]. With this, we want to 
avoid disk usage on the reducer side completely and directly shuffle the map 
outputs to the reducer.

IMO, with MAPREDUCE-2454 and the other sub-tasks as well, we may *not be able to 
plug in* the above algorithm.

Request you to provide your suggestions on this.

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch, 
 mapreduce-2454-new-test.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454-protection-change.patch, 
 mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
 ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.



[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-11-24 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503416#comment-13503416
 ] 

Laxman commented on MAPREDUCE-2454:
---

Asokan, thank you very much for your quick response and detailed clarification.
I will give a try with the patches available here. I will get back to you soon 
on this.

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
 mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch, 
 mapreduce-2454-new-test.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454-protection-change.patch, 
 mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
 ReduceInputSorter.java


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.



[jira] Commented: (MAPREDUCE-2243) Close all the file streams propely in a finally block to avoid their leakage.

2011-01-16 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982466#action_12982466
 ] 

Laxman commented on MAPREDUCE-2243:
---

@Owen

bq. In most cases, the exceptions outside of IOException don't matter much 
because they will bring down.
bq. this leaves the nominal case simple. Note that this is the worst case, if 
we get an Error every system in Hadoop should shutdown.
bq. There is no point in continuing and worrying about lost file handles at 
that point is too extreme. 

Yes, I agree with your point in *Error* scenarios. But how about a runtime 
exception which need not be handled in the positive flow?

Handling unexpected generic exceptions and errors will result in a 
catch-and-rethrow pattern. So, I prefer to handle the stream closure in the try 
block as well as in the finally block.

As per your initial comments, Kamesh has corrected the code to close the streams 
in the try block as well as in the finally block.
Do you still see some issue with this approach? 
How is handling the stream close in a catch block better than handling it in the 
try and finally blocks? 

My opinion on this issue is that handling stream closures in the try and finally 
blocks is foolproof and will avoid some code duplication.
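The close-in-finally idiom being discussed can be sketched as follows (a minimal, self-contained example with illustrative names, not the actual JobTracker/CompletedJobStatusStore code): the finally block releases the handle on both the success and the failure path, and only the secondary close() failure is suppressed so the primary exception stays visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Minimal sketch of the close-in-finally idiom under discussion: the stream
// is closed even when a write throws, avoiding a leaked file handle.
public class SafeClose {
    public static byte[] writeEvents(int[] events) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(buf);
        try {
            dataOut.writeInt(events.length);   // 4-byte count
            for (int event : events) {
                dataOut.writeInt(event);       // 4 bytes per event
            }
        } finally {
            try {
                dataOut.close();  // runs on both success and failure paths
            } catch (IOException ignored) {
                // secondary failure only; the primary exception, if any,
                // propagates from the try block above
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] out = writeEvents(new int[] {1, 2, 3});
        System.out.println(out.length);  // 4 (count) + 3 * 4 = 16
    }
}
```

Since Java 7, try-with-resources expresses the same guarantee more compactly, but the explicit finally shown here is the pattern available to the code being reviewed.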

 Close all the file streams propely in a finally block to avoid their leakage.
 -

 Key: MAPREDUCE-2243
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2243
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, tasktracker
Affects Versions: 0.20.1, 0.22.0
 Environment: NA
Reporter: Bhallamudi Venkata Siva Kamesh
Priority: Minor
   Original Estimate: 72h
  Remaining Estimate: 72h

 In the following classes, streams should be closed in a finally block to avoid 
 their leakage in exceptional cases.
 CompletedJobStatusStore.java
 --
   dataOut.writeInt(events.length);
   for (TaskCompletionEvent event : events) {
     event.write(dataOut);
   }
   dataOut.close();
 EventWriter.java
 --
   encoder.flush();
   out.close();
 MapTask.java
 ---
   splitMetaInfo.write(out);
   out.close();
 TaskLog
 
  1) str = fis.readLine();
     fis.close();
  2) dos.writeBytes(Long.toString(new File(logLocation, LogName.SYSLOG
       .toString()).length() - prevLogLength) + "\n");
     dos.close();
 TotalOrderPartitioner.java
 ---
   while (reader.next(key, value)) {
     parts.add(key);
     key = ReflectionUtils.newInstance(keyClass, conf);
   }
   reader.close();

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.