[jira] [Commented] (MAPREDUCE-6850) Shuffle Handler keep-alive connections are closed from the server side

2017-02-27 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886703#comment-15886703
 ] 

Rajesh Balamohan commented on MAPREDUCE-6850:
-

Thanks for sharing the latest patch, [~jeagles]. The .4 patch lgtm.

> Shuffle Handler keep-alive connections are closed from the server side
> --
>
> Key: MAPREDUCE-6850
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6850
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: MAPREDUCE-6850.1.patch, MAPREDUCE-6850.2.patch, 
> MAPREDUCE-6850.3.patch, MAPREDUCE-6850.4.patch, With_Issue.png, 
> With_Patch.png, With_Patch_withData.png
>
>
> When performance testing the Tez shuffle handler (TEZ-3334), it was noticed
> that keep-alive connections are closed from the server side. The client
> silently recovers and logs the connection as keep-alive, despite
> re-establishing a connection. This JIRA aims to remove the close from the
> server side, fixing the bug that prevents keep-alive connections.






[jira] [Comment Edited] (MAPREDUCE-6850) Shuffle Handler keep-alive connections are closed from the server side

2017-02-26 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884985#comment-15884985
 ] 

Rajesh Balamohan edited comment on MAPREDUCE-6850 at 2/27/17 2:15 AM:
--

I checked the patch in a small multi-node cluster. Attaching the tcpdump
screenshots for reference. The patch works fine with keep-alive enabled, and
connections are being reused: mapOutputs are retrieved over the same
connection. Attachment "With_Patch.png" shows the TCP stream, with multiple
mapOutputs being fetched over the same connection.

One very minor comment on the patch: the {{timer}} variable in
{{HttpPipelineFactory}} may not be needed.

In MAPREDUCE-5787, the keep-alive parameter check was present up to
https://issues.apache.org/jira/secure/attachment/12634984/MAPREDUCE-5787-2.4.0-v3.patch
as follows:
{noformat}
if (!keepAlive && !keepAliveParam) {
  lastMap.addListener(ChannelFutureListener.CLOSE);
}
{noformat}

However, during refactoring it got dropped in subsequent patches on the same
JIRA, which caused this problem. Even with that check, the server relied on
the client (the JDK's internal HTTP client) to terminate the connection after
the keep-alive timeout. The patch proposed in this JIRA addresses that
scenario as well: the server automatically closes the connection once the
idle time exceeds the server-side threshold.
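
For illustration, here is a minimal sketch of how such a server-side idle
close can be wired up in Netty 3.x (which ShuffleHandler is built on). The
handler name and the timeout plumbing below are assumptions for the sketch,
not the exact code of the .4 patch:

{code}
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.handler.timeout.IdleState;
import org.jboss.netty.handler.timeout.IdleStateAwareChannelHandler;
import org.jboss.netty.handler.timeout.IdleStateEvent;

// Sketch only: an IdleStateHandler placed earlier in the pipeline, e.g.
//   pipeline.addLast("idle",
//       new IdleStateHandler(timer, 0, 0, keepAliveTimeoutSecs));
// fires channelIdle() once the connection has been quiet too long, and only
// then does the server close the keep-alive connection.
public class IdleCloseHandler extends IdleStateAwareChannelHandler {
  @Override
  public void channelIdle(ChannelHandlerContext ctx, IdleStateEvent e) {
    if (e.getState() == IdleState.ALL_IDLE) {
      e.getChannel().close();
    }
  }
}
{code}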





was (Author: rajesh.balamohan):
I checked the patch in a small multi-node cluster. Attaching the tcpdump
screenshots for reference. The patch works fine with keep-alive enabled, and
connections are being reused: mapOutputs are retrieved over the same
connection. Attachment "With_Patch.png" shows the TCP stream, with multiple
mapOutputs being fetched over the same connection.

One very minor comment on the patch: the {{timer}} variable in
{{HttpPipelineFactory}} may not be needed.

In MAPREDUCE-5787, the keep-alive parameter check was present up to
https://issues.apache.org/jira/secure/attachment/12634984/MAPREDUCE-5787-2.4.0-v3.patch
as follows:
{noformat}
if (!keepAlive && !keepAliveParam) {
  lastMap.addListener(ChannelFutureListener.CLOSE);
}
{noformat}

However, during refactoring it got dropped in subsequent patches, which
caused this problem. Even with that check, the server relied on the client
(the JDK's internal HTTP client) to terminate the connection after the
keep-alive timeout. The patch proposed in this JIRA addresses that scenario
as well: the server automatically closes the connection once the idle time
exceeds the server-side threshold.




> Shuffle Handler keep-alive connections are closed from the server side
> --
>
> Key: MAPREDUCE-6850
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6850
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: MAPREDUCE-6850.1.patch, MAPREDUCE-6850.2.patch, 
> MAPREDUCE-6850.3.patch, With_Issue.png, With_Patch.png, 
> With_Patch_withData.png
>
>
> When performance testing the Tez shuffle handler (TEZ-3334), it was noticed
> that keep-alive connections are closed from the server side. The client
> silently recovers and logs the connection as keep-alive, despite
> re-establishing a connection. This JIRA aims to remove the close from the
> server side, fixing the bug that prevents keep-alive connections.






[jira] [Updated] (MAPREDUCE-6850) Shuffle Handler keep-alive connections are closed from the server side

2017-02-26 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-6850:

Attachment: With_Patch_withData.png
            With_Patch.png
            With_Issue.png

I checked the patch in a small multi-node cluster. Attaching the tcpdump
screenshots for reference. The patch works fine with keep-alive enabled, and
connections are being reused: mapOutputs are retrieved over the same
connection. Attachment "With_Patch.png" shows the TCP stream, with multiple
mapOutputs being fetched over the same connection.

One very minor comment on the patch: the {{timer}} variable in
{{HttpPipelineFactory}} may not be needed.

In MAPREDUCE-5787, the keep-alive parameter check was present up to
https://issues.apache.org/jira/secure/attachment/12634984/MAPREDUCE-5787-2.4.0-v3.patch
as follows:
{noformat}
if (!keepAlive && !keepAliveParam) {
  lastMap.addListener(ChannelFutureListener.CLOSE);
}
{noformat}

However, during refactoring it got dropped in subsequent patches, which
caused this problem. Even with that check, the server relied on the client
(the JDK's internal HTTP client) to terminate the connection after the
keep-alive timeout. The patch proposed in this JIRA addresses that scenario
as well: the server automatically closes the connection once the idle time
exceeds the server-side threshold.




> Shuffle Handler keep-alive connections are closed from the server side
> --
>
> Key: MAPREDUCE-6850
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6850
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: MAPREDUCE-6850.1.patch, MAPREDUCE-6850.2.patch, 
> MAPREDUCE-6850.3.patch, With_Issue.png, With_Patch.png, 
> With_Patch_withData.png
>
>
> When performance testing the Tez shuffle handler (TEZ-3334), it was noticed
> that keep-alive connections are closed from the server side. The client
> silently recovers and logs the connection as keep-alive, despite
> re-establishing a connection. This JIRA aims to remove the close from the
> server side, fixing the bug that prevents keep-alive connections.






[jira] [Commented] (MAPREDUCE-6850) Shuffle Handler keep-alive connections are closed from the server side

2017-02-24 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884074#comment-15884074
 ] 

Rajesh Balamohan commented on MAPREDUCE-6850:
-

Patch looks good to me. I need more time to check it in a cluster; I got a
DFSOutputStream timeout exception in the cluster I was trying out (which is
not related to this JIRA).


> Shuffle Handler keep-alive connections are closed from the server side
> --
>
> Key: MAPREDUCE-6850
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6850
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: MAPREDUCE-6850.1.patch, MAPREDUCE-6850.2.patch, 
> MAPREDUCE-6850.3.patch
>
>
> When performance testing the Tez shuffle handler (TEZ-3334), it was noticed
> that keep-alive connections are closed from the server side. The client
> silently recovers and logs the connection as keep-alive, despite
> re-establishing a connection. This JIRA aims to remove the close from the
> server side, fixing the bug that prevents keep-alive connections.






[jira] [Commented] (MAPREDUCE-6850) Shuffle Handler keep-alive connections are closed from the server side

2017-02-24 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882248#comment-15882248
 ] 

Rajesh Balamohan commented on MAPREDUCE-6850:
-

Thanks for the patch, [~jeagles]. I am getting a small cluster today/tomorrow;
I will check the patch there and update.

> Shuffle Handler keep-alive connections are closed from the server side
> --
>
> Key: MAPREDUCE-6850
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6850
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: MAPREDUCE-6850.1.patch, MAPREDUCE-6850.2.patch, 
> MAPREDUCE-6850.3.patch
>
>
> When performance testing the Tez shuffle handler (TEZ-3334), it was noticed
> that keep-alive connections are closed from the server side. The client
> silently recovers and logs the connection as keep-alive, despite
> re-establishing a connection. This JIRA aims to remove the close from the
> server side, fixing the bug that prevents keep-alive connections.






[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-20 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Open  (was: Patch Available)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v4.patch, 
 MAPREDUCE-5787-2.4.0-v5-v6-diff.patch, MAPREDUCE-5787-2.4.0-v5.patch, 
 MAPREDUCE-5787-2.4.0-v6.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-20 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: MAPREDUCE-5787-2.4.0-v7.patch

Addressed Vinod's concern about the increase in memory due to caching
mapOutputFileName and IndexRecord. The cache can be sized via
mapreduce.shuffle.mapoutput-info.meta.cache.size (default value is 1000).
String and LocalDirAllocator computations will be carried out twice if the
number of mapIds goes past this limit.
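
As a rough sketch of the bounded cache described above (class and field names
here are illustrative, not the patch's exact code), an access-ordered
LinkedHashMap gives the desired evict-oldest behaviour:

{code}
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class MetaCacheSketch {
  // Hypothetical value type standing in for the cached file name + index record.
  static class MapOutputInfo { String mapOutputFileName; long startOffset, partLength; }

  static Map<String, MapOutputInfo> newMetaCache(final int cacheSize) {
    // Access-ordered map: once more than cacheSize distinct mapIds are cached,
    // the least-recently-used entry is evicted and its string/LocalDirAllocator
    // work is simply redone on the next request for that mapId.
    return Collections.synchronizedMap(
        new LinkedHashMap<String, MapOutputInfo>(cacheSize, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<String, MapOutputInfo> eldest) {
            return size() > cacheSize;
          }
        });
  }
}
{code}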

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v4.patch, 
 MAPREDUCE-5787-2.4.0-v5-v6-diff.patch, MAPREDUCE-5787-2.4.0-v5.patch, 
 MAPREDUCE-5787-2.4.0-v6.patch, MAPREDUCE-5787-2.4.0-v7.patch, 
 MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-20 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Patch Available  (was: Open)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v4.patch, 
 MAPREDUCE-5787-2.4.0-v5-v6-diff.patch, MAPREDUCE-5787-2.4.0-v5.patch, 
 MAPREDUCE-5787-2.4.0-v6.patch, MAPREDUCE-5787-2.4.0-v7.patch, 
 MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-19 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Patch Available  (was: Open)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v4.patch, 
 MAPREDUCE-5787-2.4.0-v5.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-19 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: MAPREDUCE-5787-2.4.0-v5.patch

Incorporated review comments from Vinod

> Can we also change the MapReduce fetcher to use keep-alive depending on
> whether it is enabled or not?
- HttpURLConnection will automatically use a persistent connection when the
keep-alive and Content-Length headers are properly set, so there is no need
to change the fetcher code.

> Suggestion for Configuration renames
- Fixed

> Add both to the mapred-default.xml
- Fixed

> LOG KeepAliveParam along with other things like jobId, mapId etc.
- Fixed

> populateHeaders. We are already parsing jobID, ApplicationId etc. as part of
> sendMapOutput. We should avoid doing the string parsing multiple times.
> Is setting CONTENT_LENGTH important? Even so, for doing it, we are reading
> the index-record two times.
- Yes, Content-Length is very much needed for this (see the sketch after this
list). Fixed the multiple-parsing issue.

> Instead of re-defining new constants like CONNECTION_HEADER in
> ShuffleHandler, can you use the standard constants in java (HttpHeaders)?
- Fixed

> Finally, can you reuse code between the two tests?
- Fixed
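
A minimal sketch of the header population discussed above, assuming the
Netty 3.x HttpHeaders constants (the method shape is illustrative, not the
patch itself):

{code}
import org.jboss.netty.handler.codec.http.HttpHeaders;
import org.jboss.netty.handler.codec.http.HttpResponse;

class KeepAliveHeadersSketch {
  // The JDK client will not reuse a socket unless it can tell where the
  // response body ends, so an accurate Content-Length is required.
  static void setKeepAliveHeaders(HttpResponse response, long contentLength,
      int keepAliveTimeoutSecs) {
    response.setHeader(HttpHeaders.Names.CONTENT_LENGTH,
        String.valueOf(contentLength));
    response.setHeader(HttpHeaders.Names.CONNECTION,
        HttpHeaders.Values.KEEP_ALIVE);
    // Advertise how long the server keeps an idle connection open.
    response.setHeader("Keep-Alive", "timeout=" + keepAliveTimeoutSecs);
  }
}
{code}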

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v4.patch, 
 MAPREDUCE-5787-2.4.0-v5.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Commented] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-18 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939955#comment-13939955
 ] 

Rajesh Balamohan commented on MAPREDUCE-5787:
-


> Is setting CONTENT_LENGTH important? Even so, for doing it, we are reading
> the index-record two times - once here and once while sending the output.
> This will have a performance impact.

Yes, CONTENT_LENGTH is needed on the client side for keep-alive. A placeholder
is needed to avoid computing the index-record twice. I will refactor and post
the patch asap.
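
As a sketch of the "placeholder" idea (the surrounding names and method
shapes are assumed, not the final patch): read the index record once and hand
the same object to both the header population and the send path:

{code}
// Sketch only, inside ShuffleHandler's request handling:
IndexRecord info =
    indexCache.getIndexInformation(mapId, reduce, indexFileName, user);
response.setHeader("Content-Length", String.valueOf(info.partLength));
sendMapOutput(ctx, ch, user, mapId, reduce, info);  // reuse, no second read
{code}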

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v4.patch, 
 MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Open  (was: Patch Available)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Patch Available  (was: Open)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: (was: BUG-14568-v3-branch-2.4.0.patch)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: MAPREDUCE-5787-2.4.0-v3.patch

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: BUG-14568-v3-branch-2.4.0.patch

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Open  (was: Patch Available)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: (was: MAPREDUCE-5787-2.4.0-v3.patch)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Patch Available  (was: Open)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: MAPREDUCE-5787-2.4.0-v3.patch

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Commented] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937421#comment-13937421
 ] 

Rajesh Balamohan commented on MAPREDUCE-5787:
-

Review Request:  
https://reviews.apache.org/r/19264/diff/1/?file=521462#file521462line585

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Open  (was: Patch Available)

Will incorporate the review comments from Gopal 
(https://reviews.apache.org/r/19264/diff/1/?file=521462#file521462line585) and 
upload the patch

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: MAPREDUCE-5787-2.4.0-v3.patch

Incorporated review comments from Gopal 
(https://reviews.apache.org/r/19264/diff/1/?file=521462#file521462line585)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v3.patch, 
 MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Patch Available  (was: Open)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v3.patch, 
 MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: (was: MAPREDUCE-5787-2.4.0-v3.patch)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Fix For: 2.4.0

 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: MAPREDUCE-5787-2.4.0-v4.patch

Renaming the patch as v4.

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Fix For: 2.4.0

 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v4.patch, 
 MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Open  (was: Patch Available)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Fix For: 2.4.0

 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v4.patch, 
 MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Patch Available  (was: Open)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Critical
  Labels: ShuffleKeepalive
 Fix For: 2.4.0

 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, 
 MAPREDUCE-5787-2.4.0-v3.patch, MAPREDUCE-5787-2.4.0-v4.patch, 
 MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: MAPREDUCE-5787-2.4.0-v2.patch

- Keep-Alive is disabled by default.
- Keep-Alive can be enabled by setting mapreduce.shuffle.enable.keep.alive.
- The timeout can be adjusted using
mapreduce.shuffle.enable.keep.alive.timeout.
- As an add-on facility, Keep-Alive can also be enabled per request by adding
the keepAlive=true parameter to the request URL (see the example below). This
allows frameworks like Tez to benefit from Keep-Alive connections without
affecting any MR jobs (for which keep-alive connections stay disabled by
default).
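
For illustration, the server-wide settings would look like this in
mapred-site.xml (property names as above; values are examples only):

{noformat}
<property>
  <name>mapreduce.shuffle.enable.keep.alive</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.shuffle.enable.keep.alive.timeout</name>
  <value>60</value>  <!-- seconds an idle connection is held open -->
</property>
{noformat}

A per-request opt-in would instead append keepAlive=true to the shuffle URL,
leaving MR jobs on the default behaviour.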

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-13 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Status: Patch Available  (was: Open)

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0-v2.patch, MAPREDUCE-5787-2.4.0.patch








[jira] [Updated] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-12 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-5787:


Attachment: MAPREDUCE-5787-2.4.0.patch

 Modify ShuffleHandler to support Keep-Alive
 ---

 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: MAPREDUCE-5787-2.4.0.patch








[jira] [Commented] (MAPREDUCE-5788) Modify Fetcher to pull data using persistent connection

2014-03-12 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931659#comment-13931659
 ] 

Rajesh Balamohan commented on MAPREDUCE-5788:
-

The existing HttpURLConnection is capable of handling persistent connections
as long as the Content-Length header is specified. It also honors the
"Keep-Alive: timeout" header.
ShuffleHandler will send the Content-Length and Keep-Alive: timeout headers
if mapreduce.shuffle.enable.keep.alive is set to true. No changes are needed
on the Fetcher side (need to mark this as won't-fix).
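
As a minimal sketch of why the Fetcher needs no change, assuming a plain JDK
HttpURLConnection (the URL and buffer size are illustrative):

{code}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class FetchSketch {
  // The JDK pools sockets internally: as long as the shuffle response has an
  // accurate Content-Length and the body is fully consumed, a later
  // openConnection() to the same host:port transparently reuses the TCP
  // connection.
  static void fetch(String mapOutputUrl) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(mapOutputUrl).openConnection();
    try (InputStream in = conn.getInputStream()) {
      byte[] buf = new byte[64 * 1024];
      while (in.read(buf) != -1) {
        // drain the full Content-Length so the socket can be reused
      }
    }
    // Note: calling conn.disconnect() here could defeat reuse; the drained
    // socket goes back to the JDK's keep-alive cache automatically.
  }
}
{code}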

 Modify Fetcher to pull data using persistent connection
 ---

 Key: MAPREDUCE-5788
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5788
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan







[jira] [Commented] (MAPREDUCE-5786) Support Keep-Alive connections in ShuffleHandler

2014-03-10 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926423#comment-13926423
 ] 

Rajesh Balamohan commented on MAPREDUCE-5786:
-

Thanks for the comments, Jason. We need mapreduce.shuffle.enable.keep.alive
to enable keep-alive in the ShuffleHandler and
mapreduce.shuffle.enable.keep.alive.timeout to determine the timeout value
for the persistent connection. E.g., a "Keep-Alive: timeout=60" header
specifies that the connection will be kept alive for 60 seconds, after which
it will be closed. This will let us tune persistent-connection duration on
large clusters with different job patterns.
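
Concretely, a shuffle response carrying these headers would look like this
(values illustrative):

{noformat}
HTTP/1.1 200 OK
Content-Length: 1048576
Connection: keep-alive
Keep-Alive: timeout=60

...map output bytes...
{noformat}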

 Support Keep-Alive connections in ShuffleHandler
 

 Key: MAPREDUCE-5786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5786
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: shuffle

 Currently ShuffleHandler supports fetching map-outputs in batches from the
 same host. But there are scenarios wherein fetchers pull data aggressively
 (i.e. start pulling the data as and when it becomes available). In this
 case, the number of mapIds pulled from the same host remains at 1, which
 causes lots of connections to be established.
 The number of connections can be reduced a lot if ShuffleHandler supports
 Keep-Alive.





[jira] [Created] (MAPREDUCE-5786) Support Keep-Alive connections in ShuffleHandler

2014-03-08 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created MAPREDUCE-5786:
---

 Summary: Support Keep-Alive connections in ShuffleHandler
 Key: MAPREDUCE-5786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5786
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan


Currently ShuffleHandler supports fetching map-outputs in batches from the
same host. But there are scenarios wherein fetchers pull data aggressively
(i.e. start pulling the data as and when it becomes available). In this case,
the number of mapIds pulled from the same host remains at 1, which causes
lots of connections to be established.

The number of connections can be reduced a lot if ShuffleHandler supports
Keep-Alive.






[jira] [Created] (MAPREDUCE-5787) Modify ShuffleHandler to support Keep-Alive

2014-03-08 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created MAPREDUCE-5787:
---

 Summary: Modify ShuffleHandler to support Keep-Alive
 Key: MAPREDUCE-5787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5787
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan








[jira] [Created] (MAPREDUCE-5788) Modify Fetcher to pull data using persistent connection

2014-03-08 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created MAPREDUCE-5788:
---

 Summary: Modify Fetcher to pull data using persistent connection
 Key: MAPREDUCE-5788
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5788
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan








[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal

2013-12-13 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848164#comment-13848164
 ] 

Rajesh Balamohan commented on MAPREDUCE-5611:
-

Agreed, the ideal approach is to compute the *intersection* of the nodes in
the split information. We will modify the patch to accommodate this and post
the details.
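
A minimal sketch of that intersection, with assumed variable names (not the
actual patch): keep only the nodes that host every block placed in the split,
falling back to the union when no single node holds them all:

{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SplitLocationSketch {
  // Intersect the replica locations of all blocks in a split so the split
  // advertises every node that holds ALL of its blocks.
  static String[] splitLocations(List<String[]> hostsPerBlock) {
    Set<String> common = new HashSet<String>();
    Set<String> union = new HashSet<String>();
    boolean first = true;
    for (String[] hosts : hostsPerBlock) {
      Set<String> h = new HashSet<String>();
      for (String host : hosts) {
        h.add(host);
      }
      union.addAll(h);
      if (first) {
        common.addAll(h);
        first = false;
      } else {
        common.retainAll(h);
      }
    }
    // Fall back to the union when no node holds every block.
    return (common.isEmpty() ? union : common).toArray(new String[0]);
  }
}
{code}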

 CombineFileInputFormat only requests a single location per split when more 
 could be optimal
 ---

 Key: MAPREDUCE-5611
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Chandra Prakash Bhagtani
Assignee: Chandra Prakash Bhagtani
 Attachments: CombineFileInputFormat-trunk.patch


 I have come across an issue with CombineFileInputFormat. Actually I ran a
 hive query on approx 1.2 GB of data with CombineHiveInputFormat, which
 internally uses CombineFileInputFormat. My cluster size is 9 datanodes and
 max.split.size is 256 MB.
 When I ran this query with replication factor 9, hive consistently creates
 all 6 tasks rack-local, and with replication factor 3 it creates 5
 rack-local and 1 data-local task.
 When the replication factor is 9 (equal to the cluster size), all the tasks
 should be data-local as each datanode contains all the replicas of the input
 data, but that is not happening, i.e. all the tasks are rack-local.
 When I dug into the CombineFileInputFormat.java code in the getMoreSplits
 method, I found the issue with the following snippet (especially in case of
 a higher replication factor):
 {code:title=CombineFileInputFormat.java|borderStyle=solid}
 for (Iterator<Map.Entry<String,
      List<OneBlockInfo>>> iter = nodeToBlocks.entrySet().iterator();
      iter.hasNext();) {
   Map.Entry<String, List<OneBlockInfo>> one = iter.next();
   nodes.add(one.getKey());
   List<OneBlockInfo> blocksInNode = one.getValue();
   // for each block, copy it into validBlocks. Delete it from
   // blockToNodes so that the same block does not appear in
   // two different splits.
   for (OneBlockInfo oneblock : blocksInNode) {
     if (blockToNodes.containsKey(oneblock)) {
       validBlocks.add(oneblock);
       blockToNodes.remove(oneblock);
       curSplitSize += oneblock.length;
       // if the accumulated split size exceeds the maximum, then
       // create this split.
       if (maxSize != 0 && curSplitSize >= maxSize) {
         // create an input split and add it to the splits array
         addCreatedSplit(splits, nodes, validBlocks);
         curSplitSize = 0;
         validBlocks.clear();
       }
     }
   }
 {code}
 First node in the map nodeToBlocks has all the replicas of input file, so the 
 above code creates 6 splits all with only one location. Now if JT doesn't 
 schedule these tasks on that node, all the tasks will be rack-local, even 
 though all the other datanodes have all the other replicas.





[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat creates more rack-local tasks due to less split location info.

2013-11-28 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835199#comment-13835199
 ] 

Rajesh Balamohan commented on MAPREDUCE-5611:
-


Thanks Chandra.  This is a good perf patch.  Here are the data locality numbers 
which can be useful to analyze the perf improvement.


Without Patch (Job Counters, totals):
  Launched map tasks   : 335
  Data-local map tasks : 179
  Rack-local map tasks : 81

With Patch (Job Counters, totals):
  Launched map tasks   : 335
  Data-local map tasks : 279
  Rack-local map tasks : 47

The data locality improves a lot with this patch in Hive queries.  


 CombineFileInputFormat creates more rack-local tasks due to less split 
 location info.
 -

 Key: MAPREDUCE-5611
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: trunk
Reporter: Chandra Prakash Bhagtani
Assignee: Chandra Prakash Bhagtani
 Fix For: trunk

 Attachments: CombineFileInputFormat-trunk.patch


 I have come across an issue with CombineFileInputFormat. Actually I ran a
 hive query on approx 1.2 GB of data with CombineHiveInputFormat, which
 internally uses CombineFileInputFormat. My cluster size is 9 datanodes and
 max.split.size is 256 MB.
 When I ran this query with replication factor 9, hive consistently creates
 all 6 tasks rack-local, and with replication factor 3 it creates 5
 rack-local and 1 data-local task.
 When the replication factor is 9 (equal to the cluster size), all the tasks
 should be data-local as each datanode contains all the replicas of the input
 data, but that is not happening, i.e. all the tasks are rack-local.
 When I dug into the CombineFileInputFormat.java code in the getMoreSplits
 method, I found the issue with the following snippet (especially in case of
 a higher replication factor):
 {code:title=CombineFileInputFormat.java|borderStyle=solid}
 for (Iterator<Map.Entry<String,
      List<OneBlockInfo>>> iter = nodeToBlocks.entrySet().iterator();
      iter.hasNext();) {
   Map.Entry<String, List<OneBlockInfo>> one = iter.next();
   nodes.add(one.getKey());
   List<OneBlockInfo> blocksInNode = one.getValue();
   // for each block, copy it into validBlocks. Delete it from
   // blockToNodes so that the same block does not appear in
   // two different splits.
   for (OneBlockInfo oneblock : blocksInNode) {
     if (blockToNodes.containsKey(oneblock)) {
       validBlocks.add(oneblock);
       blockToNodes.remove(oneblock);
       curSplitSize += oneblock.length;
       // if the accumulated split size exceeds the maximum, then
       // create this split.
       if (maxSize != 0 && curSplitSize >= maxSize) {
         // create an input split and add it to the splits array
         addCreatedSplit(splits, nodes, validBlocks);
         curSplitSize = 0;
         validBlocks.clear();
       }
     }
   }
 {code}
 First node in the map nodeToBlocks has all the replicas of input file, so the 
 above code creates 6 splits all with only one location. Now if JT doesn't 
 schedule these tasks on that node, all the tasks will be rack-local, even 
 though all the other datanodes have all the other replicas.





[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat creates more rack-local tasks due to less split location info.

2013-11-28 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13835200#comment-13835200
 ] 

Rajesh Balamohan commented on MAPREDUCE-5611:
-

Just wanted to add the response times as well:

Without Patch: 289 seconds
With Patch: 219 seconds

This testing was carried out with Hive 0.10.

 CombineFileInputFormat creates more rack-local tasks due to less split 
 location info.
 -

 Key: MAPREDUCE-5611
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: trunk
Reporter: Chandra Prakash Bhagtani
Assignee: Chandra Prakash Bhagtani
 Fix For: trunk

 Attachments: CombineFileInputFormat-trunk.patch


 I have come across an issue with CombineFileInputFormat. Actually I ran a
 hive query on approx 1.2 GB of data with CombineHiveInputFormat, which
 internally uses CombineFileInputFormat. My cluster size is 9 datanodes and
 max.split.size is 256 MB.
 When I ran this query with replication factor 9, hive consistently creates
 all 6 tasks rack-local, and with replication factor 3 it creates 5
 rack-local and 1 data-local task.
 When the replication factor is 9 (equal to the cluster size), all the tasks
 should be data-local as each datanode contains all the replicas of the input
 data, but that is not happening, i.e. all the tasks are rack-local.
 When I dug into the CombineFileInputFormat.java code in the getMoreSplits
 method, I found the issue with the following snippet (especially in case of
 a higher replication factor):
 {code:title=CombineFileInputFormat.java|borderStyle=solid}
 for (Iterator<Map.Entry<String,
      List<OneBlockInfo>>> iter = nodeToBlocks.entrySet().iterator();
      iter.hasNext();) {
   Map.Entry<String, List<OneBlockInfo>> one = iter.next();
   nodes.add(one.getKey());
   List<OneBlockInfo> blocksInNode = one.getValue();
   // for each block, copy it into validBlocks. Delete it from
   // blockToNodes so that the same block does not appear in
   // two different splits.
   for (OneBlockInfo oneblock : blocksInNode) {
     if (blockToNodes.containsKey(oneblock)) {
       validBlocks.add(oneblock);
       blockToNodes.remove(oneblock);
       curSplitSize += oneblock.length;
       // if the accumulated split size exceeds the maximum, then
       // create this split.
       if (maxSize != 0 && curSplitSize >= maxSize) {
         // create an input split and add it to the splits array
         addCreatedSplit(splits, nodes, validBlocks);
         curSplitSize = 0;
         validBlocks.clear();
       }
     }
   }
 {code}
 First node in the map nodeToBlocks has all the replicas of input file, so the 
 above code creates 6 splits all with only one location. Now if JT doesn't 
 schedule these tasks on that node, all the tasks will be rack-local, even 
 though all the other datanodes have all the other replicas.





[jira] [Commented] (MAPREDUCE-2450) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

2011-05-19 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036047#comment-13036047
 ] 

Rajesh Balamohan commented on MAPREDUCE-2450:
-

> Todd Lipcon added a comment - 19/May/11 06:41
> Would this also be fixed by HADOOP-6762?

Hi Todd,

HADOOP-6762 could fix this issue as well; however, I haven't tested with it.

The patch proposed in
https://issues.apache.org/jira/secure/attachment/12477611/mapreduce-2450.patch
has been tested repeatedly in a large-scale cluster.






 Calls from running tasks to TaskTracker methods sometimes fail and incur a 
 60s timeout
 --

 Key: MAPREDUCE-2450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Matei Zaharia
 Fix For: 0.23.0

 Attachments: HADOOP-5380.Y.20.branch.patch, HADOOP-5380.patch, 
 HADOOP_5380-Y.0.20.20x.patch, mapreduce-2450.patch


 I'm seeing some map tasks in my jobs take 1 minute to commit after they 
 finish the map computation. On the map side, the output looks like this:
 {code}
 2009-03-02 21:30:54,384 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot 
 initialize JVM Metrics with processName=MAP, sessionId= - already initialized
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 800
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 
 300
 2009-03-02 21:30:55,493 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
 239075328/298844160
 2009-03-02 21:30:55,494 INFO org.apache.hadoop.mapred.MapTask: record buffer 
 = 786432/983040
 2009-03-02 21:31:00,381 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
 of map output
 2009-03-02 21:31:07,892 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
 2009-03-02 21:31:07,951 INFO org.apache.hadoop.mapred.TaskRunner: 
 Task:attempt_200903022127_0001_m_003163_0 is done. And is in the process of 
 commiting
 2009-03-02 21:32:07,949 INFO org.apache.hadoop.mapred.TaskRunner: 
 Communication exception: java.io.IOException: Call to /127.0.0.1:50311 failed 
 on local exception: java.nio.channels.ClosedChannelException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:765)
   at org.apache.hadoop.ipc.Client.call(Client.java:733)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
   at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:525)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.nio.channels.ClosedChannelException
   at 
 java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:167)
   at 
 java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:331)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
 2009-03-02 21:32:07,953 INFO org.apache.hadoop.mapred.TaskRunner: Task 
 'attempt_200903022127_0001_m_003163_0' done.
 {code}
 In the TaskTracker log, it looks like this:
 {code}
 2009-03-02 21:31:08,110 WARN org.apache.hadoop.ipc.Server: IPC Server 
 Responder, call ping(attempt_200903022127_0001_m_003163_0) from 
 127.0.0.1:56884: output error
 2009-03-02 21:31:08,111 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 10 on 50311 caught: java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
 at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
 at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
 at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:981)
 {code}
 Note that the task actually seemed to commit - it 

[jira] [Commented] (MAPREDUCE-2450) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

2011-04-28 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026246#comment-13026246
 ] 

Rajesh Balamohan commented on MAPREDUCE-2450:
-

Ran a large sort job and GridMix-V3 with 1200 jobs to verify this patch. The
large sort job / GridMix-V3 runs often reproduced the problem reported in
this bug; with the patch, they executed fine without timeout issues in the
task logs.

This patch doesn't call for any additional test cases.

 Calls from running tasks to TaskTracker methods sometimes fail and incur a 
 60s timeout
 --

 Key: MAPREDUCE-2450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Matei Zaharia
 Fix For: 0.23.0

 Attachments: HADOOP-5380.Y.20.branch.patch, HADOOP-5380.patch, 
 HADOOP_5380-Y.0.20.20x.patch, mapreduce-2450.patch


 I'm seeing some map tasks in my jobs take 1 minute to commit after they 
 finish the map computation. On the map side, the output looks like this:
 {code}
 2009-03-02 21:30:54,384 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot 
 initialize JVM Metrics with processName=MAP, sessionId= - already initialized
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 800
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 
 300
 2009-03-02 21:30:55,493 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
 239075328/298844160
 2009-03-02 21:30:55,494 INFO org.apache.hadoop.mapred.MapTask: record buffer 
 = 786432/983040
 2009-03-02 21:31:00,381 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
 of map output
 2009-03-02 21:31:07,892 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
 2009-03-02 21:31:07,951 INFO org.apache.hadoop.mapred.TaskRunner: 
 Task:attempt_200903022127_0001_m_003163_0 is done. And is in the process of 
 commiting
 2009-03-02 21:32:07,949 INFO org.apache.hadoop.mapred.TaskRunner: 
 Communication exception: java.io.IOException: Call to /127.0.0.1:50311 failed 
 on local exception: java.nio.channels.ClosedChannelException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:765)
   at org.apache.hadoop.ipc.Client.call(Client.java:733)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
   at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:525)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.nio.channels.ClosedChannelException
   at 
 java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:167)
   at 
 java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:331)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
 2009-03-02 21:32:07,953 INFO org.apache.hadoop.mapred.TaskRunner: Task 
 'attempt_200903022127_0001_m_003163_0' done.
 {code}
 In the TaskTracker log, it looks like this:
 code
 2009-03-02 21:31:08,110 WARN org.apache.hadoop.ipc.Server: IPC Server 
 Responder, call ping(attempt_200903022127_0001_m_003163_0) from 
 127.0.0.1:56884: output error
 2009-03-02 21:31:08,111 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 10 on 50311 caught: java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
 at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
 at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
 at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:981)
 {code}
 Note that the task actually seemed to commit - it didn't get speculatively 
 executed or anything. However, the job wasn't able to continue until this one 
 task was done. Both parties seem to think the channel was closed. How does 
 the channel get closed externally? If closing it from outside is unavoidable, 
 maybe the right thing to do is to set a much lower timeout, because 1 minute 
 delay can be pretty significant for a small job.
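For context, the one-minute gap above lines up with the IPC client's ping interval. 
Purely as a hedged illustration of the "much lower timeout" idea (not part of any 
patch on this JIRA; the config key is an assumption to verify against the deployed 
Hadoop version):

{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: lower the IPC client ping interval so a dead
// channel is noticed in well under a minute. "ipc.ping.interval" is the
// key read by org.apache.hadoop.ipc.Client of this era; treat the exact
// key name and the 60s default here as assumptions.
public class LowerIpcPingInterval {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt("ipc.ping.interval", 10000); // 10s instead of the 60s default
    System.out.println("ipc.ping.interval = "
        + conf.getInt("ipc.ping.interval", 60000));
  }
}
{code}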

[jira] [Updated] (MAPREDUCE-1461) Feature to instruct rumen-folder utility to skip jobs worth of specific duration

2011-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1461:


Status: Open  (was: Patch Available)

 Feature to instruct rumen-folder utility to skip jobs worth of specific 
 duration
 

 Key: MAPREDUCE-1461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1461
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Reporter: Rajesh Balamohan
 Attachments: MR-1461-trunk.patch, mapreduce-1461--2010-02-05.patch, 
 mapreduce-1461--2010-03-04.patch


 JSON outputs of rumen on production logs can be huge in the order of multiple 
 GB. Rumen's folder utility helps in getting a smaller snapshot of this JSON 
 data.
 It would be helpful to have an option in rumen-folder, wherein user can 
 specify a duration from which rumen-folder should start processing data.
 Related JIRA link: https://issues.apache.org/jira/browse/MAPREDUCE-1295
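 A minimal sketch of the skipping idea described above, assuming the trace is 
 ordered by job submit time; all names below are hypothetical and not taken from 
 the attached patches:

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: drop any job whose submit time falls within the
// first skipMs milliseconds after the first job's submit time, keeping
// the rest for folding. Method and variable names are illustrative.
public class SkipEarlyJobs {
  public static List<Long> keepJobsAfter(List<Long> submitTimes, long skipMs) {
    List<Long> kept = new ArrayList<Long>();
    if (submitTimes.isEmpty()) {
      return kept;
    }
    long first = submitTimes.get(0); // trace is ordered by submit time
    for (long t : submitTimes) {
      if (t - first >= skipMs) {     // keep only jobs past the cutoff
        kept.add(t);
      }
    }
    return kept;
  }
}
{code}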

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-1461) Feature to instruct rumen-folder utility to skip jobs worth of specific duration

2011-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1461:


Attachment: mr-1461-trunk-with-testcases.patch

Attaching the patch with a negative test case as well.

 Feature to instruct rumen-folder utility to skip jobs worth of specific 
 duration
 

 Key: MAPREDUCE-1461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1461
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.23.0
Reporter: Rajesh Balamohan
 Attachments: MR-1461-trunk.patch, mapreduce-1461--2010-02-05.patch, 
 mapreduce-1461--2010-03-04.patch, mr-1461-trunk-with-testcases.patch


 JSON outputs of rumen on production logs can be huge in the order of multiple 
 GB. Rumen's folder utility helps in getting a smaller snapshot of this JSON 
 data.
 It would be helpful to have an option in rumen-folder, wherein user can 
 specify a duration from which rumen-folder should start processing data.
 Related JIRA link: https://issues.apache.org/jira/browse/MAPREDUCE-1295

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-1461) Feature to instruct rumen-folder utility to skip jobs worth of specific duration

2011-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1461:


Affects Version/s: 0.23.0
   Status: Patch Available  (was: Open)

 Feature to instruct rumen-folder utility to skip jobs worth of specific 
 duration
 

 Key: MAPREDUCE-1461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1461
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.23.0
Reporter: Rajesh Balamohan
 Attachments: MR-1461-trunk.patch, mapreduce-1461--2010-02-05.patch, 
 mapreduce-1461--2010-03-04.patch, mr-1461-trunk-with-testcases.patch


 JSON outputs of rumen on production logs can be huge in the order of multiple 
 GB. Rumen's folder utility helps in getting a smaller snapshot of this JSON 
 data.
 It would be helpful to have an option in rumen-folder, wherein user can 
 specify a duration from which rumen-folder should start processing data.
 Related JIRA link: https://issues.apache.org/jira/browse/MAPREDUCE-1295

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2450) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

2011-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2450:


Status: Open  (was: Patch Available)

 Calls from running tasks to TaskTracker methods sometimes fail and incur a 
 60s timeout
 --

 Key: MAPREDUCE-2450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Matei Zaharia
 Fix For: 0.23.0

 Attachments: HADOOP-5380.Y.20.branch.patch, HADOOP-5380.patch, 
 HADOOP_5380-Y.0.20.20x.patch


 I'm seeing some map tasks in my jobs take 1 minute to commit after they 
 finish the map computation. On the map side, the output looks like this:
 {code}
 2009-03-02 21:30:54,384 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot 
 initialize JVM Metrics with processName=MAP, sessionId= - already initialized
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 800
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 
 300
 2009-03-02 21:30:55,493 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
 239075328/298844160
 2009-03-02 21:30:55,494 INFO org.apache.hadoop.mapred.MapTask: record buffer 
 = 786432/983040
 2009-03-02 21:31:00,381 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
 of map output
 2009-03-02 21:31:07,892 INFO org.apache.hadoop.mapred.MapTask: Finished spill  0
 2009-03-02 21:31:07,951 INFO org.apache.hadoop.mapred.TaskRunner: 
 Task:attempt_200903022127_0001_m_003163_0 is done. And is in the process of 
 commiting
 2009-03-02 21:32:07,949 INFO org.apache.hadoop.mapred.TaskRunner: 
 Communication exception: java.io.IOException: Call to /127.0.0.1:50311 failed 
 on local exception: java.nio.channels.ClosedChannelException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:765)
   at org.apache.hadoop.ipc.Client.call(Client.java:733)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
   at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:525)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.nio.channels.ClosedChannelException
   at 
 java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:167)
   at 
 java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:331)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
 2009-03-02 21:32:07,953 INFO org.apache.hadoop.mapred.TaskRunner: Task 
 'attempt_200903022127_0001_m_003163_0' done.
 {code}
 In the TaskTracker log, it looks like this:
 {code}
 2009-03-02 21:31:08,110 WARN org.apache.hadoop.ipc.Server: IPC Server 
 Responder, call ping(attempt_200903022127_0001_m_003163_0) from 
 127.0.0.1:56884: output error
 2009-03-02 21:31:08,111 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 10 on 50311 caught: java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
 at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
 at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
 at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:981)
 {code}
 Note that the task actually seemed to commit - it didn't get speculatively 
 executed or anything. However, the job wasn't able to continue until this one 
 task was done. Both parties seem to think the channel was closed. How does 
 the channel get closed externally? If closing it from outside is unavoidable, 
 maybe the right thing to do is to set a much lower timeout, because 1 minute 
 delay can be pretty significant for a small job.


[jira] [Updated] (MAPREDUCE-2450) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

2011-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2450:


Attachment: mapreduce-2450.patch

Resubmitting for Hudson build.

 Calls from running tasks to TaskTracker methods sometimes fail and incur a 
 60s timeout
 --

 Key: MAPREDUCE-2450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Matei Zaharia
 Fix For: 0.23.0

 Attachments: HADOOP-5380.Y.20.branch.patch, HADOOP-5380.patch, 
 HADOOP_5380-Y.0.20.20x.patch, mapreduce-2450.patch


 I'm seeing some map tasks in my jobs take 1 minute to commit after they 
 finish the map computation. On the map side, the output looks like this:
 {code}
 2009-03-02 21:30:54,384 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot 
 initialize JVM Metrics with processName=MAP, sessionId= - already initialized
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 800
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 
 300
 2009-03-02 21:30:55,493 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
 239075328/298844160
 2009-03-02 21:30:55,494 INFO org.apache.hadoop.mapred.MapTask: record buffer 
 = 786432/983040
 2009-03-02 21:31:00,381 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
 of map output
 2009-03-02 21:31:07,892 INFO org.apache.hadoop.mapred.MapTask: Finished spill  0
 2009-03-02 21:31:07,951 INFO org.apache.hadoop.mapred.TaskRunner: 
 Task:attempt_200903022127_0001_m_003163_0 is done. And is in the process of 
 commiting
 2009-03-02 21:32:07,949 INFO org.apache.hadoop.mapred.TaskRunner: 
 Communication exception: java.io.IOException: Call to /127.0.0.1:50311 failed 
 on local exception: java.nio.channels.ClosedChannelException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:765)
   at org.apache.hadoop.ipc.Client.call(Client.java:733)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
   at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:525)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.nio.channels.ClosedChannelException
   at 
 java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:167)
   at 
 java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:331)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
 2009-03-02 21:32:07,953 INFO org.apache.hadoop.mapred.TaskRunner: Task 
 'attempt_200903022127_0001_m_003163_0' done.
 {code}
 In the TaskTracker log, it looks like this:
 {code}
 2009-03-02 21:31:08,110 WARN org.apache.hadoop.ipc.Server: IPC Server 
 Responder, call ping(attempt_200903022127_0001_m_003163_0) from 
 127.0.0.1:56884: output error
 2009-03-02 21:31:08,111 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 10 on 50311 caught: java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
 at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
 at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
 at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:981)
 {code}
 Note that the task actually seemed to commit - it didn't get speculatively 
 executed or anything. However, the job wasn't able to continue until this one 
 task was done. Both parties seem to think the channel was closed. How does 
 the channel get closed externally? If closing it from outside is unavoidable, 
 maybe the right thing to do is to set a much lower timeout, because 1 minute 
 delay can be pretty significant for a small job.

[jira] [Updated] (MAPREDUCE-2450) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

2011-04-27 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2450:


Status: Patch Available  (was: Open)

 Calls from running tasks to TaskTracker methods sometimes fail and incur a 
 60s timeout
 --

 Key: MAPREDUCE-2450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Matei Zaharia
 Fix For: 0.23.0

 Attachments: HADOOP-5380.Y.20.branch.patch, HADOOP-5380.patch, 
 HADOOP_5380-Y.0.20.20x.patch, mapreduce-2450.patch


 I'm seeing some map tasks in my jobs take 1 minute to commit after they 
 finish the map computation. On the map side, the output looks like this:
 {code}
 2009-03-02 21:30:54,384 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot 
 initialize JVM Metrics with processName=MAP, sessionId= - already initialized
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 800
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 
 300
 2009-03-02 21:30:55,493 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
 239075328/298844160
 2009-03-02 21:30:55,494 INFO org.apache.hadoop.mapred.MapTask: record buffer 
 = 786432/983040
 2009-03-02 21:31:00,381 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
 of map output
 2009-03-02 21:31:07,892 INFO org.apache.hadoop.mapred.MapTask: Finished spill  0
 2009-03-02 21:31:07,951 INFO org.apache.hadoop.mapred.TaskRunner: 
 Task:attempt_200903022127_0001_m_003163_0 is done. And is in the process of 
 commiting
 2009-03-02 21:32:07,949 INFO org.apache.hadoop.mapred.TaskRunner: 
 Communication exception: java.io.IOException: Call to /127.0.0.1:50311 failed 
 on local exception: java.nio.channels.ClosedChannelException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:765)
   at org.apache.hadoop.ipc.Client.call(Client.java:733)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
   at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:525)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.nio.channels.ClosedChannelException
   at 
 java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:167)
   at 
 java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:331)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
 2009-03-02 21:32:07,953 INFO org.apache.hadoop.mapred.TaskRunner: Task 
 'attempt_200903022127_0001_m_003163_0' done.
 {code}
 In the TaskTracker log, it looks like this:
 {code}
 2009-03-02 21:31:08,110 WARN org.apache.hadoop.ipc.Server: IPC Server 
 Responder, call ping(attempt_200903022127_0001_m_003163_0) from 
 127.0.0.1:56884: output error
 2009-03-02 21:31:08,111 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 10 on 50311 caught: java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
 at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
 at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
 at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:981)
 {code}
 Note that the task actually seemed to commit - it didn't get speculatively 
 executed or anything. However, the job wasn't able to continue until this one 
 task was done. Both parties seem to think the channel was closed. How does 
 the channel get closed externally? If closing it from outside is unavoidable, 
 maybe the right thing to do is to set a much lower timeout, because 1 minute 
 delay can be pretty significant for a small job.

[jira] [Moved] (MAPREDUCE-2450) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

2011-04-24 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan moved HADOOP-5380 to MAPREDUCE-2450:
-

Fix Version/s: (was: 0.23.0)
   0.23.0
Affects Version/s: (was: 0.23.0)
   0.23.0
  Key: MAPREDUCE-2450  (was: HADOOP-5380)
  Project: Hadoop Map/Reduce  (was: Hadoop Common)

 Calls from running tasks to TaskTracker methods sometimes fail and incur a 
 60s timeout
 --

 Key: MAPREDUCE-2450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Matei Zaharia
 Fix For: 0.23.0

 Attachments: HADOOP-5380.Y.20.branch.patch, HADOOP-5380.patch, 
 HADOOP_5380-Y.0.20.20x.patch


 I'm seeing some map tasks in my jobs take 1 minute to commit after they 
 finish the map computation. On the map side, the output looks like this:
 {code}
 2009-03-02 21:30:54,384 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot 
 initialize JVM Metrics with processName=MAP, sessionId= - already initialized
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 800
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 
 300
 2009-03-02 21:30:55,493 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
 239075328/298844160
 2009-03-02 21:30:55,494 INFO org.apache.hadoop.mapred.MapTask: record buffer 
 = 786432/983040
 2009-03-02 21:31:00,381 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
 of map output
 2009-03-02 21:31:07,892 INFO org.apache.hadoop.mapred.MapTask: Finished spill  0
 2009-03-02 21:31:07,951 INFO org.apache.hadoop.mapred.TaskRunner: 
 Task:attempt_200903022127_0001_m_003163_0 is done. And is in the process of 
 commiting
 2009-03-02 21:32:07,949 INFO org.apache.hadoop.mapred.TaskRunner: 
 Communication exception: java.io.IOException: Call to /127.0.0.1:50311 failed 
 on local exception: java.nio.channels.ClosedChannelException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:765)
   at org.apache.hadoop.ipc.Client.call(Client.java:733)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
   at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:525)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.nio.channels.ClosedChannelException
   at 
 java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:167)
   at 
 java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:331)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
 2009-03-02 21:32:07,953 INFO org.apache.hadoop.mapred.TaskRunner: Task 
 'attempt_200903022127_0001_m_003163_0' done.
 {code}
 In the TaskTracker log, it looks like this:
 {code}
 2009-03-02 21:31:08,110 WARN org.apache.hadoop.ipc.Server: IPC Server 
 Responder, call ping(attempt_200903022127_0001_m_003163_0) from 
 127.0.0.1:56884: output error
 2009-03-02 21:31:08,111 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 10 on 50311 caught: java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
 at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
 at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
 at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:981)
 {code}
 Note that the task actually seemed to commit - it didn't get speculatively 
 executed or anything. However, the job wasn't able to continue until this one 
 task was done. Both parties seem to think the channel was closed. How does 
 the channel get closed externally? If closing it from outside is unavoidable, 
 maybe the right thing to do is to set a much lower timeout, because 1 minute 
 delay can be pretty significant for a small job.

[jira] [Updated] (MAPREDUCE-2450) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

2011-04-24 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2450:


Status: Open  (was: Patch Available)

 Calls from running tasks to TaskTracker methods sometimes fail and incur a 
 60s timeout
 --

 Key: MAPREDUCE-2450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Matei Zaharia
 Fix For: 0.23.0

 Attachments: HADOOP-5380.Y.20.branch.patch, HADOOP-5380.patch, 
 HADOOP_5380-Y.0.20.20x.patch


 I'm seeing some map tasks in my jobs take 1 minute to commit after they 
 finish the map computation. On the map side, the output looks like this:
 {code}
 2009-03-02 21:30:54,384 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot 
 initialize JVM Metrics with processName=MAP, sessionId= - already initialized
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 800
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 
 300
 2009-03-02 21:30:55,493 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
 239075328/298844160
 2009-03-02 21:30:55,494 INFO org.apache.hadoop.mapred.MapTask: record buffer 
 = 786432/983040
 2009-03-02 21:31:00,381 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
 of map output
 2009-03-02 21:31:07,892 INFO org.apache.hadoop.mapred.MapTask: Finished spill  0
 2009-03-02 21:31:07,951 INFO org.apache.hadoop.mapred.TaskRunner: 
 Task:attempt_200903022127_0001_m_003163_0 is done. And is in the process of 
 commiting
 2009-03-02 21:32:07,949 INFO org.apache.hadoop.mapred.TaskRunner: 
 Communication exception: java.io.IOException: Call to /127.0.0.1:50311 failed 
 on local exception: java.nio.channels.ClosedChannelException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:765)
   at org.apache.hadoop.ipc.Client.call(Client.java:733)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
   at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:525)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.nio.channels.ClosedChannelException
   at 
 java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:167)
   at 
 java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:331)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
 2009-03-02 21:32:07,953 INFO org.apache.hadoop.mapred.TaskRunner: Task 
 'attempt_200903022127_0001_m_003163_0' done.
 {code}
 In the TaskTracker log, it looks like this:
 {code}
 2009-03-02 21:31:08,110 WARN org.apache.hadoop.ipc.Server: IPC Server 
 Responder, call ping(attempt_200903022127_0001_m_003163_0) from 
 127.0.0.1:56884: output error
 2009-03-02 21:31:08,111 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 10 on 50311 caught: java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
 at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
 at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
 at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:981)
 {code}
 Note that the task actually seemed to commit - it didn't get speculatively 
 executed or anything. However, the job wasn't able to continue until this one 
 task was done. Both parties seem to think the channel was closed. How does 
 the channel get closed externally? If closing it from outside is unavoidable, 
 maybe the right thing to do is to set a much lower timeout, because 1 minute 
 delay can be pretty significant for a small job.


[jira] [Updated] (MAPREDUCE-2450) Calls from running tasks to TaskTracker methods sometimes fail and incur a 60s timeout

2011-04-24 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2450:


Status: Patch Available  (was: Open)

Moved the JIRA from Hadoop-Common to Hadoop-MapReduce. Resubmitting for the 
Hudson build.

 Calls from running tasks to TaskTracker methods sometimes fail and incur a 
 60s timeout
 --

 Key: MAPREDUCE-2450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Matei Zaharia
 Fix For: 0.23.0

 Attachments: HADOOP-5380.Y.20.branch.patch, HADOOP-5380.patch, 
 HADOOP_5380-Y.0.20.20x.patch


 I'm seeing some map tasks in my jobs take 1 minute to commit after they 
 finish the map computation. On the map side, the output looks like this:
 {code}
 2009-03-02 21:30:54,384 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Cannot 
 initialize JVM Metrics with processName=MAP, sessionId= - already initialized
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 800
 2009-03-02 21:30:54,437 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 
 300
 2009-03-02 21:30:55,493 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
 239075328/298844160
 2009-03-02 21:30:55,494 INFO org.apache.hadoop.mapred.MapTask: record buffer 
 = 786432/983040
 2009-03-02 21:31:00,381 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
 of map output
 2009-03-02 21:31:07,892 INFO org.apache.hadoop.mapred.MapTask: Finished spill  0
 2009-03-02 21:31:07,951 INFO org.apache.hadoop.mapred.TaskRunner: 
 Task:attempt_200903022127_0001_m_003163_0 is done. And is in the process of 
 commiting
 2009-03-02 21:32:07,949 INFO org.apache.hadoop.mapred.TaskRunner: 
 Communication exception: java.io.IOException: Call to /127.0.0.1:50311 failed 
 on local exception: java.nio.channels.ClosedChannelException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:765)
   at org.apache.hadoop.ipc.Client.call(Client.java:733)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
   at org.apache.hadoop.mapred.$Proxy0.ping(Unknown Source)
   at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:525)
   at java.lang.Thread.run(Thread.java:619)
 Caused by: java.nio.channels.ClosedChannelException
   at 
 java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:167)
   at 
 java.nio.channels.SelectableChannel.register(SelectableChannel.java:254)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:331)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
   at 
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
   at java.io.FilterInputStream.read(FilterInputStream.java:116)
   at 
 org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
   at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
 2009-03-02 21:32:07,953 INFO org.apache.hadoop.mapred.TaskRunner: Task 
 'attempt_200903022127_0001_m_003163_0' done.
 {code}
 In the TaskTracker log, it looks like this:
 {code}
 2009-03-02 21:31:08,110 WARN org.apache.hadoop.ipc.Server: IPC Server 
 Responder, call ping(attempt_200903022127_0001_m_003163_0) from 
 127.0.0.1:56884: output error
 2009-03-02 21:31:08,111 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 10 on 50311 caught: java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1195)
 at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
 at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:613)
 at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:677)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:981)
 {code}
 Note that the task actually seemed to commit - it didn't get speculatively 
 executed or anything. However, the job wasn't able to continue until this one 
 task was done. Both parties seem to think the channel was closed. How does 
 the channel get closed externally? If closing it from outside is unavoidable, 
 maybe the right thing to do is to set a much lower timeout, because 1 minute 
 delay can be pretty significant for a small job.

[jira] [Updated] (MAPREDUCE-1461) Feature to instruct rumen-folder utility to skip jobs worth of specific duration

2011-04-18 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1461:


Status: Patch Available  (was: Open)

 Feature to instruct rumen-folder utility to skip jobs worth of specific 
 duration
 

 Key: MAPREDUCE-1461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1461
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Reporter: Rajesh Balamohan
 Attachments: MR-1461-trunk.patch, mapreduce-1461--2010-02-05.patch, 
 mapreduce-1461--2010-03-04.patch


 JSON outputs of rumen on production logs can be huge in the order of multiple 
 GB. Rumen's folder utility helps in getting a smaller snapshot of this JSON 
 data.
 It would be helpful to have an option in rumen-folder, wherein user can 
 specify a duration from which rumen-folder should start processing data.
 Related JIRA link: https://issues.apache.org/jira/browse/MAPREDUCE-1295

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2153) Bring in more job configuration properties in to the trace file

2011-04-12 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2153:


Assignee: Rajesh Balamohan
  Status: Open  (was: Patch Available)

 Bring in more job configuration properties in to the trace file
 ---

 Key: MAPREDUCE-2153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Rajesh Balamohan
 Attachments: MR-2153-patch.txt, MapReduce-2153-trunk.patch, 
 MapReduce-2153-trunk.patch, mr-2153-test-patch-results.txt


 To emulate distributed cache usage in gridmix jobs, there are 9 configuration 
 properties needed to be available in trace file: 
 (1) mapreduce.job.cache.files
 (2) mapreduce.job.cache.files.visibilities
 (3) mapreduce.job.cache.files.filesizes
 (4) mapreduce.job.cache.files.timestamps
 (5) mapreduce.job.cache.archives
 (6) mapreduce.job.cache.archives.visibilities
 (7) mapreduce.job.cache.archives.filesizes
 (8) mapreduce.job.cache.archives.timestamps
 (9) mapreduce.job.cache.symlink.create
 To emulate data compression in gridmix jobs, trace file should contain the 
 following configuration properties:
 (1) mapreduce.map.output.compress
 (2) mapreduce.map.output.compress.codec
 (3) mapreduce.output.fileoutputformat.compress
 (4) mapreduce.output.fileoutputformat.compress.codec
 (5) mapreduce.output.fileoutputformat.compress.type
 Ideally, gridmix should set many job-specific configuration properties like 
 io.sort.mb, io.sort.factor, etc. when running simulated jobs to get the same 
 effect as the original/real job in terms of spilled records, number of merges, 
 etc.
 TraceBuilder should bring in all these properties into the generated trace 
 file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2153) Bring in more job configuration properties in to the trace file

2011-04-12 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2153:


Status: Patch Available  (was: Open)

 Bring in more job configuration properties in to the trace file
 ---

 Key: MAPREDUCE-2153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Rajesh Balamohan
 Attachments: MR-2153-patch.txt, MapReduce-2153-trunk.patch, 
 MapReduce-2153-trunk.patch, mr-2153-test-patch-results.txt


 To emulate distributed cache usage in gridmix jobs, there are 9 configuration 
 properties needed to be available in trace file: 
 (1) mapreduce.job.cache.files
 (2) mapreduce.job.cache.files.visibilities
 (3) mapreduce.job.cache.files.filesizes
 (4) mapreduce.job.cache.files.timestamps
 (5) mapreduce.job.cache.archives
 (6) mapreduce.job.cache.archives.visibilities
 (7) mapreduce.job.cache.archives.filesizes
 (8) mapreduce.job.cache.archives.timestamps
 (9) mapreduce.job.cache.symlink.create
 To emulate data compression in gridmix jobs, trace file should contain the 
 following configuration properties:
 (1) mapreduce.map.output.compress
 (2) mapreduce.map.output.compress.codec
 (3) mapreduce.output.fileoutputformat.compress
 (4) mapreduce.output.fileoutputformat.compress.codec
 (5) mapreduce.output.fileoutputformat.compress.type
 Ideally, gridmix should set many job-specific configuration properties like 
 io.sort.mb, io.sort.factor, etc. when running simulated jobs to get the same 
 effect as the original/real job in terms of spilled records, number of merges, 
 etc.
 TraceBuilder should bring in all these properties into the generated trace 
 file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2153) Bring in more job configuration properties in to the trace file

2011-04-12 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2153:


Attachment: MR-2153-patch.txt

Fixed the javac warnings in the earlier patch.

 Bring in more job configuration properties in to the trace file
 ---

 Key: MAPREDUCE-2153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Rajesh Balamohan
 Attachments: MR-2153-patch.txt, MapReduce-2153-trunk.patch, 
 MapReduce-2153-trunk.patch, mr-2153-test-patch-results.txt


 To emulate distributed cache usage in gridmix jobs, there are 9 configuration 
 properties needed to be available in trace file: 
 (1) mapreduce.job.cache.files
 (2) mapreduce.job.cache.files.visibilities
 (3) mapreduce.job.cache.files.filesizes
 (4) mapreduce.job.cache.files.timestamps
 (5) mapreduce.job.cache.archives
 (6) mapreduce.job.cache.archives.visibilities
 (7) mapreduce.job.cache.archives.filesizes
 (8) mapreduce.job.cache.archives.timestamps
 (9) mapreduce.job.cache.symlink.create
 To emulate data compression in gridmix jobs, trace file should contain the 
 following configuration properties:
 (1) mapreduce.map.output.compress
 (2) mapreduce.map.output.compress.codec
 (3) mapreduce.output.fileoutputformat.compress
 (4) mapreduce.output.fileoutputformat.compress.codec
 (5) mapreduce.output.fileoutputformat.compress.type
 Ideally, gridmix should set many job-specific configuration properties like 
 io.sort.mb, io.sort.factor, etc. when running simulated jobs to get the same 
 effect as the original/real job in terms of spilled records, number of merges, 
 etc.
 TraceBuilder should bring in all these properties into the generated trace 
 file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2153) Bring in more job configuration properties in to the trace file

2011-04-11 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2153:


Affects Version/s: 0.23.0
   Status: Patch Available  (was: Open)

 Bring in more job configuration properties in to the trace file
 ---

 Key: MAPREDUCE-2153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
 Attachments: MapReduce-2153-trunk.patch, 
 mr-2153-test-patch-results.txt


 To emulate distributed cache usage in gridmix jobs, there are 9 configuration 
 properties needed to be available in trace file: 
 (1) mapreduce.job.cache.files
 (2) mapreduce.job.cache.files.visibilities
 (3) mapreduce.job.cache.files.filesizes
 (4) mapreduce.job.cache.files.timestamps
 (5) mapreduce.job.cache.archives
 (6) mapreduce.job.cache.archives.visibilities
 (7) mapreduce.job.cache.archives.filesizes
 (8) mapreduce.job.cache.archives.timestamps
 (9) mapreduce.job.cache.symlink.create
 To emulate data compression in gridmix jobs, trace file should contain the 
 following configuration properties:
 (1) mapreduce.map.output.compress
 (2) mapreduce.map.output.compress.codec
 (3) mapreduce.output.fileoutputformat.compress
 (4) mapreduce.output.fileoutputformat.compress.codec
 (5) mapreduce.output.fileoutputformat.compress.type
 Ideally, gridmix should set many job-specific configuration properties like 
 io.sort.mb, io.sort.factor, etc. when running simulated jobs to get the same 
 effect as the original/real job in terms of spilled records, number of merges, 
 etc.
 TraceBuilder should bring in all these properties into the generated trace 
 file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2153) Bring in more job configuration properties in to the trace file

2011-04-11 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2153:


Attachment: MapReduce-2153-trunk.patch

Uploading the same patch to run it via Hudson.

 Bring in more job configuration properties in to the trace file
 ---

 Key: MAPREDUCE-2153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
 Attachments: MapReduce-2153-trunk.patch, MapReduce-2153-trunk.patch, 
 mr-2153-test-patch-results.txt


 To emulate distributed cache usage in gridmix jobs, there are 9 configuration 
 properties needed to be available in trace file: 
 (1) mapreduce.job.cache.files
 (2) mapreduce.job.cache.files.visibilities
 (3) mapreduce.job.cache.files.filesizes
 (4) mapreduce.job.cache.files.timestamps
 (5) mapreduce.job.cache.archives
 (6) mapreduce.job.cache.archives.visibilities
 (7) mapreduce.job.cache.archives.filesizes
 (8) mapreduce.job.cache.archives.timestamps
 (9) mapreduce.job.cache.symlink.create
 To emulate data compression in gridmix jobs, trace file should contain the 
 following configuration properties:
 (1) mapreduce.map.output.compress
 (2) mapreduce.map.output.compress.codec
 (3) mapreduce.output.fileoutputformat.compress
 (4) mapreduce.output.fileoutputformat.compress.codec
 (5) mapreduce.output.fileoutputformat.compress.type
 Ideally, gridmix should set many job-specific configuration properties like 
 io.sort.mb, io.sort.factor, etc. when running simulated jobs to get the same 
 effect as the original/real job in terms of spilled records, number of merges, 
 etc.
 TraceBuilder should bring in all these properties into the generated trace 
 file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2153) Bring in more job configuration properties in to the trace file

2011-04-11 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2153:


Status: Patch Available  (was: Open)

 Bring in more job configuration properties in to the trace file
 ---

 Key: MAPREDUCE-2153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
 Attachments: MapReduce-2153-trunk.patch, MapReduce-2153-trunk.patch, 
 mr-2153-test-patch-results.txt


 To emulate distributed cache usage in gridmix jobs, there are 9 configuration 
 properties needed to be available in trace file: 
 (1) mapreduce.job.cache.files
 (2) mapreduce.job.cache.files.visibilities
 (3) mapreduce.job.cache.files.filesizes
 (4) mapreduce.job.cache.files.timestamps
 (5) mapreduce.job.cache.archives
 (6) mapreduce.job.cache.archives.visibilities
 (7) mapreduce.job.cache.archives.filesizes
 (8) mapreduce.job.cache.archives.timestamps
 (9) mapreduce.job.cache.symlink.create
 To emulate data compression in gridmix jobs, trace file should contain the 
 following configuration properties:
 (1) mapreduce.map.output.compress
 (2) mapreduce.map.output.compress.codec
 (3) mapreduce.output.fileoutputformat.compress
 (4) mapreduce.output.fileoutputformat.compress.codec
 (5) mapreduce.output.fileoutputformat.compress.type
 Ideally, gridmix should set many job-specific configuration properties like 
 io.sort.mb, io.sort.factor, etc. when running simulated jobs to get the same 
 effect as the original/real job in terms of spilled records, number of merges, 
 etc.
 TraceBuilder should bring in all these properties into the generated trace 
 file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-1461) Feature to instruct rumen-folder utility to skip jobs worth of specific duration

2011-04-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1461:


Attachment: MR-1461-trunk.patch

Regenerated the patch against the latest Apache trunk codebase.

 Feature to instruct rumen-folder utility to skip jobs worth of specific 
 duration
 

 Key: MAPREDUCE-1461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1461
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Reporter: Rajesh Balamohan
 Attachments: MR-1461-trunk.patch, mapreduce-1461--2010-02-05.patch, 
 mapreduce-1461--2010-03-04.patch


 JSON outputs of rumen on production logs can be huge in the order of multiple 
 GB. Rumen's folder utility helps in getting a smaller snapshot of this JSON 
 data.
 It would be helpful to have an option in rumen-folder, wherein user can 
 specify a duration from which rumen-folder should start processing data.
 Related JIRA link: https://issues.apache.org/jira/browse/MAPREDUCE-1295

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2153) Bring in more job configuration properties in to the trace file

2011-04-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2153:


Attachment: mr-2153-test-patch-results.txt

Attaching ant test-patch results.

The findbugs warnings are not related to this patch.

 Bring in more job configuration properties in to the trace file
 ---

 Key: MAPREDUCE-2153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Reporter: Ravi Gummadi
 Attachments: MapReduce-2153-trunk.patch, 
 mr-2153-test-patch-results.txt


 To emulate distributed cache usage in gridmix jobs, there are 9 configuration 
 properties needed to be available in trace file: 
 (1) mapreduce.job.cache.files
 (2) mapreduce.job.cache.files.visibilities
 (3) mapreduce.job.cache.files.filesizes
 (4) mapreduce.job.cache.files.timestamps
 (5) mapreduce.job.cache.archives
 (6) mapreduce.job.cache.archives.visibilities
 (7) mapreduce.job.cache.archives.filesizes
 (8) mapreduce.job.cache.archives.timestamps
 (9) mapreduce.job.cache.symlink.create
 To emulate data compression in gridmix jobs, trace file should contain the 
 following configuration properties:
 (1) mapreduce.map.output.compress
 (2) mapreduce.map.output.compress.codec
 (3) mapreduce.output.fileoutputformat.compress
 (4) mapreduce.output.fileoutputformat.compress.codec
 (5) mapreduce.output.fileoutputformat.compress.type
 Ideally, gridmix should set many job-specific configuration properties like 
 io.sort.mb, io.sort.factor, etc. when running simulated jobs to get the same 
 effect as the original/real job in terms of spilled records, number of merges, 
 etc.
 TraceBuilder should bring in all these properties into the generated trace 
 file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2417) In Gridmix, in RoundRobinUserResolver mode, the testing/proxy users are not associated with unique users in a trace

2011-04-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2417:


Attachment: MR-2417-trunk.patch

 In Gridmix, in RoundRobinUserResolver mode, the testing/proxy users are not 
 associated with unique users in a trace
 ---

 Key: MAPREDUCE-2417
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2417
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Attachments: MR-2417-trunk.patch


 As per the Gridmix documentation, the testing users should be associated with 
 unique users in the trace. However, Gridmix currently impersonates the 
 users based on the job, irrespective of the user.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2153) Bring in more job configuration properties in to the trace file

2011-04-06 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-2153:


Attachment: MapReduce-2153-trunk.patch

Attaching the patch for Apache trunk. This patch ensures that all job 
properties are saved in the JSON file under the jobProperties tag.
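As a rough illustration of what saving every job property under a jobProperties 
tag could look like (a sketch only, assuming Jackson, which Rumen uses for JSON 
output; the class and method names below are hypothetical, not from the patch):

{code}
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.codehaus.jackson.map.ObjectMapper;

// Hypothetical sketch: copy every property from the job Configuration
// (Configuration is Iterable over its key/value entries) into a map and
// serialize it as JSON under a "jobProperties" key.
public class JobPropertiesDump {
  public static String toJson(Configuration jobConf) throws IOException {
    Map<String, String> props = new TreeMap<String, String>();
    for (Map.Entry<String, String> e : jobConf) {
      props.put(e.getKey(), e.getValue());
    }
    Map<String, Object> root = new TreeMap<String, Object>();
    root.put("jobProperties", props);
    return new ObjectMapper().writeValueAsString(root);
  }
}
{code}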

 Bring in more job configuration properties in to the trace file
 ---

 Key: MAPREDUCE-2153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Reporter: Ravi Gummadi
 Attachments: MapReduce-2153-trunk.patch


 To emulate distributed cache usage in gridmix jobs, there are 9 configuration 
 properties needed to be available in trace file: 
 (1) mapreduce.job.cache.files
 (2) mapreduce.job.cache.files.visibilities
 (3) mapreduce.job.cache.files.filesizes
 (4) mapreduce.job.cache.files.timestamps
 (5) mapreduce.job.cache.archives
 (6) mapreduce.job.cache.archives.visibilities
 (7) mapreduce.job.cache.archives.filesizes
 (8) mapreduce.job.cache.archives.timestamps
 (9) mapreduce.job.cache.symlink.create
 To emulate data compression in gridmix jobs, trace file should contain the 
 following configuration properties:
 (1) mapreduce.map.output.compress
 (2) mapreduce.map.output.compress.codec
 (3) mapreduce.output.fileoutputformat.compress
 (4) mapreduce.output.fileoutputformat.compress.codec
 (5) mapreduce.output.fileoutputformat.compress.type
 Ideally, gridmix should set many job-specific configuration properties like 
 io.sort.mb, io.sort.factor, etc. when running simulated jobs to get the same 
 effect as the original/real job in terms of spilled records, number of merges, 
 etc.
 TraceBuilder should bring in all these properties into the generated trace 
 file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-09-12 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908604#action_12908604
 ] 

Rajesh Balamohan commented on MAPREDUCE-1904:
-

Thanks for the review comments, Arun.

1. For #1, I will post the profiler output showing which methods are expensive in 
getLocalPathToRead().

2. For #2, the code path for LocalDirAllocator.confChanged() need not be called 
in this TaskTracker context.

Reason: in this context, TaskTracker uses LocalDirAllocator to check for any 
config changes related to mapred.local.dir. Once read, this parameter does not 
change over the TaskTracker's lifetime, so the check is not mandatory on every 
invocation. Corner case: when the tasktracker goes down and new configs are 
reloaded, the LRUCache would also be repopulated.
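To make the idea concrete, here is a minimal, hedged sketch of an LRU cache keyed 
by jobid + mapid, assuming a plain LinkedHashMap wrapper; the class name and 
capacity below are illustrative and not taken from the attached patch.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the LRU idea in this issue: cache the resolved local
// path per (jobid + mapid) so the synchronized
// LocalDirAllocator.getLocalPathToRead() is hit only on cache misses.
public class PathLRUCache extends LinkedHashMap<String, String> {
  private final int maxEntries;

  public PathLRUCache(int maxEntries) {
    super(16, 0.75f, true);  // accessOrder=true gives LRU iteration order
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
    return size() > maxEntries;  // evict least recently used beyond the cap
  }
}
{code}

A caller would wrap this with Collections.synchronizedMap (or synchronize 
externally), look up jobid + mapid first, and fall back to getLocalPathToRead() 
only on a miss, putting the resolved path back into the cache.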



 Reducing locking contention in TaskTracker.MapOutputServlet's 
 LocalDirAllocator
 ---

 Key: MAPREDUCE-1904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan
 Attachments: MAPREDUCE-1904-RC10.patch, MAPREDUCE-1904-trunk.patch, 
 profiler output after applying the patch.jpg, TaskTracker- yourkit profiler 
 output .jpg, Thread profiler output showing contention.jpg


 While profiling tasktracker with Sort benchmark, it was observed that threads 
 block on LocalDirAllocator.getLocalPathToRead() in order to get the index 
 file and temporary map output file.
 As LocalDirAllocator is tied up with ServletContext, only one instance would 
 be available per tasktracker httpserver. Given the jobid & mapid, 
 LocalDirAllocator retrieves the index file path and temporary map output file 
 path. getLocalPathToRead() is internally synchronized.
 Introducing an LRUCache for this lookup reduces the contention heavily 
 (LRUCache with key = jobid + mapid and value = PATH to the file). Size of the 
 LRUCache can be varied based on the environment and I observed a throughput 
 improvement in the order of 4-7% with the introduction of LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-07-20 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1904:


Attachment: MAPREDUCE-1904-trunk.patch

Attaching the patch for trunk version. 

 Reducing locking contention in TaskTracker.MapOutputServlet's 
 LocalDirAllocator
 ---

 Key: MAPREDUCE-1904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan
 Attachments: MAPREDUCE-1904-RC10.patch, MAPREDUCE-1904-trunk.patch, 
 profiler output after applying the patch.jpg, TaskTracker- yourkit profiler 
 output .jpg, Thread profiler output showing contention.jpg


 While profiling the tasktracker with the Sort benchmark, it was observed that 
 threads block on LocalDirAllocator.getLocalPathToRead() in order to get the 
 index file and the temporary map output file.
 As LocalDirAllocator is tied to the ServletContext, only one instance is 
 available per tasktracker httpserver. Given the jobid & mapid, 
 LocalDirAllocator retrieves the index file path and the temporary map output 
 file path. getLocalPathToRead() is internally synchronized.
 Introducing an LRUCache for this lookup reduces the contention heavily 
 (LRUCache with key = jobid + mapid and value = PATH to the file). The size of 
 the LRUCache can be varied based on the environment; I observed a throughput 
 improvement of the order of 4-7% with the introduction of the LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-07-09 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1904:


Attachment: TaskTracker- yourkit profiler output .jpg

LocalDirAllocator.AllocatorPerContext is heavily contended. 

 Reducing locking contention in TaskTracker.MapOutputServlet's 
 LocalDirAllocator
 ---

 Key: MAPREDUCE-1904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan
 Attachments: MAPREDUCE-1904-RC10.patch, TaskTracker- yourkit profiler 
 output .jpg


 While profiling the tasktracker with the Sort benchmark, it was observed that 
 threads block on LocalDirAllocator.getLocalPathToRead() in order to get the 
 index file and the temporary map output file.
 As LocalDirAllocator is tied to the ServletContext, only one instance is 
 available per tasktracker httpserver. Given the jobid & mapid, 
 LocalDirAllocator retrieves the index file path and the temporary map output 
 file path. getLocalPathToRead() is internally synchronized.
 Introducing an LRUCache for this lookup reduces the contention heavily 
 (LRUCache with key = jobid + mapid and value = PATH to the file). The size of 
 the LRUCache can be varied based on the environment; I observed a throughput 
 improvement of the order of 4-7% with the introduction of the LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-07-09 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1904:


Attachment: profiler output after applying the patch.jpg

Contention on LocalDirAllocator is now negligible, close to 0%.

 Reducing locking contention in TaskTracker.MapOutputServlet's 
 LocalDirAllocator
 ---

 Key: MAPREDUCE-1904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan
 Attachments: MAPREDUCE-1904-RC10.patch, profiler output after 
 applying the patch.jpg, TaskTracker- yourkit profiler output .jpg, Thread 
 profiler output showing contention.jpg


 While profiling the tasktracker with the Sort benchmark, it was observed that 
 threads block on LocalDirAllocator.getLocalPathToRead() in order to get the 
 index file and the temporary map output file.
 As LocalDirAllocator is tied to the ServletContext, only one instance is 
 available per tasktracker httpserver. Given the jobid & mapid, 
 LocalDirAllocator retrieves the index file path and the temporary map output 
 file path. getLocalPathToRead() is internally synchronized.
 Introducing an LRUCache for this lookup reduces the contention heavily 
 (LRUCache with key = jobid + mapid and value = PATH to the file). The size of 
 the LRUCache can be varied based on the environment; I observed a throughput 
 improvement of the order of 4-7% with the introduction of the LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-07-09 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1904:


Attachment: Thread profiler output showing contention.jpg

 Reducing locking contention in TaskTracker.MapOutputServlet's 
 LocalDirAllocator
 ---

 Key: MAPREDUCE-1904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan
 Attachments: MAPREDUCE-1904-RC10.patch, profiler output after 
 applying the patch.jpg, TaskTracker- yourkit profiler output .jpg, Thread 
 profiler output showing contention.jpg


 While profiling the tasktracker with the Sort benchmark, it was observed that 
 threads block on LocalDirAllocator.getLocalPathToRead() in order to get the 
 index file and the temporary map output file.
 As LocalDirAllocator is tied to the ServletContext, only one instance is 
 available per tasktracker httpserver. Given the jobid & mapid, 
 LocalDirAllocator retrieves the index file path and the temporary map output 
 file path. getLocalPathToRead() is internally synchronized.
 Introducing an LRUCache for this lookup reduces the contention heavily 
 (LRUCache with key = jobid + mapid and value = PATH to the file). The size of 
 the LRUCache can be varied based on the environment; I observed a throughput 
 improvement of the order of 4-7% with the introduction of the LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-07-05 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1904:


Attachment: MAPREDUCE-1904-RC10.patch

The patch for the RC10 release is attached here.

 Reducing locking contention in TaskTracker.MapOutputServlet's 
 LocalDirAllocator
 ---

 Key: MAPREDUCE-1904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan
 Attachments: MAPREDUCE-1904-RC10.patch


 While profiling the tasktracker with the Sort benchmark, it was observed that 
 threads block on LocalDirAllocator.getLocalPathToRead() in order to get the 
 index file and the temporary map output file.
 As LocalDirAllocator is tied to the ServletContext, only one instance is 
 available per tasktracker httpserver. Given the jobid & mapid, 
 LocalDirAllocator retrieves the index file path and the temporary map output 
 file path. getLocalPathToRead() is internally synchronized.
 Introducing an LRUCache for this lookup reduces the contention heavily 
 (LRUCache with key = jobid + mapid and value = PATH to the file). The size of 
 the LRUCache can be varied based on the environment; I observed a throughput 
 improvement of the order of 4-7% with the introduction of the LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2010-06-30 Thread Rajesh Balamohan (JIRA)
Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator
---

 Key: MAPREDUCE-1904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan


While profiling the tasktracker with the Sort benchmark, it was observed that 
threads block on LocalDirAllocator.getLocalPathToRead() in order to get the 
index file and the temporary map output file.

As LocalDirAllocator is tied to the ServletContext, only one instance is 
available per tasktracker httpserver. Given the jobid & mapid, 
LocalDirAllocator retrieves the index file path and the temporary map output 
file path. getLocalPathToRead() is internally synchronized.

Introducing an LRUCache for this lookup reduces the contention heavily 
(LRUCache with key = jobid + mapid and value = PATH to the file). The size of 
the LRUCache can be varied based on the environment; I observed a throughput 
improvement of the order of 4-7% with the introduction of the LRUCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1533) Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()

2010-04-27 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861311#action_12861311
 ] 

Rajesh Balamohan commented on MAPREDUCE-1533:
-

I am on vacation from 23-Apr to 17-May and do not have internet access during 
this time. Please check with my manager sriguru@ for any urgent issues.

~Rajesh.B



 Reduce or remove usage of String.format() usage in 
 CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
 --

 Key: MAPREDUCE-1533
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan
Assignee: Amar Kamat
 Attachments: MAPREDUCE-1533-and-others-20100413.1.txt, 
 MAPREDUCE-1533-and-others-20100413.bugfix.txt, mapreduce-1533-v1.4.patch


 When short jobs are executed in hadoop with OutOfBandHeartBeat=true, the JT 
 executes the heartBeat() method heavily. This internally makes a call to 
 CapacityTaskScheduler.updateQSIObjects(). 
 CapacityTaskScheduler.updateQSIObjects() internally calls String.format() 
 for setting the job scheduling information. Based on the datastructure size 
 of jobQueuesManager and queueInfoMap, the number of times String.format() 
 gets executed becomes very high. String.format() internally does pattern 
 matching, which turns out to be very heavy. (This was revealed while 
 profiling the JT: almost 57% of the time was spent in 
 CapacityScheduler.assignTasks(), out of which String.format() took 46%.)
 Would it be possible to do String.format() only at the time of invoking 
 JobInProgress.getSchedulingInfo()? This might reduce the pressure on the JT 
 while processing heartbeats. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1461) Feature to instruct rumen-folder utility to skip jobs worth of specific duration

2010-03-03 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1461:


Attachment: mapreduce-1461--2010-03-04.patch

I took the trunk version and generated the patch. Please refer to the attached 
file.

 Feature to instruct rumen-folder utility to skip jobs worth of specific 
 duration
 

 Key: MAPREDUCE-1461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1461
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Rajesh Balamohan
 Fix For: 0.22.0

 Attachments: mapreduce-1461--2010-02-05.patch, 
 mapreduce-1461--2010-03-04.patch


 JSON outputs of rumen on production logs can be huge, of the order of multiple 
 GB. Rumen's folder utility helps in getting a smaller snapshot of this JSON 
 data.
 It would be helpful to have an option in rumen-folder wherein the user can 
 specify a duration from which rumen-folder should start processing data.
 Related JIRA link: https://issues.apache.org/jira/browse/MAPREDUCE-1295

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1533) reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects

2010-02-24 Thread Rajesh Balamohan (JIRA)
reduce or remove usage of String.format() usage in 
CapacityTaskScheduler.updateQSIObjects
-

 Key: MAPREDUCE-1533
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan


When short jobs are executed in hadoop with OutOfBandHeartBeat=true, the JT 
executes the heartBeat() method heavily. This internally makes a call to 
CapacityTaskScheduler.updateQSIObjects(). 

CapacityTaskScheduler.updateQSIObjects() internally calls String.format() for 
setting the job scheduling information. Based on the datastructure size of 
jobQueuesManager and queueInfoMap, the number of times String.format() gets 
executed becomes very high. String.format() internally does pattern matching, 
which turns out to be very heavy. (This was revealed while profiling the JT: 
almost 57% of the time was spent in CapacityScheduler.assignTasks(), out of 
which String.format() took 46%.)

Would it be possible to do String.format() only at the time of invoking 
JobInProgress.getSchedulingInfo()? This might reduce the pressure on the JT 
while processing heartbeats. A sketch of the deferral follows.
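
As a hedged sketch of that deferral (names are illustrative, not the actual 
scheduler code): update raw counters cheaply on every heartbeat, and pay for 
String.format() only when the scheduling information is actually read:

// Illustrative: update raw counters per heartbeat; defer the expensive
// String.format() until getSchedulingInfo() is actually invoked.
class LazySchedulingInfo {
  private volatile int runningMaps;
  private volatile int runningReduces;

  void update(int maps, int reduces) {   // cheap, called on each heartbeat
    this.runningMaps = maps;
    this.runningReduces = reduces;
  }

  String getSchedulingInfo() {           // formatting only on demand
    return String.format("%d running map tasks, %d running reduce tasks",
        runningMaps, runningReduces);
  }
}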

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1354) Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses

2010-02-16 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834136#action_12834136
 ] 

Rajesh Balamohan commented on MAPREDUCE-1354:
-

In the latest patch, getTaskCompletionEvents is using synchronized(this), I 
believe.

It has to be synchronized on jobs:

public TaskCompletionEvent[] getTaskCompletionEvents(
    JobID jobid, int fromEventId, int maxEvents) throws IOException {
  JobInProgress job = null;
  synchronized (jobs) {   // lock only the jobs map for the lookup
    job = this.jobs.get(jobid);
  }
  if (null != job) {
    return isJobInited(job) ?
        job.getTaskCompletionEvents(fromEventId, maxEvents) :
        TaskCompletionEvent.EMPTY_ARRAY;
  }
  return completedJobStatusStore.readJobTaskCompletionEvents(jobid,
      fromEventId, maxEvents);
}

 Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS 
 accesses
 -

 Key: MAPREDUCE-1354
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Devaraj Das
Assignee: Arun C Murthy
Priority: Critical
 Attachments: MAPREDUCE-1354_yhadoop20.patch, 
 MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
 MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
 MAPREDUCE-1354_yhadoop20.patch


 It'd be nice to have the JobTracker object not be locked while accessing the 
 HDFS for reading the jobconf file and while writing the jobinfo file in the 
 submitJob method. We should see if we can avoid taking the lock altogether.
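
A minimal sketch of the intent with placeholder types (the helper methods 
stand in for the HDFS accesses; none of this is the actual patch):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Do the HDFS I/O with no JobTracker lock held; take the lock only to
// update the in-memory bookkeeping.
class JobTrackerSubmitSketch {
  private final Map<String, String> jobs = new HashMap<String, String>();

  String submitJob(String jobId) throws IOException {
    String jobConf = readJobConf(jobId);   // HDFS read, lock not held
    writeJobInfo(jobId, jobConf);          // HDFS write, lock not held
    synchronized (this) {                  // lock only for shared state
      jobs.put(jobId, jobConf);
      return "SUBMITTED";
    }
  }

  private String readJobConf(String jobId) throws IOException {
    return "conf-for-" + jobId;            // stand-in for the HDFS read
  }

  private void writeJobInfo(String jobId, String jobConf) throws IOException {
    // stand-in for the HDFS write of the jobinfo file
  }
}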

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1354) Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses

2010-02-16 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834149#action_12834149
 ] 

Rajesh Balamohan commented on MAPREDUCE-1354:
-

Please ignore the previous comment. I had a discussion with Hemanth and will 
try out synchronized(jobs) on trunk.

 Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS 
 accesses
 -

 Key: MAPREDUCE-1354
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Devaraj Das
Assignee: Arun C Murthy
Priority: Critical
 Attachments: MAPREDUCE-1354_yhadoop20.patch, 
 MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
 MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
 MAPREDUCE-1354_yhadoop20.patch


 It'd be nice to have the JobTracker object not be locked while accessing the 
 HDFS for reading the jobconf file and while writing the jobinfo file in the 
 submitJob method. We should see if we can avoid taking the lock altogether.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1495) Reduce locking contention on JobTracker.getTaskCompletionEvents()

2010-02-15 Thread Rajesh Balamohan (JIRA)
Reduce locking contention on JobTracker.getTaskCompletionEvents()
-

 Key: MAPREDUCE-1495
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1495
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan


While profiling the JT for slow performance with small jobs, it was observed 
that JobTracker.getTaskCompletionEvents() accounts for 40% of the lock 
contention on the JT.

This JIRA ticket is created to explore the possibilities of reducing the 
synchronized code block in this method. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1495) Reduce locking contention on JobTracker.getTaskCompletionEvents()

2010-02-15 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833834#action_12833834
 ] 

Rajesh Balamohan commented on MAPREDUCE-1495:
-

As of now, it is implemented as follows in JobTracker:

public synchronized TaskCompletionEvent[] getTaskCompletionEvents(
    JobID jobid, int fromEventId, int maxEvents) throws IOException {
  synchronized (this) {
    JobInProgress job = this.jobs.get(jobid);
    if (null != job) {
      if (job.inited()) {
        return job.getTaskCompletionEvents(fromEventId, maxEvents);
      } else {
        return EMPTY_EVENTS;
      }
    }
  }
  return completedJobStatusStore.readJobTaskCompletionEvents(jobid, 
      fromEventId, maxEvents);
}

where jobs is a TreeMap<JobID, JobInProgress>. 

It is possible to reduce the contention in two ways.

1. Reduce the synchronization to just JobInProgress job = this.jobs.get(jobid); 
the rest of the code is independent of the synchronized block (AFAIK).
2. Change the datastructure of jobs to a ConcurrentHashMap<JobID, 
JobInProgress>. That way jobs.get(jobid) automatically becomes thread-safe and 
the explicit synchronization can be eliminated, as in the sketch below. If it 
is mandatory to maintain the ordering, I will have to try the first one.
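
A sketch of the second option (assuming the surrounding JobTracker context; 
only the declaration of jobs and the now lock-free lookup change):

// jobs declared as a ConcurrentHashMap instead of a TreeMap; note that the
// jobid ordering the TreeMap provided is lost.
private final ConcurrentHashMap<JobID, JobInProgress> jobs =
    new ConcurrentHashMap<JobID, JobInProgress>();

public TaskCompletionEvent[] getTaskCompletionEvents(
    JobID jobid, int fromEventId, int maxEvents) throws IOException {
  JobInProgress job = jobs.get(jobid);   // thread-safe without a lock
  if (null != job) {
    return job.inited() ?
        job.getTaskCompletionEvents(fromEventId, maxEvents) :
        EMPTY_EVENTS;
  }
  return completedJobStatusStore.readJobTaskCompletionEvents(jobid,
      fromEventId, maxEvents);
}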

 Reduce locking contention on JobTracker.getTaskCompletionEvents()
 -

 Key: MAPREDUCE-1495
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1495
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan

 While profiling the JT for slow performance with small jobs, it was observed 
 that JobTracker.getTaskCompletionEvents() accounts for 40% of the lock 
 contention on the JT.
 This JIRA ticket is created to explore the possibilities of reducing the 
 synchronized code block in this method. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1495) Reduce locking contention on JobTracker.getTaskCompletionEvents()

2010-02-15 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834030#action_12834030
 ] 

Rajesh Balamohan commented on MAPREDUCE-1495:
-

Made the following changes to reduce contention and profiled the JT. This 
virtually eliminated the contention caused by getTaskCompletionEvents.

In JobTracker.java: (locking only jobs during get())

public TaskCompletionEvent[] getTaskCompletionEvents(
    JobID jobid, int fromEventId, int maxEvents) throws IOException {
  JobInProgress job = null;
  synchronized (jobs) {   // lock held only for the map lookup
    job = this.jobs.get(jobid);
  }
  if (null != job) {
    if (job.inited()) {
      return job.getTaskCompletionEvents(fromEventId, maxEvents);
    } else {
      return EMPTY_EVENTS;
    }
  }
  return completedJobStatusStore.readJobTaskCompletionEvents(jobid, 
      fromEventId, maxEvents);
}
  
In Configuration.java (eliminated synchronization at the method level; it is 
required only when properties is null):

  private Properties getProps() {
    if (properties == null) {
      synchronized (this) {
        // Note: for this double-checked locking to be safe under the Java
        // memory model, 'properties' must be volatile and should be
        // re-checked for null inside the lock.
        properties = new Properties();
        loadResources(properties, resources, quietmode);
        if (overlay != null) {
          properties.putAll(overlay);
          if (storeResource) {
            for (Map.Entry<Object, Object> item : overlay.entrySet()) {
              updatingResource.put((String) item.getKey(), "Unknown");
            }
          }
        }
      }
    }
    return properties;
  }

 Reduce locking contention on JobTracker.getTaskCompletionEvents()
 -

 Key: MAPREDUCE-1495
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1495
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.1
Reporter: Rajesh Balamohan

 While profiling the JT for slow performance with small jobs, it was observed 
 that JobTracker.getTaskCompletionEvents() accounts for 40% of the lock 
 contention on the JT.
 This JIRA ticket is created to explore the possibilities of reducing the 
 synchronized code block in this method. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1461) Feature to instruct rumen-folder utility to skip jobs worth of specific duration

2010-02-04 Thread Rajesh Balamohan (JIRA)
Feature to instruct rumen-folder utility to skip jobs worth of specific duration


 Key: MAPREDUCE-1461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1461
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Rajesh Balamohan
 Fix For: 0.22.0


JSON outputs of rumen on production logs can be huge, of the order of multiple 
GB. Rumen's folder utility helps in getting a smaller snapshot of this JSON 
data.
It would be helpful to have an option in rumen-folder wherein the user can 
specify a duration from which rumen-folder should start processing data.

Related JIRA link: https://issues.apache.org/jira/browse/MAPREDUCE-1295

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1461) Feature to instruct rumen-folder utility to skip jobs worth of specific duration

2010-02-04 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated MAPREDUCE-1461:


Attachment: mapreduce-1461--2010-02-05.patch

The attached patch implements this feature. The user can specify the time 
duration to be skipped via the -starts-after command-line argument.
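
For reference, an invocation might look like the following (the Folder entry 
point exists in Rumen, but the exact argument order and duration format shown 
here are assumptions, not taken from the patch):

hadoop org.apache.hadoop.tools.rumen.Folder -starts-after 1h \
    input-trace.json folded-trace.json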

 Feature to instruct rumen-folder utility to skip jobs worth of specific 
 duration
 

 Key: MAPREDUCE-1461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1461
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Rajesh Balamohan
 Fix For: 0.22.0

 Attachments: mapreduce-1461--2010-02-05.patch


 JSON outputs of rumen on production logs can be huge, of the order of multiple 
 GB. Rumen's folder utility helps in getting a smaller snapshot of this JSON 
 data.
 It would be helpful to have an option in rumen-folder wherein the user can 
 specify a duration from which rumen-folder should start processing data.
 Related JIRA link: https://issues.apache.org/jira/browse/MAPREDUCE-1295

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.