[jira] [Updated] (MAPREDUCE-7004) Uploader tool for Distributed Cache Implementation on Windows

2017-11-09 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated MAPREDUCE-7004:
--
Description: The proposal is to create a tool that collects all available 
jars in the Hadoop classpath and adds them to a single tarball file. It then 
uploads the resulting archive to an HDFS directory. This saves the cluster 
administrator from having to set this up manually for Distributed Cache Deploy. 
This jira is about applying the tool code to Windows.

> Uploader tool for Distributed Cache Implementation on Windows
> -
>
> Key: MAPREDUCE-7004
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7004
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Miklos Szegedi
>
> The proposal is to create a tool that collects all available jars in the 
> Hadoop classpath and adds them to a single tarball file. It then uploads the 
> resulting archive to an HDFS directory. This saves the cluster administrator 
> from having to set this up manually for Distributed Cache Deploy. This jira 
> is about applying the tool code to Windows.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7004) Uploader tool for Distributed Cache Implementation on Windows

2017-11-09 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created MAPREDUCE-7004:
-

 Summary: Uploader tool for Distributed Cache Implementation on 
Windows
 Key: MAPREDUCE-7004
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7004
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Miklos Szegedi









[jira] [Commented] (MAPREDUCE-6994) Uploader tool for Distributed Cache Deploy code changes

2017-11-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246638#comment-16246638
 ] 

Miklos Szegedi commented on MAPREDUCE-6994:
---

Thank you for the review [~rkanter].
3. LOG and print have different meanings. LOG output can be adjusted or even 
turned off, whereas I use print to return the output of the script.

> Uploader tool for Distributed Cache Deploy code changes
> ---
>
> Key: MAPREDUCE-6994
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6994
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: MAPREDUCE-6994.000.patch, MAPREDUCE-6994.001.patch, 
> MAPREDUCE-6994.002.patch, MAPREDUCE-6994.003.patch, MAPREDUCE-6994.004.patch, 
> MAPREDUCE-6994.005.patch
>
>
> The proposal is to create a tool that collects all available jars in the 
> Hadoop classpath and adds them to a single tarball file. It then uploads the 
> resulting archive to an HDFS directory. This saves the cluster administrator 
> from having to set this up manually for Distributed Cache Deploy.






[jira] [Commented] (MAPREDUCE-6994) Uploader tool for Distributed Cache Deploy code changes

2017-11-09 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246377#comment-16246377
 ] 

Robert Kanter commented on MAPREDUCE-6994:
--

Thanks for working on this [~miklos.szeg...@cloudera.com] and [~yufeigu] for 
reviews so far.  

Looks good overall.  Here are some more comments:
# The new pom is missing the {{}} and {{}} elements for itself.  
Take a look at a sibling pom like {{hadoop-mapreduce-client-shuffle}} for an 
example.
# The new pom should declare all direct dependencies.  I'm actually curious how 
it's compiling with no dependencies defined...
# Why are we using a mix of {{LOG}} and {{System.out.print}}?  We should use 
{{LOG}}.
# In {{FrameworkUploader#buildPackage}}, instead of using our own {{buffer}} 
with a {{while}} loop, we can just use {{IOUtils#copy}} from commons-io.
# In {{FrameworkUploader#buildPackage}}, we can use try-with-resources for 
{{out}} and {{inputStream}}
# In {{FrameworkUploader}}, instead of manually formatting and printing out the 
Help text, you can use the {{HelpFormatter}} from commons-cli along with the 
{{Options}} you're already using.  You can see an example of this in other CLI 
tools, such as {{HadoopArchiveLogs}}.
# In {{TestFrameworkUploader}}, we should move {{RANDOM}} and {{TEST_DIR}} into 
an {{@Before}} {{setUp}} method.  
# Please file a followup JIRA for Windows support
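
To illustrate the two {{buildPackage}} comments above (try-with-resources and replacing the hand-written buffer loop), here is a minimal sketch. It is not the {{FrameworkUploader}} code: the class and method names are hypothetical, and it uses the JDK's {{InputStream#transferTo}} (Java 9+) as a stand-in for {{IOUtils#copy}} so that it needs no extra dependency. With commons-io on the classpath, the one copy line could presumably become {{IOUtils.copy(in, out)}}.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of the patch.
final class StreamCopySketch {
  private StreamCopySketch() {
  }

  /**
   * Copies everything from source to sink and closes both streams,
   * even when the copy fails. Returns the number of bytes copied.
   */
  static long copyAndClose(InputStream source, OutputStream sink) {
    // try-with-resources closes 'out' first, then 'in', on every exit path.
    try (InputStream in = source; OutputStream out = sink) {
      // Replaces the manual "read into buffer, write, repeat" loop.
      return in.transferTo(out);
    } catch (IOException e) {
      // Assumption: callers prefer an unchecked exception here.
      throw new UncheckedIOException(e);
    }
  }
}
```

The resources are declared in the {{try}} header, so no explicit {{finally}} block or {{close()}} call is needed.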

> Uploader tool for Distributed Cache Deploy code changes
> ---
>
> Key: MAPREDUCE-6994
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6994
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: MAPREDUCE-6994.000.patch, MAPREDUCE-6994.001.patch, 
> MAPREDUCE-6994.002.patch, MAPREDUCE-6994.003.patch, MAPREDUCE-6994.004.patch, 
> MAPREDUCE-6994.005.patch
>
>
> The proposal is to create a tool that collects all available jars in the 
> Hadoop classpath and adds them to a single tarball file. It then uploads the 
> resulting archive to an HDFS directory. This saves the cluster administrator 
> from having to set this up manually for Distributed Cache Deploy.






[jira] [Commented] (MAPREDUCE-6994) Uploader tool for Distributed Cache Deploy code changes

2017-11-09 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246188#comment-16246188
 ] 

Yufei Gu commented on MAPREDUCE-6994:
-

Thanks [~miklos.szeg...@cloudera.com]. The last patch looks good to me. +1. 

> Uploader tool for Distributed Cache Deploy code changes
> ---
>
> Key: MAPREDUCE-6994
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6994
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: MAPREDUCE-6994.000.patch, MAPREDUCE-6994.001.patch, 
> MAPREDUCE-6994.002.patch, MAPREDUCE-6994.003.patch, MAPREDUCE-6994.004.patch, 
> MAPREDUCE-6994.005.patch
>
>
> The proposal is to create a tool that collects all available jars in the 
> Hadoop classpath and adds them to a single tarball file. It then uploads the 
> resulting archive to an HDFS directory. This saves the cluster administrator 
> from having to set this up manually for Distributed Cache Deploy.






[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-11-09 Thread Peter Bacsko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246125#comment-16246125
 ] 

Peter Bacsko commented on MAPREDUCE-5124:
-

I uploaded POC v2. This should be free of race conditions. Plus, we only 
dispatch update events to the queue if it's necessary.

> AM lacks flow control for task events
> -
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch, 
> MAPREDUCE-5124-CoalescingPOC2.patch, MAPREDUCE-5124-proto.2.txt, 
> MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.






[jira] [Updated] (MAPREDUCE-5124) AM lacks flow control for task events

2017-11-09 Thread Peter Bacsko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated MAPREDUCE-5124:

Attachment: MAPREDUCE-5124-CoalescingPOC2.patch

> AM lacks flow control for task events
> -
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch, 
> MAPREDUCE-5124-CoalescingPOC2.patch, MAPREDUCE-5124-proto.2.txt, 
> MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.






[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-11-09 Thread Peter Bacsko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245754#comment-16245754
 ] 

Peter Bacsko commented on MAPREDUCE-5124:
-

Hm, I believe there's a race condition in my previous patch.

1. Task attempt invokes setNextStatusUpdate() via RPC
2. This results in attempt.needStatusUpdate = true
3. However, an update for the same attempt is already running, setting 
needStatusUpdate = false at the end
4. New event is queued
5. New event is taken from the queue, updater logic runs
6. Updater logic sees that needStatusUpdate = false -- an update is lost

I'll re-think this and update the patch.
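
One common way to avoid this kind of lost update is to clear the flag before reading the status rather than at the end of the update. The sketch below illustrates that idea only; it is not the attached patch. The class name, the {{String}} status type, and {{processQueuedEvent}} are assumptions made for the example, while {{setNextStatusUpdate}} and {{needStatusUpdate}} follow the names used in the steps above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration; not the code from the POC patches.
final class CoalescingAttempt {
  private final AtomicBoolean needStatusUpdate = new AtomicBoolean(false);
  private final AtomicReference<String> latestStatus = new AtomicReference<>();
  private volatile String appliedStatus;

  /**
   * RPC side. Stores the newest status first, then raises the flag.
   * Returns true only when the caller has to queue a new update event;
   * false means an event is already pending and will pick this status up.
   */
  boolean setNextStatusUpdate(String status) {
    latestStatus.set(status);
    return needStatusUpdate.compareAndSet(false, true);
  }

  /**
   * Updater side. The flag is cleared BEFORE the status is read, so a
   * status that arrives while the update is running raises the flag
   * again and gets its own event instead of being dropped.
   */
  void processQueuedEvent() {
    needStatusUpdate.set(false);
    appliedStatus = latestStatus.get();
  }

  String appliedStatus() {
    return appliedStatus;
  }
}
```

With this ordering a late status can at worst cause one redundant event, and it also means events are queued only when no update is already pending.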

> AM lacks flow control for task events
> -
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch, 
> MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.


