[jira] [Updated] (MAPREDUCE-7004) Uploader tool for Distributed Cache Implementation on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-7004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi updated MAPREDUCE-7004:
--------------------------------------
    Description: The proposal is to create a tool that collects all available jars in the Hadoop classpath and adds them to a single tarball file. It then uploads the resulting archive to an HDFS directory. This saves the cluster administrator from having to set this up manually for Distributed Cache Deploy. This jira is about applying the tool code to Windows.

> Uploader tool for Distributed Cache Implementation on Windows
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-7004
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7004
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Miklos Szegedi
>
> The proposal is to create a tool that collects all available jars in the
> Hadoop classpath and adds them to a single tarball file. It then uploads the
> resulting archive to an HDFS directory. This saves the cluster administrator
> from having to set this up manually for Distributed Cache Deploy. This jira
> is about applying the tool code to Windows.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7004) Uploader tool for Distributed Cache Implementation on Windows
Miklos Szegedi created MAPREDUCE-7004:
--------------------------------------

             Summary: Uploader tool for Distributed Cache Implementation on Windows
                 Key: MAPREDUCE-7004
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7004
             Project: Hadoop Map/Reduce
          Issue Type: Sub-task
            Reporter: Miklos Szegedi
[jira] [Commented] (MAPREDUCE-6994) Uploader tool for Distributed Cache Deploy code changes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246638#comment-16246638 ]

Miklos Szegedi commented on MAPREDUCE-6994:
-------------------------------------------

Thank you for the review [~rkanter].
3. LOG and print have different meanings. LOG can be adjusted, or even turned off; I use print to return the output of the script.

> Uploader tool for Distributed Cache Deploy code changes
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6994
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6994
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>         Attachments: MAPREDUCE-6994.000.patch, MAPREDUCE-6994.001.patch,
> MAPREDUCE-6994.002.patch, MAPREDUCE-6994.003.patch, MAPREDUCE-6994.004.patch,
> MAPREDUCE-6994.005.patch
>
>
> The proposal is to create a tool that collects all available jars in the
> Hadoop classpath and adds them to a single tarball file. It then uploads the
> resulting archive to an HDFS directory. This saves the cluster administrator
> from having to set this up manually for Distributed Cache Deploy.
[jira] [Commented] (MAPREDUCE-6994) Uploader tool for Distributed Cache Deploy code changes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246377#comment-16246377 ]

Robert Kanter commented on MAPREDUCE-6994:
------------------------------------------

Thanks for working on this [~miklos.szeg...@cloudera.com] and [~yufeigu] for reviews so far. Looks good overall. Here are some more comments:
# The new pom is missing the {{}} and {{}} elements for itself. Take a look at a sibling pom like {{hadoop-mapreduce-client-shuffle}} for an example.
# The new pom should declare all direct dependencies. I'm actually curious how it's compiling with no dependencies defined...
# Why are we using a mix of {{LOG}} and {{System.out.print}}? We should use {{LOG}}.
# In {{FrameworkUploader#buildPackage}}, instead of using our own {{buffer}} with a {{while}} loop, we can just use {{IOUtils#copy}} from commons-io.
# In {{FrameworkUploader#buildPackage}}, we can use try-with-resources for {{out}} and {{inputStream}}.
# In {{FrameworkUploader}}, instead of manually formatting and printing out the Help text, you can use the {{HelpFormatter}} from commons-cli along with the {{Options}} you're already using. You can see an example of this in other CLI tools, such as {{HadoopArchiveLogs}}.
# In {{TestFrameworkUploader}}, we should move {{RANDOM}} and {{TEST_DIR}} into an {{@Before}} {{setUp}} method.
# Please file a followup JIRA for Windows support.
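Review items 4 and 5 above can be sketched roughly as follows. This is a minimal illustration and not the code in any of the attached patches: {{CopySketch}} and {{copy}} are hypothetical names, and the JDK's {{InputStream#transferTo}} (Java 9 and later) stands in for commons-io {{IOUtils.copy(in, out)}} so that the example has no external dependency. Wrapping the {{IOException}} in an {{UncheckedIOException}} is likewise an assumption made only to keep the example short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not taken from the MAPREDUCE-6994 patches.
final class CopySketch {
  private CopySketch() {
  }

  // Copies everything from source to target and returns the byte count.
  static long copy(InputStream source, OutputStream target) {
    // try-with-resources closes both streams, even if the copy throws.
    try (InputStream in = source; OutputStream out = target) {
      // Replaces a hand-written buffer and while loop. With commons-io
      // on the classpath this line could be IOUtils.copy(in, out).
      return in.transferTo(out);
    } catch (IOException e) {
      // Assumption for brevity: surface I/O failures as unchecked.
      throw new UncheckedIOException(e);
    }
  }
}
```

A caller would pass the jar's input stream and the archive's output stream; both are closed when {{copy}} returns.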
[jira] [Commented] (MAPREDUCE-6994) Uploader tool for Distributed Cache Deploy code changes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246188#comment-16246188 ]

Yufei Gu commented on MAPREDUCE-6994:
-------------------------------------

Thanks [~miklos.szeg...@cloudera.com]. The last patch looks good to me. +1.
[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246125#comment-16246125 ]

Peter Bacsko commented on MAPREDUCE-5124:
-----------------------------------------

I uploaded POC v2. This should be free of race conditions. Plus, we only dispatch update events to the queue if it's necessary.

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch,
> MAPREDUCE-5124-CoalescingPOC2.patch, MAPREDUCE-5124-proto.2.txt,
> MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events
> from tasks. If the AM is unable to keep pace with the rate of incoming
> events for a sufficient period of time then it will eventually exhaust the
> heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event
> processing, but the AM could still get behind if it's starved for CPU and/or
> handling a very large job with tens of thousands of active tasks.
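One way to dispatch an update event only when necessary is to coalesce reports per task attempt. The sketch below is an assumption about the general technique, not the contents of MAPREDUCE-5124-CoalescingPOC2.patch: {{CoalescingSketch}}, {{report}} and {{processOne}} are hypothetical names, a {{String}} stands in for the task status, and a {{ConcurrentLinkedQueue}} stands in for the AM's event queue.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of event coalescing for a single task attempt.
final class CoalescingSketch {
  // Most recent status reported by the attempt.
  private final AtomicReference<String> latestStatus = new AtomicReference<>();
  // True while an update event for this attempt is already queued.
  private final AtomicBoolean eventQueued = new AtomicBoolean(false);
  // Stand-in for the AM event queue.
  private final Queue<String> eventQueue = new ConcurrentLinkedQueue<>();

  // Called on the RPC path. Returns true if a new event was dispatched.
  boolean report(String status) {
    latestStatus.set(status);
    // Only the false -> true transition enqueues, so a burst of reports
    // produces one queued event instead of one event per report.
    if (eventQueued.compareAndSet(false, true)) {
      eventQueue.add("STATUS_UPDATE");
      return true;
    }
    return false;
  }

  // Called by the event handler. Returns the status applied, or null
  // if the queue was empty.
  String processOne() {
    if (eventQueue.poll() == null) {
      return null;
    }
    // Clear the flag BEFORE reading the status: a report that arrives
    // after this point enqueues a fresh event, so it cannot be lost.
    eventQueued.set(false);
    return latestStatus.get();
  }

  int queuedEvents() {
    return eventQueue.size();
  }
}
```

With this ordering a late report may cause one redundant event, but the handler always ends up applying the newest status, and the queue holds at most one pending event per attempt.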
[jira] [Updated] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated MAPREDUCE-5124:
------------------------------------
    Attachment: MAPREDUCE-5124-CoalescingPOC2.patch
[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245754#comment-16245754 ]

Peter Bacsko commented on MAPREDUCE-5124:
-----------------------------------------

Hm, I believe there's a race condition in my previous patch.

1. Task attempt invokes setNextStatusUpdate() via RPC
2. This results in attempt.needStatusUpdate = true
3. However, an update for the same attempt is already running, setting needStatusUpdate = false at the end
4. New event is queued
5. New event is taken from the queue, updater logic runs
6. Updater logic sees that needStatusUpdate = false -- an update is lost

I'll re-think this and update the patch.

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch,
> MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events
> from tasks. If the AM is unable to keep pace with the rate of incoming
> events for a sufficient period of time then it will eventually exhaust the
> heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event
> processing, but the AM could still get behind if it's starved for CPU and/or
> handling a very large job with tens of thousands of active tasks.
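The six steps above can be replayed deterministically in a few lines. This is only an illustration of the interleaving, not code from the patch: {{LostUpdateSketch}} and both updater methods are hypothetical, the {{Runnable}} argument simulates an RPC that arrives while an update is running, and the replay is single-threaded, so real code would still need proper synchronization. Clearing the flag first is shown as one possible ordering that avoids the lost update, not as the fix that was eventually chosen.

```java
// Hypothetical single-threaded replay of the lost-update interleaving.
final class LostUpdateSketch {
  private boolean needStatusUpdate;
  private String reported; // what the task attempt last reported
  private String applied;  // what the updater last applied

  // Steps 1-2: the RPC records the status and raises the flag.
  void setNextStatusUpdate(String status) {
    reported = status;
    needStatusUpdate = true;
  }

  // Racy ordering: the flag is cleared at the END of the update.
  void updateClearingFlagLast(Runnable concurrentRpc) {
    if (!needStatusUpdate) {
      return; // step 6: nothing to do, the newer report is never applied
    }
    String snapshot = reported;
    concurrentRpc.run();      // steps 1-2 happen while the update runs
    applied = snapshot;
    needStatusUpdate = false; // step 3: wipes out the concurrent request
  }

  // Alternative ordering: clear the flag BEFORE taking the snapshot.
  void updateClearingFlagFirst(Runnable concurrentRpc) {
    if (!needStatusUpdate) {
      return;
    }
    needStatusUpdate = false;
    String snapshot = reported;
    concurrentRpc.run();      // the RPC raises the flag again
    applied = snapshot;
  }

  String applied() {
    return applied;
  }
}
```

Replaying the scenario with the first updater leaves the older status applied after the second event is handled, while the second updater picks up the newer status on the next event.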