Re: Hadoop MapReduce High Availability
Hi Augusto,

In Hadoop 2, ResourceManager HA is being worked on under YARN-128 and YARN-149. There's a design doc for RM recovery on the latter.

Hadoop 1's MapReduce high availability story is somewhat fragmented. Cloudera's distribution has JobTracker HA based on the HA libraries available in Hadoop 2. I believe other distributions, like Hortonworks' and MapR's, also have JobTracker HA solutions. For a variety of reasons, none of these are likely to make it into the Apache releases.

-Sandy

On Sun, Apr 28, 2013 at 2:52 PM, Augusto Souza augustorso...@gmail.com wrote:

Hello,

Sorry if this topic has already been discussed, but I am new to this mailing list and didn't find a way to check for past messages.

Let me introduce myself. My name is Augusto Souza and I am an MSc student in Distributed Systems at the University of Campinas (Brazil). One of the possibilities I have been considering for my research is the problem of MapReduce High Availability. Some issues on this topic have been open in Jira for quite a long time:

https://issues.apache.org/jira/browse/MAPREDUCE-2288
https://issues.apache.org/jira/browse/MAPREDUCE-225

I also found some blog posts about this topic (e.g. http://hortonworks.com/blog/high-availability-and-hadoop-1-0-perfect-together/ ), but I didn't find one global and official solution from the community. Is there one? Is there a way I could contribute to this? Are there any resources you would recommend I read about this topic?

Thanks in advance.

Best regards,
Augusto Souza
Re: problem in switching to branch-2
branch-1.2 has lib/conf/logs etc. branch-2's directory structure and content are similar to trunk, so you may already be seeing the correct code. Open pom.xml and look for the first occurrence of version: you should see 3.0.0-SNAPSHOT for trunk and 2.0.5-SNAPSHOT for branch-2.

Thanks,
+Vinod

On Apr 29, 2013, at 3:55 AM, Samaneh Shokuhi wrote:

Hi Vinod,

I did what you said, but my working directory still contains the trunk branch! What I see for branch-1.2 is different from branch-2. I switch to branch-1.2 with git checkout remotes/origin/branch-1.2 and it contains, for example, the build, bin, conf, lib, source and logs directories, but when I switched to branch-2 it still shows trunk. I tried what you suggested, git checkout -b branch-2 origin/branch-2. A new branch called branch-2 is added, but it still contains trunk. I am wondering why git checkout remotes/origin/branch-1.2 switched to branch-1.2 but git checkout remotes/origin/branch-2 does not switch to branch-2. If you try git checkout -b branch-2 origin/branch-2, do you see branch-2 contents in your working directory?

Samaneh

On Mon, Apr 29, 2013 at 2:10 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:

Right, you have to do git checkout -b branch-2 origin/branch-2. You get the trunk branch by default; for all other remote branches, you have to create local branches like that.

HTH

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Apr 28, 2013, at 12:42 PM, Samaneh Shokuhi wrote:

Hi Vinod,

I tried with the path you sent, but no success. I mean it has not really switched to branch-2, and calling git branch -a displays no branch. The working directory still contains trunk, not branch-2 contents. What I get in the console is:

You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 72713e0... HDFS-4748. Merge r1476587 from trunk.

On Sun, Apr 28, 2013 at 9:20 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:

Not sure how up to date the repo at github is; I use git://git.apache.org/hadoop-common.git

Also, what do you mean by branch switch is not working?

Thanks,
+Vinod

On Apr 28, 2013, at 11:45 AM, Samaneh Shokuhi wrote:

Hi All,

I've got a clone of hadoop and tried to switch to branch-2, but it is not working, while switching to other branches like branch-1.2 is possible. Any idea why I can't switch to branch-2? This is what I've done:

$ git clone git://github.com/apache/hadoop-common.git hadoop-1
$ cd hadoop-1
$ git checkout remotes/origin/branch-2

Samaneh
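The two checkout styles discussed in this thread, and the pom.xml version check, can be tried without network access on a throwaway repository. This is only a sketch under stated assumptions: the upstream repository, its two branches and the tiny pom.xml files are stand-ins created by the script, not the real hadoop-common tree, and git is assumed to be installed.

```shell
#!/bin/sh
set -eu

# Throwaway "upstream" repository standing in for hadoop-common (hypothetical content).
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git config user.email "dev@example.invalid"
git config user.name "Example Dev"

# The first branch plays the role of trunk.
git symbolic-ref HEAD refs/heads/trunk
printf '<project>\n  <version>3.0.0-SNAPSHOT</version>\n</project>\n' > pom.xml
git add pom.xml
git commit -q -m "trunk pom"

# A second branch plays the role of branch-2.
git checkout -q -b branch-2
printf '<project>\n  <version>2.0.5-SNAPSHOT</version>\n</project>\n' > pom.xml
git commit -q -am "branch-2 pom"
git checkout -q trunk

# Clone it; the clone starts on the upstream's current branch (trunk here).
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

# Checking out the remote-tracking ref directly leaves HEAD detached.
git checkout -q remotes/origin/branch-2
detached_branch=$(git symbolic-ref -q --short HEAD || echo "detached")

# Creating a local branch from the remote one puts HEAD on a real branch.
git checkout -q -b branch-2 origin/branch-2
current_branch=$(git symbolic-ref -q --short HEAD)

# The first <version> element in pom.xml tells the branches apart.
pom_version=$(grep -m1 '<version>' pom.xml | sed -e 's/.*<version>//' -e 's/<\/version>.*//')

echo "after direct checkout: $detached_branch"
echo "after checkout -b:     $current_branch"
echo "pom.xml version:       $pom_version"
```

In both cases the working tree holds the branch-2 files; the difference is only whether HEAD points at a local branch, which is why comparing the pom.xml version is a more reliable check than comparing directory listings.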
Re: Build failed for Hadoop-1.0.4
1. To build 1.0.4 specifically, you will also need Cygwin installed and on your Windows PATH. I'd suggest instead using branch-1-win (no Apache releases yet though) from the source repository.

2. You need autoconf, automake, cmake, etc. installed for building a fully configured tarball that includes the native libs. I doubt that would work with 1.0.4 even with Cygwin, so you can instead try to run a simple ant jar, or specifically disable native lib building.

On Sat, Apr 27, 2013 at 12:53 AM, Thoihen Maibam thoihen...@gmail.com wrote:

Hi All,

Can anybody help me resolve this build error? Below is the error I got:

BUILD FAILED
F:\HADOOP COMMIT 1.1.4\Hadoop-1.0.4\build.xml:618: Execute failed: java.io.IOException: Cannot run program autoreconf (in directory F:\HADOOP COMMIT 1.1.4\Hadoop-1.0.4\src\native): CreateProcess error=2, The system cannot find the file specified.

1. Downloaded Hadoop-1.0.4 in Eclipse on Windows.
2. Configured ant in Eclipse and copied ivy.jar into ant/lib.

Goal: I just wanted to build Hadoop in a Windows environment to go through the code and familiarize myself with the Hadoop code base; I would install Cygwin later on and run it. Initially, I built one of the Hadoop versions (I don't remember which one) successfully with ant, but now I get this error about a missing file.

Regards
thoihen

--
Harsh J
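The failure above comes from autoreconf not being on the PATH, so one option is to check for the native build tools before choosing a build target. A minimal sketch, assuming a POSIX shell (for example under Cygwin); the helper name find_missing_tools is hypothetical and not part of Hadoop's build.

```shell
#!/bin/sh
# Print the names of any listed tools that are not on PATH, space-separated.
# find_missing_tools is a hypothetical helper, not part of the Hadoop build.
find_missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing:+$missing }$tool"
    fi
  done
  printf '%s' "$missing"
}

missing_tools=$(find_missing_tools autoreconf automake cmake)
if [ -n "$missing_tools" ]; then
  echo "Missing native build tools: $missing_tools"
  echo "Consider a plain 'ant jar', or disable native library building."
else
  echo "autoreconf, automake and cmake were all found on PATH."
fi
```

The script only reports what it finds; it does not start a build, so it is safe to run in any checkout.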
Re: problem in switching to branch-2
Hi Vinod,

I want to run the WordCount example with branch-2, and for that I need hadoop-core-x.x.x-SNAPSHOT. In branch-1 there is a build.xml, so I can generate the hadoop-core jar file by executing ant jar, but in branch-2 I don't know how to generate it. I may also need to modify the Hadoop source code and test it with the WordCount example. Could you please tell me how to generate the hadoop-core jar file in branch-2? It was straightforward in branch-1 by executing ant on build.xml.

Samaneh

On Mon, Apr 29, 2013 at 7:11 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:

branch-1.2 has lib/conf/logs etc. branch-2's directory structure and the content is similar to trunk, so you may already be seeing correct code. See pom.xml and look for the first occurrence of version, you should see 3.0.0-SNAPSHOT for trunk and 2.0.5-SNAPSHOT for branch-2.

Thanks,
+Vinod
[jira] [Created] (MAPREDUCE-5192) Separate TCE resolution from fetch
Chris Douglas created MAPREDUCE-5192:

Summary: Separate TCE resolution from fetch
Key: MAPREDUCE-5192
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5192
Project: Hadoop Map/Reduce
Issue Type: Task
Components: task
Reporter: Chris Douglas
Priority: Minor

The {{EventFetcher}} thread grounds task completion events as URIs before passing them to the {{ShuffleScheduler}}. If the former deferred this to the scheduler, one could interpret the TCE metadata differently.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-5134) Default settings cause LocalJobRunner to OOME
[ https://issues.apache.org/jira/browse/MAPREDUCE-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza resolved MAPREDUCE-5134.

Resolution: Not A Problem

Default settings cause LocalJobRunner to OOME

Key: MAPREDUCE-5134
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5134
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

If I run a job using the local job runner with vanilla settings, I get an out of memory error. This seems to be because the default client memory maximum is 128 MB, and the default io.sort.mb is 100 MB.
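The arithmetic behind the report can be written out directly. This is only an illustration using the two figures quoted in the issue; the variable names are made up here, and the calculation says nothing about how the JVM actually lays out its heap.

```shell
#!/bin/sh
client_heap_mb=128   # default client memory maximum, as stated in the report
sort_buffer_mb=100   # default io.sort.mb, as stated in the report

# Memory left for everything else once the sort buffer is allocated.
headroom_mb=$((client_heap_mb - sort_buffer_mb))

# Share of the heap taken by the sort buffer (integer percent, rounded down).
sort_share_pct=$((sort_buffer_mb * 100 / client_heap_mb))

echo "sort buffer: ${sort_buffer_mb} MB of ${client_heap_mb} MB (${sort_share_pct}%)"
echo "headroom:    ${headroom_mb} MB"
```

With the sort buffer taking roughly three quarters of the heap, only 28 MB remains for the rest of the job running in the same process, which is consistent with the out of memory error described.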
Re: Heads up - 2.0.5-beta
On Fri, Apr 26, 2013 at 11:15 AM, Arun C Murthy a...@hortonworks.com wrote:

On Apr 25, 2013, at 7:31 PM, Roman Shaposhnik wrote:

On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com wrote:

With that in mind, I really want to make a serious push to lock down APIs and wire-protocols for hadoop-2.0.5-beta. Thus, we can confidently support hadoop-2.x in a compatible manner in the future. So, it's fine to add new features, but please ensure that all APIs are frozen for hadoop-2.0.5-beta.

Arun, since it sounds like you have a pretty definite idea in mind for what you want the 'beta' label to actually mean, could you please share the exact criteria?

Sorry, I'm not sure if this is exactly what you are looking for but, as I mentioned above, the primary aim would be to make the final set of required API/wire-protocol changes so that we can call it a 'beta', i.e. once 2.0.5-beta ships, users and downstream projects can be confident about forward compatibility in the hadoop-2.x line. Obviously, we might discover a blocker bug post 2.0.5 which *might* necessitate an unfortunate change, but that should be an outstanding exception. Hope that helps.

It does make things a bit easier, but here's what I'd like to find out: what *level* of feedback from downstream components and the DevOps community would be considered adequate for a release to be called beta? IOW, would it make sense for us, as a community, to make the following things part of the release criteria as far as downstream components are concerned:

* producing Maven artifacts of downstream components against branch-2 artifacts
* having unit test jobs for all the downstream components against branch-2 artifacts
* having all the failures in those unit tests triaged and filed either against a component itself or hadoop
* running Bigtop integration tests on branch-2 nightly
* having all the failures of unit tests triaged and filed either against components or hadoop

Obviously, quantifying DevOps feedback and involvement is more difficult, but would it be completely out of the question to, essentially, predicate beta on some level of feedback coming from Yahoo!/LI/FB/etc?

Thanks,
Roman.

P.S. Note that Bigtop can help with most of those things -- so let's not get hung up on resources too much for now -- but rather on whether we'd want those to be part of the release criteria IF we had all the resources.
[jira] [Created] (MAPREDUCE-5193) A few MR tests use block sizes which are smaller than the default minimum block size
Aaron T. Myers created MAPREDUCE-5193:

Summary: A few MR tests use block sizes which are smaller than the default minimum block size
Key: MAPREDUCE-5193
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5193
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 2.0.5-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers

HDFS-4305 introduced a new configurable minimum block size of 1MB. A few MR tests deliberately set much smaller block sizes and now fail as a result. This JIRA is to update those tests so that they pass again.
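The effect of such a minimum can be pictured as a simple bounds check. The sketch below is not HDFS code: the helper block_size_allowed and the sample sizes are hypothetical, and only the 1 MB limit comes from the issue text.

```shell
#!/bin/sh
# Minimum block size of 1 MB, as described in the report (in bytes).
min_block_size=$((1024 * 1024))

# Succeeds (exit 0) only when the requested size meets the minimum.
# block_size_allowed is a hypothetical helper for illustration.
block_size_allowed() {
  [ "$1" -ge "$min_block_size" ]
}

# Hypothetical block sizes: two small test-style values and one at the limit.
for requested in 512 4096 1048576; do
  if block_size_allowed "$requested"; then
    echo "$requested bytes: accepted"
  else
    echo "$requested bytes: rejected (below $min_block_size)"
  fi
done
```

A test that deliberately asks for a few hundred bytes per block falls on the rejected side of this check, so it has to either request a larger block or lower the configured minimum for the test cluster.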