Re: Hadoop MapReduce High Availability

2013-04-29 Thread Sandy Ryza
Hi Augusto,

In Hadoop 2, ResourceManager HA is being worked on under YARN-128 and
YARN-149.  There's a design doc for RM recovery on the latter.

Hadoop 1's MapReduce high availability story is somewhat fragmented.
Cloudera's distribution has JobTracker HA based on the HA libraries
available in Hadoop 2.  I believe other distributions, like Hortonworks' and
MapR's, also have JobTracker HA solutions.  For a variety of reasons, none
of these are likely to make it into the Apache releases.

-Sandy

On Sun, Apr 28, 2013 at 2:52 PM, Augusto Souza augustorso...@gmail.com wrote:

 Hello,

 Sorry if this topic has already been discussed, but I am new to this
 mailing list and didn't find a way to check for past messages.

 Let me introduce myself. My name is Augusto Souza and I am an MSc
 student in Distributed Systems at the University of Campinas (Brazil). One
 of the possibilities I have been considering for my research
 is the problem of MapReduce High Availability.

 There have been some open issues in Jira on this topic for quite a long time:
 https://issues.apache.org/jira/browse/MAPREDUCE-2288
 https://issues.apache.org/jira/browse/MAPREDUCE-225

 I also found some blog posts about this topic (e.g.
 http://hortonworks.com/blog/high-availability-and-hadoop-1-0-perfect-together/),
 but I didn't find a single official solution from the community.
 Is there one? Is there a way I could contribute to this? Are there
 any resources you would recommend I read about this topic?

 Thanks in advance.

 Best regards,
 Augusto Souza



Re: problem in switching to branch-2

2013-04-29 Thread Vinod Kumar Vavilapalli

branch-1.2 has lib/conf/logs etc.

branch-2's directory structure and content are similar to trunk's, so you may
already be seeing the correct code.

To confirm, open pom.xml and look for the first occurrence of version: you
should see 3.0.0-SNAPSHOT for trunk and 2.0.5-SNAPSHOT for branch-2.
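
As a rough illustration of that check, the shell sketch below builds a throwaway repository that only mimics the trunk / branch-2 layout. The repository contents, the temporary directory and the variable names are assumptions made for the demo and are not part of the Hadoop tree. It shows that checking out a remote ref directly leaves HEAD detached, that git checkout -b creates a real local branch, and that the first version element in pom.xml tells the two branches apart.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Stand-in for the upstream repository (hypothetical content):
# trunk and branch-2 differ only in the version recorded in pom.xml.
git init -q "$work/origin"
cd "$work/origin"
git config user.email "demo@example.com"
git config user.name "Demo User"
git config commit.gpgsign false
printf '<project>\n  <modelVersion>4.0.0</modelVersion>\n  <version>3.0.0-SNAPSHOT</version>\n</project>\n' > pom.xml
git add pom.xml
git commit -q -m "trunk pom"
git branch -m trunk
git checkout -q -b branch-2
sed 's/3\.0\.0-SNAPSHOT/2.0.5-SNAPSHOT/' pom.xml > pom.xml.new
mv pom.xml.new pom.xml
git commit -q -a -m "branch-2 pom"
git checkout -q trunk

# A fresh clone starts on the default branch (trunk in this demo).
git clone -q "$work/origin" "$work/clone"
cd "$work/clone"

# Checking out the remote ref directly gives a detached HEAD.
git checkout -q origin/branch-2
detached_name=$(git rev-parse --abbrev-ref HEAD)

# Creating a local branch from the remote ref gives a named branch.
git checkout -q -b branch-2 origin/branch-2
current_branch=$(git rev-parse --abbrev-ref HEAD)

# The first <version> element in pom.xml identifies the branch contents.
pom_version=$(grep -m 1 '<version>' pom.xml | sed 's/.*<version>\(.*\)<\/version>.*/\1/')

echo "detached checkout reports: $detached_name"
echo "local branch reports:      $current_branch"
echo "pom.xml version:           $pom_version"
```

In a real clone only the last three commands matter: git rev-parse --abbrev-ref HEAD names the branch that is checked out, and the grep shows which version line the working directory carries.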

Thanks,
+Vinod

On Apr 29, 2013, at 3:55 AM, Samaneh Shokuhi wrote:

 Hi Vinod,
 I did what you said, but my working directory still contains the trunk
 branch! What I see for branch-1.2, for example, is different from
 branch-2. I switch to branch-1.2 with git checkout
 remotes/origin/branch-1.2 and it contains, for example, the build, bin,
 conf, lib, source and logs directories, but when I switch to branch-2 it
 still shows trunk. I tried what you suggested, git checkout -b branch-2
 origin/branch-2. A new branch called branch-2 is added, but it still
 contains trunk.
 
 I am wondering why git checkout remotes/origin/branch-1.2 switched to
 branch-1.2 but git checkout remotes/origin/branch-2 does not switch to
 branch-2.
 
 If you try git checkout -b branch-2 origin/branch-2, do you see the
 branch-2 contents in your working directory?
 
 Samaneh
 
 
 On Mon, Apr 29, 2013 at 2:10 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 
 
 Right, you have to do git checkout -b branch-2 origin/branch-2. You get
 the trunk branch by default; for all other remote branches, you have to
 create local branches like that.
 
 HTH
 
 Thanks,
 +Vinod Kumar Vavilapalli
 Hortonworks Inc.
 http://hortonworks.com/
 
 
 On Apr 28, 2013, at 12:42 PM, Samaneh Shokuhi wrote:
 
 Hi Vinod,
 I tried with the path you sent, but no success. I mean it has not really
 switched to branch-2, and calling git branch -a displays no branch. The
 working directory still contains trunk, not the branch-2 contents. What I
 get in the console is:
 
 You are in 'detached HEAD' state. You can look around, make experimental
 changes and commit them, and you can discard any commits you make in this
 state without impacting any branches by performing another checkout.
 
 If you want to create a new branch to retain commits you create, you may
 do so (now or later) by using -b with the checkout command again.
 Example:
 
 git checkout -b new_branch_name
 
 HEAD is now at 72713e0... HDFS-4748. Merge r1476587 from trunk.
 
 
 
 On Sun, Apr 28, 2013 at 9:20 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 
 
 I'm not sure how up to date the repo at GitHub is; I use
 git://git.apache.org/hadoop-common.git
 
 Also, what do you mean by the branch switch not working?
 
 Thanks,
 +Vinod
 
 On Apr 28, 2013, at 11:45 AM, Samaneh Shokuhi wrote:
 
 Hi All,
 I've got a clone of Hadoop and tried to switch to branch-2, but it is not
 working, while switching to other branches like branch-1.2 is possible. Any
 idea why I can't switch to branch-2?
 
 This is what I've done:
 
 $ git clone git://github.com/apache/hadoop-common.git hadoop-1
 $ cd hadoop-1
 $ git checkout remotes/origin/branch-2
 
 
 Samaneh
 
 
 
 



Re: Build failed for Hadoop-1.0.4

2013-04-29 Thread Harsh J
1. To build 1.0.4 specifically, you will also need Cygwin installed
and on your Windows PATH. I'd suggest instead using branch-1-win (no
Apache releases yet though) from the source repository.
2. You need autoconf, automake, cmake, etc. installed to build a
fully configured tarball that includes the native libs. I doubt that
would work with 1.0.4 even with Cygwin, so you could instead try running
a simple ant jar, or specifically disable native lib building.
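
A small sketch of how one might check for those tools before attempting a native build (for example from a Cygwin shell). The function name missing_tools is hypothetical, and the tool list is just the ones named in this thread; the real build may need more.

```shell
#!/bin/sh
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# autoreconf is the program the failed build could not run.
missing=$(missing_tools autoreconf automake cmake)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing native build tools:" $missing
  echo "A plain 'ant jar' may be the simpler route."
else
  echo "Native build tools found on PATH."
fi
```

If anything is reported missing, the native part of the build will most likely fail the same way as in the error quoted below.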

On Sat, Apr 27, 2013 at 12:53 AM, Thoihen Maibam thoihen...@gmail.com wrote:
 Hi All,

 Can anybody help me resolve this build error? Below is the error I got:

 BUILD FAILED
 F:\HADOOP COMMIT 1.1.4\Hadoop-1.0.4\build.xml:618: Execute failed:
 java.io.IOException: Cannot run program autoreconf (in directory
 F:\HADOOP COMMIT 1.1.4\Hadoop-1.0.4\src\native): CreateProcess error=2,
 The system cannot find the file specified.

 1. Downloaded Hadoop-1.0.4 into Eclipse on Windows.
 2. Configured ant in Eclipse and copied ivy.jar into ant/lib.

 Goal: I just wanted to build Hadoop in a Windows environment to go
 through the code and familiarize myself with the Hadoop code base; I would
 install Cygwin later on and run it. Earlier I built one of the Hadoop
 versions (I don't remember which one) successfully with ant,
 but now I get this error about a missing file.

 Regards
 thoihen





-- 
Harsh J


Re: problem in switching to branch-2

2013-04-29 Thread Samaneh Shokuhi
Hi Vinod,
I want to run the WordCount example with branch-2, and what I need for that
is hadoop-core-x.x.x-SNAPSHOT.
In branch-1 we have build.xml, from which I can generate the hadoop-core jar
file by executing ant jar, but in branch-2 I don't know how to generate the
hadoop-core jar file. I may also need to modify the Hadoop source code and
test it with the WordCount example.
Could you please tell me how to generate the hadoop-core jar file in branch-2?
It was straightforward in branch-1 by executing ant on build.xml.

Samaneh






[jira] [Created] (MAPREDUCE-5192) Separate TCE resolution from fetch

2013-04-29 Thread Chris Douglas (JIRA)
Chris Douglas created MAPREDUCE-5192:


         Summary: Separate TCE resolution from fetch
             Key: MAPREDUCE-5192
             URL: https://issues.apache.org/jira/browse/MAPREDUCE-5192
         Project: Hadoop Map/Reduce
      Issue Type: Task
      Components: task
        Reporter: Chris Douglas
        Priority: Minor


The {{EventFetcher}} thread grounds task completion events (TCEs) as URIs
before passing them to the {{ShuffleScheduler}}. If the former deferred this
to the scheduler, one could interpret the TCE metadata differently.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (MAPREDUCE-5134) Default settings cause LocalJobRunner to OOME

2013-04-29 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza resolved MAPREDUCE-5134.
---

Resolution: Not A Problem

 Default settings cause LocalJobRunner to OOME
 -

              Key: MAPREDUCE-5134
              URL: https://issues.apache.org/jira/browse/MAPREDUCE-5134
          Project: Hadoop Map/Reduce
       Issue Type: Bug
 Affects Versions: 2.0.3-alpha
         Reporter: Sandy Ryza
         Assignee: Sandy Ryza

 If I run a job using the local job runner with vanilla settings, I get an out 
 of memory error.  This seems to be because the default client memory maximum 
 is 128 MB, and the default io.sort.mb is 100 MB.
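
The arithmetic behind that observation can be sketched as follows. The two figures are the ones quoted in the description; the variable names are made up for the illustration, and the snippet only computes the headroom without changing any Hadoop setting.

```shell
#!/bin/sh
heap_mb=128   # default client memory maximum quoted in the report
sort_mb=100   # default io.sort.mb quoted in the report

# Heap left for everything else once the sort buffer is allocated.
headroom_mb=$((heap_mb - sort_mb))
# Share of the heap taken by the sort buffer (integer percent).
sort_pct=$((sort_mb * 100 / heap_mb))

echo "sort buffer: ${sort_pct}% of the heap, headroom: ${headroom_mb} MB"
```

With these defaults the sort buffer alone takes roughly three quarters of the heap, so either a larger client heap or a smaller sort buffer would widen the margin.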



Re: Heads up - 2.0.5-beta

2013-04-29 Thread Roman Shaposhnik
On Fri, Apr 26, 2013 at 11:15 AM, Arun C Murthy a...@hortonworks.com wrote:

 On Apr 25, 2013, at 7:31 PM, Roman Shaposhnik wrote:

 On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com wrote:

 With that in mind, I really want to make a serious push to lock down APIs 
 and wire-protocols for hadoop-2.0.5-beta.
 Thus, we can confidently support hadoop-2.x in a compatible manner in the 
 future. So, it's fine to add new features,
 but please ensure that all APIs are frozen for hadoop-2.0.5-beta

 Arun, since it sounds like you have a pretty definite idea
 in mind for what you want 'beta' label to actually mean,
 could you, please, share the exact criteria?

 Sorry, I'm not sure if this is exactly what you are looking for but, as I
 mentioned above, the primary aim would be to make the final set of required
 API/wire-protocol changes so that we can call it a 'beta', i.e. once
 2.0.5-beta ships, users and downstream projects can be confident about forward
 compatibility in the hadoop-2.x line. Obviously, we might discover a blocker bug
 post 2.0.5 which *might* necessitate an unfortunate change - but that should
 be an outstanding exception.

 Hope that helps.

It does make things a bit easier, but here's what I'd like to find
out: what *level* of feedback from downstream components
and the DevOps community would be considered adequate for a
release to be called beta?

IOW, would it make sense for us, as a community, to make
the following things part of the release criteria as far
as downstream components are concerned:
   * producing Maven artifacts of downstream components
     against branch-2 artifacts.
   * having unit test jobs for all the downstream components
     against branch-2 artifacts
   * having all the failures in those unit tests triaged and filed
     either against a component itself or hadoop
   * running Bigtop integration tests on branch-2 nightly
   * having all the failures of unit tests triaged and filed
     either against components or hadoop

Obviously, quantifying DevOps feedback and involvement
is more difficult, but would it be completely out of the question
to, essentially, predicate beta on some level of feedback
coming from Yahoo!/LI/FB/etc?

Thanks,
Roman.

P.S. Note that Bigtop can help with most of those things -- so let's
not get hung up on resources too much for now -- but rather focus on
whether we'd want those to be part of the release criteria
IF we had all the resources.


[jira] [Created] (MAPREDUCE-5193) A few MR tests use block sizes which are smaller than the default minimum block size

2013-04-29 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created MAPREDUCE-5193:
-

         Summary: A few MR tests use block sizes which are smaller than the default minimum block size
             Key: MAPREDUCE-5193
             URL: https://issues.apache.org/jira/browse/MAPREDUCE-5193
         Project: Hadoop Map/Reduce
      Issue Type: Bug
      Components: test
Affects Versions: 2.0.5-beta
        Reporter: Aaron T. Myers
        Assignee: Aaron T. Myers


HDFS-4305 introduced a new configurable minimum block size of 1MB. A few MR
tests deliberately set much smaller block sizes. This JIRA is to update those
tests so that they no longer fail.
