Re: Jiras need review

2011-07-12 Thread Josh Wills
Please add me (jwills) to that query, which should grab MR-2630 and
MR-2641-44.
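
For reference, here is a minimal Python sketch of what the amended filter might look like. The helper name build_review_query is made up for illustration; the project and assignee lists are copied from the query quoted in this thread, with jwills appended, and the quoting of the status value is an assumption about what JQL expects.

```python
# Assignee and project lists copied from the query quoted in this thread.
ASSIGNEES = [
    "tgraves", "revans2", "jeagles", "sherri_chen", "naisbitt", "daryn",
    "davet", "sseth", "eepayne", "johnvijoe", "kihwal", "nroberts",
    "anupamseth", "raviprak",
]
PROJECTS = ["MAPREDUCE", "HDFS", "HADOOP"]


def build_review_query(projects, assignees):
    """Hypothetical helper: format the JQL filter for patches awaiting review."""
    return 'project in ({}) AND assignee in ({}) AND status = "Patch Available"'.format(
        ",".join(projects), ",".join(assignees)
    )


# Append the extra assignee requested in this message.
query = build_review_query(PROJECTS, ASSIGNEES + ["jwills"])
print(query)
```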

On Tue, Jul 12, 2011 at 8:13 AM, Nathan Roberts nrobe...@yahoo-inc.com wrote:

 I sent this query to Suresh yesterday. He is supposed to be driving this
 from the Hortonworks side. If this list doesn't look correct, let me know.

 project in (MAPREDUCE,HDFS,HADOOP) AND assignee in
 (tgraves,revans2,jeagles,sherri_chen,naisbitt,daryn,davet,sseth,eepayne,johnvijoe,kihwal,nroberts,anupamseth,raviprak)
 AND status = "Patch Available"


 On 7/12/11 10:09 AM, Thomas Graves tgra...@yahoo-inc.com wrote:

 I just talked to Mahadev on IM and he said we should send the list of JIRAs
 we need reviewed to him and Arun.

 Let's make a list. Please respond to this email with the JIRAs you need
 reviewed and I'll forward them the list.

 Tom




Re: Jiras need review

2011-07-12 Thread Josh Wills
Thanks Mahadev!

On Tue, Jul 12, 2011 at 10:41 AM, Mahadev Konar maha...@hortonworks.com wrote:

 Hi Josh,
 I'll review the JIRAs by EOD today. We should clean up our PAs
 (currently we have 102 of them) and get them down to a
 reasonable/maintainable/searchable number.

 thanks
 mahadev




Re: Jiras need review

2011-07-12 Thread Josh Wills
FWIW, I'd still like my JIRAs reviewed. :)

On Tue, Jul 12, 2011 at 11:30 AM, Thomas Graves tgra...@yahoo-inc.com wrote:

 Apologies to all - this was my screw-up; this went to the wrong mailing
 list.

 Please disregard this request.

 Tom


 On 7/12/11 1:22 PM, Allen Wittenauer a...@apache.org wrote:

 
  On Jul 12, 2011, at 8:09 AM, Thomas Graves wrote:
 
  I just talked to Mahadev on IM and he said we should send the list of JIRAs
  we need reviewed to him and Arun.
 
  Let's make a list. Please respond to this email with the JIRAs you need
  reviewed and I'll forward them the list.
 
 
  I'm going to query JIRA and send you guys every single patch that is in
  Patch Available then, since this wasn't qualified as to what exactly you
  were looking for...




Re: Problem while running eclipse-files for Next Gen Mapreduce branch

2011-07-08 Thread Josh Wills
You want to generate them using mvn instead.  See the mapreduce/yarn/README
file for how to do it.

On Fri, Jul 8, 2011 at 7:00 AM, Devaraj K devara...@huawei.com wrote:

 Hi,

 I am getting the errors below when I try to generate Eclipse files using the
 eclipse-files target. Can anybody help me?

 Buildfile: D:\svn\nextgenmapreduce\mapreduce\build.xml

 ivy-download:
  [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
  [get] To: D:\svn\nextgenmapreduce\mapreduce\ivy\ivy-2.2.0.jar
  [get] Not modified - so not downloaded

 ivy-init-dirs:

 ivy-probe-antlib:

 ivy-init-antlib:

 ivy-init:
 [ivy:configure] :: Ivy non official version -  :: http://ant.apache.org/ivy/ ::
 [ivy:configure] :: loading settings :: file = D:\svn\nextgenmapreduce\mapreduce\ivy\ivysettings.xml

 ivy-resolve-common:
 [ivy:resolve]
 [ivy:resolve] :: problems summary ::
 [ivy:resolve]  WARNINGS
 [ivy:resolve] module not found: org.apache.hadoop#yarn-server-common;1.0-SNAPSHOT
 [ivy:resolve]  apache-snapshot: tried
 [ivy:resolve]   https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/yarn-server-common/1.0-SNAPSHOT/yarn-server-common-1.0-SNAPSHOT.pom
 [ivy:resolve]   -- artifact org.apache.hadoop#yarn-server-common;1.0-SNAPSHOT!yarn-server-common.jar:
 [ivy:resolve]   https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/yarn-server-common/1.0-SNAPSHOT/yarn-server-common-1.0-SNAPSHOT.jar
 [ivy:resolve]  maven2: tried
 [ivy:resolve]   http://repo1.maven.org/maven2/org/apache/hadoop/yarn-server-common/1.0-SNAPSHOT/yarn-server-common-1.0-SNAPSHOT.pom
 [ivy:resolve]   -- artifact org.apache.hadoop#yarn-server-common;1.0-SNAPSHOT!yarn-server-common.jar:
 [ivy:resolve]   http://repo1.maven.org/maven2/org/apache/hadoop/yarn-server-common/1.0-SNAPSHOT/yarn-server-common-1.0-SNAPSHOT.jar
 [ivy:resolve] module not found: org.apache.hadoop#hadoop-mapreduce-client-core;1.0-SNAPSHOT
 [ivy:resolve]  apache-snapshot: tried
 [ivy:resolve]   https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-mapreduce-client-core/1.0-SNAPSHOT/hadoop-mapreduce-client-core-1.0-SNAPSHOT.pom
 [ivy:resolve]   -- artifact org.apache.hadoop#hadoop-mapreduce-client-core;1.0-SNAPSHOT!hadoop-mapreduce-client-core.jar:
 [ivy:resolve]   https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-mapreduce-client-core/1.0-SNAPSHOT/hadoop-mapreduce-client-core-1.0-SNAPSHOT.jar
 [ivy:resolve]  maven2: tried
 [ivy:resolve]   http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/1.0-SNAPSHOT/hadoop-mapreduce-client-core-1.0-SNAPSHOT.pom
 [ivy:resolve]   -- artifact org.apache.hadoop#hadoop-mapreduce-client-core;1.0-SNAPSHOT!hadoop-mapreduce-client-core.jar:
 [ivy:resolve]   http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/1.0-SNAPSHOT/hadoop-mapreduce-client-core-1.0-SNAPSHOT.jar
 [ivy:resolve] module not found: org.apache.hadoop#yarn-common;1.0-SNAPSHOT
 [ivy:resolve]  apache-snapshot: tried
 [ivy:resolve]   https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/yarn-common/1.0-SNAPSHOT/yarn-common-1.0-SNAPSHOT.pom
 [ivy:resolve]   -- artifact org.apache.hadoop#yarn-common;1.0-SNAPSHOT!yarn-common.jar:
 [ivy:resolve]
 
 

Re: Problem while running eclipse-files for Next Gen Mapreduce branch

2011-07-08 Thread Josh Wills
Hey Bobby-

Vinod and I did a cleanup pass over INSTALL and yarn/README in
https://issues.apache.org/jira/browse/MAPREDUCE-2645. Please take a look and
check if there's anything else we need to add/update.

Josh

On Fri, Jul 8, 2011 at 7:25 AM, Robert Evans ev...@yahoo-inc.com wrote:

 mapreduce/INSTALL also has some important information in it, and be aware
 that you do not have to install the avro plugin any more.  Maven can
 download it and install it automatically now, but the README was never
 updated.  Also be sure to install protocol buffers.  The build will fail
 without it.

 --Bobby


Re: Problem while running eclipse-files for Next Gen Mapreduce branch

2011-07-08 Thread Josh Wills
Haoyuan,

Did you specify HADOOP_CONF_DIR, YARN_CONF_DIR, etc. as per Step 8 of the
INSTALL doc? I've been running my mrv2 cluster in pseudo-distributed mode on
my Linux box, and I'm happy to send you my configs if you'd like to play.
There are also a couple of changes you need to make to the yarn config files
to get real MR running, as Bobby alluded to in his email.

To run the example in the INSTALL file, you'll need to patch in this change:
https://issues.apache.org/jira/browse/MAPREDUCE-2644

Josh

On Fri, Jul 8, 2011 at 8:07 AM, Haoyuan Li haoyuan...@gmail.com wrote:

 Hi Josh,

 Which configuration file could tell Yarn the location of HDFS? This could be
 found on
 http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/INSTALL

 Thank you.

 Best,

 Haoyuan


[jira] [Created] (MAPREDUCE-2641) Fix the ExponentiallySmoothedTaskRuntimeEstimator and its unit test

2011-07-05 Thread Josh Wills (JIRA)
Fix the ExponentiallySmoothedTaskRuntimeEstimator and its unit test
---

 Key: MAPREDUCE-2641
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2641
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Reporter: Josh Wills
Assignee: Josh Wills
Priority: Minor
 Attachments: MAPREDUCE-2641.patch

Fixed the ExponentiallySmoothedTaskRuntimeEstimator so that it can run and pass 
the test defined for it in TestRuntimeEstimators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2642) Fix two bugs in v2.app.speculate.DataStatistics

2011-07-05 Thread Josh Wills (JIRA)
Fix two bugs in v2.app.speculate.DataStatistics
---

 Key: MAPREDUCE-2642
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2642
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Reporter: Josh Wills
Assignee: Josh Wills
Priority: Minor


Fixes two bugs in DataStatistics: a divide by zero in the variance calculation 
when count == 0, and a synchronization issue in how the updateStatistics method 
was implemented.
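
To illustrate the two kinds of fix described above, here is a minimal Python sketch. It is not the Hadoop class and not the attached patch: the field names, the choice to return 0.0 for an empty sample, and the use of a single lock are all assumptions made for the example.

```python
import threading


class DataStatistics:
    """Running count / sum / sum of squares (illustrative sketch only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        self._sum = 0.0
        self._sum_squares = 0.0

    def add(self, value):
        with self._lock:
            self._count += 1
            self._sum += value
            self._sum_squares += value * value

    def update_statistics(self, old, new):
        # Replace a previously added sample. Both totals change under the
        # same lock, so readers never see one updated without the other.
        with self._lock:
            self._sum += new - old
            self._sum_squares += new * new - old * old

    def var(self):
        with self._lock:
            if self._count == 0:
                # Guard against dividing by zero (assumed convention: 0.0).
                return 0.0
            mean = self._sum / self._count
            # Clamp tiny negative results caused by floating-point rounding.
            return max(self._sum_squares / self._count - mean * mean, 0.0)


stats = DataStatistics()
empty_variance = stats.var()  # no samples yet, so the guard applies
stats.add(2.0)
stats.add(4.0)
```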





[jira] [Created] (MAPREDUCE-2643) Fixing typos/formatting/null checking in v2.app.speculate package

2011-07-05 Thread Josh Wills (JIRA)
Fixing typos/formatting/null checking in v2.app.speculate package
-

 Key: MAPREDUCE-2643
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2643
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Reporter: Josh Wills
Assignee: Josh Wills
Priority: Minor


No functional changes in this patch: just fixing some typos in the comments, 
fixing some formatting issues, renaming classes for consistency, and making the 
null checking around Jobs/Tasks more consistent.





Introduction

2011-07-05 Thread Josh Wills
Hello MR Developers,

My name is Josh, and I started working at Cloudera last week as a data
scientist. I spent my last three years at Google, where I worked on the ads
system for a while and then built a lot of the data analysis and experiment
infrastructure behind Google+. I am really excited about the NextGen MR
project, especially because of the potential to create new classes of
data-intensive applications to run on Hadoop. I'm looking forward to helping
out wherever I can, and while I apologize for all of the JIRA spam you've
been receiving from me lately, I have no intention of letting up. :)

Best,
Josh


Re: Introduction

2011-07-05 Thread Josh Wills
Thanks Arun!

On Tue, Jul 5, 2011 at 9:51 AM, Arun C Murthy a...@hortonworks.com wrote:

 Josh,

  Welcome on board. Apache Hadoop is a volunteer-driven community project
 and we are glad to have you help us. I, personally, look forward to working
 with you on NG MR and am excited to see you report issues and fix them!

 Arun





[jira] [Created] (MAPREDUCE-2639) MR-279: Fixup the exponentially smoothed runtime estimator, fix a couple of bugs in DataStatistics, and do a little bit of cleanup.

2011-07-03 Thread Josh Wills (JIRA)
MR-279: Fixup the exponentially smoothed runtime estimator, fix a couple of 
bugs in DataStatistics, and do a little bit of cleanup.
---

 Key: MAPREDUCE-2639
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2639
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
 Environment: All
Reporter: Josh Wills
Assignee: Josh Wills
Priority: Minor
 Attachments: MAPREDUCE-2639.patch

A catch-all JIRA for a pass I took through the v2.app.speculate package.

1) Fixed the ExponentiallySmoothedTaskRuntimeEstimator so that it can run and 
pass the test defined in TestRuntimeEstimators.
2) Fixed two bugs in DataStatistics: (a) a divide by zero in the variance 
calculation in the case that count == 0, and (b) a synchronization issue in 
how the updateStatistics method was implemented.
3) Made a bunch of typo corrections and formatting fixes, and added some 
consistency around the null value checking.

I probably need to do a couple more passes through this code to get it into 
better shape, but this seemed like a good start. Will attach my patch 
momentarily.





[jira] [Created] (MAPREDUCE-2633) MR-279: Add a getCounter(Enum) method to the Counters interface

2011-06-30 Thread Josh Wills (JIRA)
MR-279: Add a getCounter(Enum) method to the Counters interface
---

 Key: MAPREDUCE-2633
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2633
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
 Environment: All
Reporter: Josh Wills
Priority: Minor


I'm fixing a few TODOs I came across in TaskAttemptImpl.java related to the 
fact that the MRv2 Counters interface doesn't expose a getCounter(Enum) method 
for accessing a Counter using the enum's class as the group name and the enum's 
value as the name of the counter.

Will add the patch momentarily.
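
The lookup described above can be sketched in Python as follows. This is only an illustration of the idea, not the MRv2 interface or the patch: the class and method names, and the TaskCounter enum used in the example, are all hypothetical.

```python
from enum import Enum


class Counter:
    def __init__(self, name):
        self.name = name
        self.value = 0

    def increment(self, amount=1):
        self.value += amount


class Counters:
    """Counters addressed by (group name, counter name)."""

    def __init__(self):
        self._groups = {}

    def get_counter_by_name(self, group, name):
        counters = self._groups.setdefault(group, {})
        if name not in counters:
            counters[name] = Counter(name)
        return counters[name]

    def get_counter(self, key):
        # Convenience lookup: the enum's class supplies the group name and
        # the enum member supplies the counter name.
        return self.get_counter_by_name(type(key).__name__, key.name)


class TaskCounter(Enum):  # hypothetical enum, for illustration only
    MAP_INPUT_RECORDS = 1


counters = Counters()
counters.get_counter(TaskCounter.MAP_INPUT_RECORDS).increment(5)
```

With this shape, callers that already hold an enum constant do not have to spell out the group and counter names themselves.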





[jira] [Created] (MAPREDUCE-2630) MR-279: refreshQueues leads to NPEs when used w/FifoScheduler

2011-06-29 Thread Josh Wills (JIRA)
MR-279: refreshQueues leads to NPEs when used w/FifoScheduler
-

 Key: MAPREDUCE-2630
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2630
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
 Environment: All
Reporter: Josh Wills
Priority: Minor


The RM's admin service exposes a method refreshQueues that is used to update 
the queue configuration when used with the CapacityScheduler, but if it is used 
with the FifoScheduler, it will set the 
containerTokenSecretManager/clusterTracker fields on the FifoScheduler to null, 
which eventually leads to an NPE. Since the FifoScheduler only has one queue 
that cannot be refreshed, the correct behavior is for the refreshQueues call to 
be a no-op.

I will attach a patch that fixes this by splitting the ResourceScheduler's 
reinitialize method into separate initialize/updateQueues methods.
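
A rough Python sketch of the proposed split, for illustration only. The real classes are Java and the actual patch may differ; the method signatures, the refresh_queues function, and the dictionary used as a queue configuration are all assumptions.

```python
class FifoScheduler:
    def __init__(self):
        self.container_token_secret_manager = None
        self.cluster_tracker = None

    def initialize(self, secret_manager, cluster_tracker):
        # One-time setup: the only place these fields are assigned.
        self.container_token_secret_manager = secret_manager
        self.cluster_tracker = cluster_tracker

    def update_queues(self, conf):
        # A single fixed queue: there is nothing to refresh, so do nothing.
        pass


class CapacityScheduler(FifoScheduler):
    def __init__(self):
        super().__init__()
        self.queues = {}

    def update_queues(self, conf):
        # Only the queue configuration is replaced on refresh.
        self.queues = dict(conf)


def refresh_queues(scheduler, conf):
    """Stand-in for the admin call: it touches queues and nothing else."""
    scheduler.update_queues(conf)


fifo = FifoScheduler()
fifo.initialize("secret-manager", "cluster-tracker")
refresh_queues(fifo, {"default": 100})

capacity = CapacityScheduler()
capacity.initialize("secret-manager", "cluster-tracker")
refresh_queues(capacity, {"a": 60, "b": 40})
```

Because a refresh never goes through initialize, it cannot reset the secret manager or the cluster tracker to null.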





Re: mappers

2011-06-26 Thread Josh Wills
(redirected to mapreduce-user@, mapreduce-dev bcc'd)

The param you're referring to controls the maximum number of
simultaneously active mappers on a given task tracker, i.e., how many
map slots are available on that node. But a single task tracker can be
used for multiple MR jobs, so you can't look at the metrics for the
task tracker to see how many mappers ran on a job. For a single job,
the total number of mappers that are run == the number of input
splits.
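
As a rough illustration of the last point, here is a simplified Python model in which every input file is cut into fixed-size splits and each split gets one mapper. The function name and the fixed split size are assumptions; real input formats apply additional rules that this sketch ignores.

```python
def count_input_splits(file_sizes, split_size):
    """Simplified model: ceil(size / split_size) splits per non-empty file."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    # -(-a // b) is integer ceiling division, which avoids float rounding.
    return sum(-(-size // split_size) for size in file_sizes if size > 0)


MB = 1024 * 1024
# Three input files of 200 MB, 64 MB and 1 byte with 64 MB splits.
num_mappers = count_input_splits([200 * MB, 64 * MB, 1], 64 * MB)
print(num_mappers)
```

In this model the per-node slot limit does not appear at all: it only bounds how many of these mappers run at the same time on one task tracker.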

Hoping that anyone who knows this stuff better than I do will reply to
correct any mistakes in my answer,
Josh

On Sun, Jun 26, 2011 at 5:16 AM, Keren Ouaknine ker...@gmail.com wrote:
 Hello,

 I am looking for the actual number of mappers on each machine for the job. I
 know how to configure the max number (mapred.tasktracker.map.tasks.maximum
 in mapred-site.xml file), but not the actual number of mappers that were
 running for a completed job.

 Any idea where I can find this data?
 Thanks,
 Keren

 --
 Keren Ouaknine
 Cell: +972 54 2565404
 Web: www.kereno.com



Re: Queries on MRv2

2011-06-15 Thread Josh Wills
Hey Praveen,

I'm in the same boat as you re: getting started with the MR2 code. I
have a couple of answers and a couple of followup questions for Arun
et al. to keep in mind as they're writing a design doc.

On Wed, Jun 15, 2011 at 5:27 AM, Praveen Sripati
praveensrip...@gmail.com wrote:
 Hi,

 - How to specify that an ApplicationMaster use a particular version of the
 MapReduce library dynamically?

I don't totally grok the question-- doesn't the client-side code that
configures the ApplicationMaster decide this?


 - How does the ApplicationManager pick a node to run the ApplicationMaster?
 What resource considerations are taken if any while picking a particular
 node to run the ApplicationMaster?

There is an ApplicationsManager (note the extra 's') that is part of
the functionality of the RM. See reference:
http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/


 - Who observes the ResourceManager/ApplicationMaster/NodeManager for
 failures to be restarted later? From the blog entry it seems that the state
 of the ResourceManager is stored in the ZooKeeper and the state of the
 ApplicationManager is stored in the HDFS.

So this is the classic problem of any such system-- who watches the
watchmen?  It seems like the client would be notified when an
ApplicationManager failed by the ApplicationSManager (see above blog
post again, it's actually a good blog post, it would be great to have
a few more of them), the ResourceManager would know when a NodeManager
failed, and it falls to an admin and/or an external monitoring system
to detect ResourceManager failure and handle the restart.


 - Looks like the containers are based on Linux cgroups. So, is the MRv2
 limited only to the Linux boxes?

Yeah, I bumped into this when I was doing a naive build + install on
my Mac. Not that I see folks running a lot of Hadoop clusters on Macs,
but it would be cool if the basic build/install just worked on every
platform, even if it's just as simple as detecting the platform and
skipping the build of the native container-executor stuff. (Note: I
actually got the container-executor stuff to build by using the
standard Mac tricks, but I'm not sure if it's worth checking in.)


 Hope the design document from Arun will make me ask less queries in this
 forum :)

 Thanks,
 Praveen

 On Wed, Jun 15, 2011 at 9:17 AM, Mahadev Konar maha...@apache.org wrote:

 Praveen,
  In that case, if a just-launched container is released, the NM will
 be notified via the RM that the container is no longer valid and the
 NM will go ahead and kill the container.


 On Tue, Jun 14, 2011 at 8:38 PM, Praveen Sripati
 praveensrip...@gmail.com wrote:
  Mahadev,
 
  MapReduce ApplicationMaster might behave well, but what about custom
  ApplicationMasters for other models.
 
  Q) What happens if an ApplicationMaster asks a NM to launch a container
  and then releases the container in the allocate call later?
 
  A) The Application Master only releases the container once the container
  is done.
 
  Thanks,
  Praveen
 
  On Wed, Jun 15, 2011 at 8:59 AM, Mahadev Konar maha...@apache.org
 wrote:
 
  Praveen,
   Answers in line:
 
  
   Q) What happens if an ApplicationMaster asks a NM to launch a container
   and then releases the container in the allocate call later?
 
  The Application Master only releases the container once the container is
  done.
 
  
   Q) So, the NM watches the UNIX Process/Containers and sends the status to
   the ApplicationManager. Later the ApplicationManager sends the status of
   the containers in response to the allocate call to the ApplicationMaster.
   Why should the ApplicationMaster be aware of the container status, since
   it's already tracking the map/reduce tasks in the containers?
 
  It's just a way to notify the application master as soon as possible
  when the containers fail.
  This helps in speeding up the notification of failed containers; otherwise
  the AM has to wait to discover failures via timeouts.
 
  
   Q) Does the ApplicationMaster notify the NodeManager to exit the UNIX
   Process when the map/reduce tasks in that particular container are
   completed? Are the containers re-used?
 
  Yes, it notifies the NM.
 
  Containers are not reused as of now. In future we do see the
  containers being reused, but we'll need leases to do that.
 
  
   Q) The ApplicationManager asks the NodeManager to create a container and
   also launch the map/reduce task in it. From then on the ApplicationManager
   and Map/Reduce tasks interact directly without the NodeManager. Am I
   correct?
  
  I think you mean ApplicationMaster. Yes, the ApplicationMaster and
  map/reduce tasks talk directly without the NM being involved.
 
   Praveen
  
   On Wed, Jun 15, 2011 at 12:59 AM, Arun C Murthy a...@yahoo-inc.com
  wrote:
  
  
   On Jun 14, 2011, at 6:31 PM, Praveen Sripati wrote:
  
    Hi,
  
   I have gone through MapReduce NextGen Blog entries and JIRA and have
  the
   following