Re: Heads up - 2.0.5-beta

2013-04-26 Thread Arun C Murthy

On Apr 25, 2013, at 6:36 PM, Suresh Srinivas wrote:

 On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com wrote:
 
 Similarly on HDFS side, can someone please help out by tagging features,
 bug-fixes, protocol/API changes etc.? This way we can ensure HDFS APIs &
 protocols are locked down too - I'd really appreciate it!
 
 To ensure a timely release of 2.0.5-beta, we should not hold back for
 individual features. However, I would like to make necessary API and/or
 protocol changes right-away. This will allow us to add features in
 subsequent releases e.g. hadoop-2.2 or hadoop-2.3 etc without breaking
 compatibility. 

+1, sounds like a good plan. Thanks!

Arun

Cannot find JobTracker and TaskTracker classes in Hadoop 2.0.2-alpha

2013-04-26 Thread Thoihen Maibam
Hi,

Can anyone help me find the JobTracker and TaskTracker classes
for the above release? They are not present in hadoop-mapreduce-project. I was
tracing through the source code from job submission but lost the flow, as I
could not find the JobTracker and TaskTracker.

Have these classes been replaced with some other classes?

Regards
thoihen


Re: Heads up - 2.0.5-beta

2013-04-26 Thread Arun C Murthy

On Apr 25, 2013, at 7:31 PM, Roman Shaposhnik wrote:

 On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com wrote:
 
 With that in mind, I really want to make a serious push to lock down APIs 
 and wire-protocols for hadoop-2.0.5-beta.
 Thus, we can confidently support hadoop-2.x in a compatible manner in the 
 future. So, it's fine to add new features,
 but please ensure that all APIs are frozen for hadoop-2.0.5-beta
 
 Arun, since it sounds like you have a pretty definite idea
 in mind for what you want 'beta' label to actually mean,
 could you, please, share the exact criteria? 

Sorry, I'm not sure if this is exactly what you are looking for but, as I 
mentioned above, the primary aim would be to make the final set of required 
API/wire-protocol changes so that we can call it a 'beta' i.e. once 2.0.5-beta 
ships, users & downstream projects can be confident about forward compatibility 
in hadoop-2.x line. Obviously, we might discover a blocker bug post 2.0.5 which 
*might* necessitate an unfortunate change - but that should be an outstanding 
exception.

Hope that helps.

thanks,
Arun



[jira] [Resolved] (MAPREDUCE-5167) Update MR App after YARN-562

2013-04-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-5167.


   Resolution: Fixed
Fix Version/s: 2.0.5-beta
 Hadoop Flags: Reviewed

I committed this to trunk and branch-2 together with YARN-562. Thanks Jian!

 Update MR App after YARN-562
 

 Key: MAPREDUCE-5167
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5167
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He
 Fix For: 2.0.5-beta

 Attachments: MAPREDUCE-5167.txt


 Tracking JIRA for MR changes at YARN-562.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Heads up - 2.0.5-beta

2013-04-26 Thread Eli Collins
On Fri, Apr 26, 2013 at 11:15 AM, Arun C Murthy a...@hortonworks.com wrote:

 On Apr 25, 2013, at 7:31 PM, Roman Shaposhnik wrote:

 On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com wrote:

 With that in mind, I really want to make a serious push to lock down APIs 
 and wire-protocols for hadoop-2.0.5-beta.
 Thus, we can confidently support hadoop-2.x in a compatible manner in the 
 future. So, it's fine to add new features,
 but please ensure that all APIs are frozen for hadoop-2.0.5-beta

 Arun, since it sounds like you have a pretty definite idea
 in mind for what you want 'beta' label to actually mean,
 could you, please, share the exact criteria?

 Sorry, I'm not sure if this is exactly what you are looking for but, as I 
 mentioned above, the primary aim would be to make the final set of required 
 API/wire-protocol changes so that we can call it a 'beta' i.e. once 
 2.0.5-beta ships, users & downstream projects can be confident about forward 
 compatibility in hadoop-2.x line. Obviously, we might discover a blocker bug 
 post 2.0.5 which *might* necessitate an unfortunate change - but that should 
 be an outstanding exception.

Arun, Suresh,

Mind reviewing the following page Karthik put together on
compatibility?   http://wiki.apache.org/hadoop/Compatibility

I think we should do something similar to what Sanjay proposed in
HADOOP-5071 for Hadoop v2.   If we get on the same page on
compatibility terms/APIs then we can quickly draft the policy, at
least for the things we've already got consensus on.  I think our new
developers, users, downstream projects, and partners would really
appreciate us making this clear.  If people like the content we can
move it to the Hadoop website and maintain it in svn like the bylaws.

The reason I think we need to do so is because there's been confusion
about what types of compatibility we promise and some open questions
which I'm not sure everyone is clear on. Examples:
- Are we going to preserve Hadoop v3 clients against v2 servers now
that we have protobuf support?  (I think so..)
- Can we break rolling upgrade of daemons in updates post GA? (I don't
think so..)
- Do we disallow HDFS metadata changes that require an HDFS upgrade in
an update? (I think so..)
- Can we remove methods from v2 and v2 updates that were deprecated in
v0.20-22?  (Unclear)
- Will we preserve binary compatibility for MR2 going forward? (I think so..)
- Does the ability to support multiple versions of MR simultaneously
via MR2 change the MR API compatibility story? (I don't think so..)
- Are the RM protocols sufficiently stable to disallow incompatible
changes potentially required by non-MR projects? (Unclear, most large
Yarn deployments I'm aware of are running 0.23, not v2 alphas)

I'm also not sure there's currently consensus on what an incompatible
change is. For example, I think HADOOP-9151 is incompatible because it
broke client/server wire compatibility with previous releases and any
change that breaks wire compatibility is incompatible.  Suresh felt it
was not an incompatible change because it did not affect API
compatibility (ie PB is not considered part of the API) and the change
occurred while v2 is in alpha.  Not sure we need to go through the
whole exercise of what's allowed in an alpha and beta (water under the
bridge, hopefully), but I do think we should clearly define an
incompatible change.  It's fine that v2 has been a bit wild wild west
in the alpha development stage but I think we need to get a little
more rigorous.

Thanks,
Eli


[jira] [Resolved] (MAPREDUCE-5180) Running wordcount with -Ddfs.client.read.shortcircuit=true/false fails to get proper message on syslogs

2013-04-26 Thread yeshavora (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yeshavora resolved MAPREDUCE-5180.
--

Resolution: Invalid

 Running wordcount with -Ddfs.client.read.shortcircuit=true/false fails to 
 get proper message on syslogs
 -

 Key: MAPREDUCE-5180
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5180
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: yeshavora
 Fix For: 1.3.0

 Attachments: Screen Shot 2013-04-24 at 1.12.31 PM.png


 Running a wordcount job with -Ddfs.client.read.shortcircuit=true/false fails to 
 mention the "hdfs.DFSClient: Short circuit read is true" or "hdfs.DFSClient: 
 Short circuit read is false" messages in syslogs. 
 Attaching a screen shot of the syslog output for Hadoop 1.1.2. The above message 
 was present in the logs earlier.
 Syslog output of Hadoop 1.3:
 2013-04-18 13:07:08,265 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
 2013-04-18 13:07:10,002 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
 2013-04-18 13:07:10,577 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4edc41c5
 2013-04-18 13:07:10,682 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://node1:port1/input1.txt:0+215754
 2013-04-18 13:07:10,706 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 200
 2013-04-18 13:07:10,910 INFO org.apache.hadoop.mapred.MapTask: data buffer = 150994944/167772160
 2013-04-18 13:07:10,910 INFO org.apache.hadoop.mapred.MapTask: record buffer = 2359296/2621440
 2013-04-18 13:07:10,920 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
 2013-04-18 13:07:10,920 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
 2013-04-18 13:07:10,934 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
 2013-04-18 13:07:10,947 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
 2013-04-18 13:07:11,414 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
 2013-04-18 13:07:11,586 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
 2013-04-18 13:07:11,962 INFO org.apache.hadoop.mapred.MapTask: Finished spill  0
 2013-04-18 13:07:12,034 INFO org.apache.hadoop.mapred.Task: Task:attempt_201304181305_0001_m_00_0 is done. And is in the process of commiting
 2013-04-18 13:07:12,106 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201304181305_0001_m_00_0' done.
 2013-04-18 13:07:12,152 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2013-04-18 13:07:12,637 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
 2013-04-18 13:07:12,637 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName mapred for UID 2002 from the native implementation



Environment setup for testing a patch submitting jobs from eclipse.

2013-04-26 Thread Thoihen Maibam
Hi All,

Apologies to everyone in case my question has been dealt with before. If the
question has been answered before, please do provide me the link.

Basically, I want to start contributing to Hadoop by submitting patches,
mostly for MapReduce issues.

1. Assume I create a patch and now want to test it.
2. I want to work from Eclipse and make use of the breakpoints
supported by Eclipse.
3. Assume I run all the Hadoop daemons on a single node.
4. Will the Eclipse plugin included in Hadoop work for submitting jobs
and hitting the breakpoints? I ask because I want to trace the code.
5. What does the Hadoop committers' development setup normally look
like? Do they use Eclipse to set breakpoints in the IDE?
6. If I could use Eclipse, the learning curve of the Hadoop code base would
be much easier.

Can somebody guide me on how to submit my job from Eclipse and set
breakpoints in Hadoop core code such as JobTracker, TaskTracker, etc.?

Regards
thoihen


Re: Heads up - 2.0.5-beta

2013-04-26 Thread Luke Lu
If protocol compatibility of v2 and v3 is a goal, HADOOP-8990 should be a
blocker for v2.

__Luke

On Fri, Apr 26, 2013 at 12:07 PM, Eli Collins e...@cloudera.com wrote:

 On Fri, Apr 26, 2013 at 11:15 AM, Arun C Murthy a...@hortonworks.com
 wrote:
 
  On Apr 25, 2013, at 7:31 PM, Roman Shaposhnik wrote:
 
  On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com
 wrote:
 
  With that in mind, I really want to make a serious push to lock down
 APIs and wire-protocols for hadoop-2.0.5-beta.
  Thus, we can confidently support hadoop-2.x in a compatible manner in
 the future. So, it's fine to add new features,
  but please ensure that all APIs are frozen for hadoop-2.0.5-beta
 
  Arun, since it sounds like you have a pretty definite idea
  in mind for what you want 'beta' label to actually mean,
  could you, please, share the exact criteria?
 
  Sorry, I'm not sure if this is exactly what you are looking for but, as
 I mentioned above, the primary aim would be to make the final set of required
 API/wire-protocol changes so that we can call it a 'beta' i.e. once
 2.0.5-beta ships, users & downstream projects can be confident about forward
 compatibility in hadoop-2.x line. Obviously, we might discover a blocker
 bug post 2.0.5 which *might* necessitate an unfortunate change - but that
 should be an outstanding exception.

 Arun, Suresh,

 Mind reviewing the following page Karthik put together on
 compatibility?   http://wiki.apache.org/hadoop/Compatibility

 I think we should do something similar to what Sanjay proposed in
 HADOOP-5071 for Hadoop v2.   If we get on the same page on
 compatibility terms/APIs then we can quickly draft the policy, at
 least for the things we've already got consensus on.  I think our new
 developers, users, downstream projects, and partners would really
 appreciate us making this clear.  If people like the content we can
 move it to the Hadoop website and maintain it in svn like the bylaws.

 The reason I think we need to do so is because there's been confusion
 about what types of compatibility we promise and some open questions
 which I'm not sure everyone is clear on. Examples:
 - Are we going to preserve Hadoop v3 clients against v2 servers now
 that we have protobuf support?  (I think so..)
 - Can we break rolling upgrade of daemons in updates post GA? (I don't
 think so..)
 - Do we disallow HDFS metadata changes that require an HDFS upgrade in
 an update? (I think so..)
 - Can we remove methods from v2 and v2 updates that were deprecated in
 v0.20-22?  (Unclear)
 - Will we preserve binary compatibility for MR2 going forward? (I think
 so..)
 - Does the ability to support multiple versions of MR simultaneously
 via MR2 change the MR API compatibility story? (I don't think so..)
 - Are the RM protocols sufficiently stable to disallow incompatible
 changes potentially required by non-MR projects? (Unclear, most large
 Yarn deployments I'm aware of are running 0.23, not v2 alphas)

 I'm also not sure there's currently consensus on what an incompatible
 change is. For example, I think HADOOP-9151 is incompatible because it
 broke client/server wire compatibility with previous releases and any
 change that breaks wire compatibility is incompatible.  Suresh felt it
 was not an incompatible change because it did not affect API
 compatibility (ie PB is not considered part of the API) and the change
 occurred while v2 is in alpha.  Not sure we need to go through the
 whole exercise of what's allowed in an alpha and beta (water under the
 bridge, hopefully), but I do think we should clearly define an
 incompatible change.  It's fine that v2 has been a bit wild wild west
 in the alpha development stage but I think we need to get a little
 more rigorous.

 Thanks,
 Eli



Re: Versions - Confusion

2013-04-26 Thread Robert Evans
It is kind of complex.

Up until 0.20 everything was fairly regular, as you would expect.  In
0.20 there was a split: security was added in a branch whose releases
were numbered 0.20.20X, while the other releases went on without
security and became 0.21 and 0.22.  0.23 was created when YARN was
introduced, and it also had security merged in.  To be fair, 0.22 had
security in it, but was never officially supported in a release.  At about
this same time the community decided that we needed to do something better
with the numbering, renamed 0.20.20X to 1.0, and started releasing more
versions from this line.  This is the current stable line.  0.23 was
renamed 2.0, and there have been a few releases, but the code is still being
stabilized.  To make things even more confusing, some people kept 0.23
alive and stabilized it, so there have been some releases of 0.23 in
parallel with 2.0.  The difference between the two is that 2.0 has HDFS HA
in it whereas 0.23 does not.
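
The renames described above can be summarized as a small lookup table. This is only a sketch of what this message states (0.20.20X became 1.0, 0.23 became 2.0); the `RENAMED` table and the `current_name` helper are hypothetical names for illustration, not anything shipped with Hadoop.

```python
# Hypothetical summary of the renames described in this message.
# Keys are the old branch names, values are the names they were given later.
RENAMED = {
    "0.20.20X": "1.0",  # security branch, now the stable line
    "0.23": "2.0",      # YARN plus security; 2.0 additionally has HDFS HA
}


def current_name(branch):
    """Return the later name of a branch, or the branch itself if it was not renamed."""
    return RENAMED.get(branch, branch)


for old in ["0.20.20X", "0.21", "0.22", "0.23"]:
    print(f"{old} -> {current_name(old)}")
```

Note that 0.23 also lived on under its own name in parallel with 2.0, so the table records the rename, not a replacement.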

--Bobby Evans

On 4/26/13 12:39 AM, Suresh S suresh...@gmail.com wrote:

Hello,

I am confused by Hadoop versioning.
I found that some people are working on versions starting with 0,
while others are working on versions starting with 2.
I am also confused by the branches.

Which version is really the current version?

Regards
S.Suresh,
Research Scholar,
Department of Computer Applications,
National Institute of Technology,
Tiruchirappalli - 620015.
+91-9941506562



Re: Heads up - 2.0.5-beta

2013-04-26 Thread Suresh Srinivas
Eli, I will post a more detailed reply soon. But one small correction:


I'm also not sure there's currently consensus on what an incompatible
 change is. For example, I think HADOOP-9151 is incompatible because it
 broke client/server wire compatibility with previous releases and any
 change that breaks wire compatibility is incompatible.  Suresh felt it
 was not an incompatible change because it did not affect API
 compatibility (ie PB is not considered part of the API) and the change
 occurred while v2 is in alpha.


This is not correct. I did not say it was not an incompatible change.
It was indeed an incompatible wire protocol change. My argument was that,
in the phase of development we were in, we could not mark the wire protocol
as stable and avoid making any incompatible change. But once 2.0.5-beta
is out, as we had discussed earlier, we should not make further incompatible
changes to the wire protocol.

-- 
http://hortonworks.com/download/


Re: Heads up - 2.0.5-beta

2013-04-26 Thread Eli Collins
On Fri, Apr 26, 2013 at 2:42 PM, Suresh Srinivas sur...@hortonworks.com wrote:
 Eli, I will post a more detailed reply soon. But one small correction:


 I'm also not sure there's currently consensus on what an incompatible
 change is. For example, I think HADOOP-9151 is incompatible because it
 broke client/server wire compatibility with previous releases and any
 change that breaks wire compatibility is incompatible.  Suresh felt it
 was not an incompatible change because it did not affect API
 compatibility (ie PB is not considered part of the API) and the change
 occurred while v2 is in alpha.


 This is not correct. I did not say it was not an incompatible change.
 It was indeed an incompatible wire protocol change. My argument was that,
 in the phase of development we were in, we could not mark the wire protocol
 as stable and avoid making any incompatible change. But once 2.0.5-beta
 is out, as we had discussed earlier, we should not make further incompatible
 changes to the wire protocol.

Sorry for the confusion, I misinterpreted your comments on the jira
(specifically, "This is an incompatible change: I disagree." and "see
my argument that about why this is not incompatible.") to indicate
that you thought it was not incompatible.




 --
 http://hortonworks.com/download/


Re: Environment setup for testing a patch submitting jobs from eclipse.

2013-04-26 Thread maisnam ns
Hi thoihen,

It is extremely tough to debug the Hadoop core code, but it is not
impossible. In fact, debugging through Eclipse and setting breakpoints may
help considerably in understanding the flow of the Hadoop core logic.

The steps are given below:
1. Run all your Hadoop daemons on a single machine/laptop.
2. Let's say you want to debug the logic flow of bin/hadoop namenode -format.
3. In your Hadoop conf directory, open hadoop-env.sh and add this line:
export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5002"
4. In Eclipse, set up a debug configuration with localhost as the server
and 5002 as the port.
5. Copy the required Hadoop source files (core/mapred/hdfs etc., assuming
version 1.0.4) into your Eclipse project.
6. Open NameNode.java by pressing Ctrl+Shift+R and typing NameNode.java.
7. In the NameNode file, set a breakpoint inside the main method,
say at the createNameNode call.
8. Open a terminal, cd to the Hadoop folder, and run bin/hadoop namenode
-format. Because suspend=y is set, the JVM waits for the debugger to attach.
9. In Eclipse, right-click NameNode.java, choose Debug As, and launch the
debug configuration from step 4.
10. Your breakpoint will be hit at the location where you put it.
11. Press F6 to go line by line, or F5 to step into a method, in this
case createNameNode.
12. If you encounter errors, you may have to resolve those issues first.
13. Keep following the logic from the breakpoint, line by line or into
methods, depending on your interest.
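
The option string from step 3 can also be generated rather than typed by hand. A minimal sketch: `jdwp_agent_option` and `hadoop_env_line` are hypothetical helper names, and only the `-agentlib:jdwp` sub-options already used above (`transport`, `server`, `suspend`, `address`) are emitted.

```python
def jdwp_agent_option(port, suspend=True, server=True):
    """Build the -agentlib:jdwp=... JVM option used in step 3 (hypothetical helper)."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    flags = {
        "transport": "dt_socket",
        "server": "y" if server else "n",
        "suspend": "y" if suspend else "n",
        "address": str(port),
    }
    # dicts keep insertion order, so the option order matches step 3
    return "-agentlib:jdwp=" + ",".join(f"{k}={v}" for k, v in flags.items())


def hadoop_env_line(port, suspend=True):
    """Return the export line to paste into hadoop-env.sh."""
    return f'export HADOOP_OPTS="{jdwp_agent_option(port, suspend=suspend)}"'


print(hadoop_env_line(5002))
```

Passing `suspend=False` yields `suspend=n`, which lets the JVM start without waiting for a debugger to attach.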


On Sat, Apr 27, 2013 at 1:18 AM, Thoihen Maibam thoihen...@gmail.com wrote:

 Hi All,

 Apology to everyone in case my question has been dealt before.If the
 question has been answered before please do provide me the link .

 Basically, I want to start contributing to Hadoop by submitting patches
 mostly from Map Reduce issues.

 1. Assuming I create a patch, now I want to  test the patch.
 2. Basically, I want to work from eclipse and make use of the breakpoints
 supported by eclipse.
 3. Assuming I ran all the hadoop daemons in single node.
 4. Will the eclipse plugin included in hadoop work for submiting the jobs
 and hit the breakpoints , this is because I want to trace the code.
 5. Normally, how do the Hadoop committers setup their Hadoop development
 setup look like. Do they use eclipse to set breakpoints in the eclipse ide.
 6. If I use eclipse , learning curve of Hadoop code base would be very
 easy.

 Can somebody guide me how do I submit my job from eclipse and set
 breakpoints in Hadoop core code in JobTracker, TaskTracker etc.

 Regards
 thoihen



Re: Heads up - 2.0.5-beta

2013-04-26 Thread Konstantin Shvachko
Arun,

Could you please define the release plan and put it to a vote,
in accordance with the Bylaws? After this discussion, of course.

http://hadoop.apache.org/bylaws.html
Release Plan
Defines the timetable and actions for a release. The plan also nominates a
Release Manager.
Lazy majority of active committers

Do I understand correctly that you are volunteering to be RM? Just to clarify.
Suresh has already put together a list of features for HDFS and Common.
So you probably need to indicate features for MapReduce and YARN.

Thanks,
--Konstantin



On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com wrote:

 Gang,

  With hadoop-2.0.4-alpha released, I'd like 2.0.4 to be the final of our
 hadoop-2.x alphas. We have made lots of progress on hadoop-2.x and I
 believe we are nearly there, exciting times!

  As we have discussed previously, I hope to do a final push to stabilize
 hadoop-2.x, release a hadoop-2.0.5-beta in the next month or so; and then
 declare hadoop-2.1 as stable this summer after a short period of intensive
 testing.

  With that in mind, I really want to make a serious push to lock down APIs
 and wire-protocols for hadoop-2.0.5-beta. Thus, we can confidently support
 hadoop-2.x in a compatible manner in the future. So, it's fine to add new
 features, but please ensure that all APIs are frozen for hadoop-2.0.5-beta

  Vinod is helping out on the YARN/MR side and has tagged a number of final
 changes (including some of the final API incompatibilities) we'd like to push
 in before we call hadoop-2.x as ready to be supported (Target Version set
 to 2.0.5-beta):
  http://s.apache.org/target-hadoop-2.0.5-beta
  Thanks Vinod! (Note some of the sub-tasks of umbrella jiras may not be
 tagged, but their necessity is implied).

  Similarly on HDFS side, can someone please help out by tagging features,
 bug-fixes, protocol/API changes etc.? This way we can ensure HDFS APIs &
 protocols are locked down too - I'd really appreciate it!

 thanks,
 Arun


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/





Re: Heads up - 2.0.5-beta

2013-04-26 Thread Konstantin Shvachko
Arun, Suresh,

Very exciting to hear about this final push to stable Hadoop 2.
But I have a problem. Either with the plan or with the version number.
I'll be arguing for the number change below rather than the plan.

1. Based on the features listed by Suresh, it looks like you plan a heavy
feature-full release.
2. You are saying you want to complete this within a month (or so).
3. You would like to give it a beta quality mark.

Not saying it is impossible. But in line with the common saying
"You can have fast, good or big: pick two"
(a little rephrasing here)
I would like to propose leaving a gap between 2.0.4 and the next
version so that, just in case, there is a version available to put bug
fixes on top of the last release.
Do you think we can call the version you proposed to release
2.1.0 or 2.1.0-beta?

The proposed new features imho do not exactly conform with the idea
of a dot-dot release, but definitely qualify for a major number change.
I am just trying to avoid rather ugly 2.0.4.1 versions, which are of course
also possible.

Thanks,
--Konstantin


On Thu, Apr 25, 2013 at 6:36 PM, Suresh Srinivas sur...@hortonworks.com wrote:

 Thanks for starting this discussion. I volunteer to do a final review of
 protocol changes, so we can avoid incompatible changes to API and wire
 protocol post 2.0.5 in Common and HDFS.

 We have been working really hard on the following features. I would like to
 get them into 2.x and see them reach HDFS users:
 # Snapshots
 # NFS gateway for HDFS
 # HDFS-347 unix domain socket based short circuits
 # Windows support

 Other HDFS folks please let me know if I missed anything.

 To ensure a timely release of 2.0.5-beta, we should not hold back for
 individual features. However, I would like to make necessary API and/or
 protocol changes right-away. This will allow us to add features in
 subsequent releases e.g. hadoop-2.2 or hadoop-2.3 etc without breaking
 compatibility. For example, even if we don't complete NFS support, making
 FileID related changes in 2.0.5-beta will ensure future compatibility.

 Regards,
 Suresh



 On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com
 wrote:

  Gang,
 
   With hadoop-2.0.4-alpha released, I'd like 2.0.4 to be the final of our
  hadoop-2.x alphas. We have made lots of progress on hadoop-2.x and I
  believe we are nearly there, exciting times!
 
   As we have discussed previously, I hope to do a final push to stabilize
  hadoop-2.x, release a hadoop-2.0.5-beta in the next month or so; and then
  declare hadoop-2.1 as stable this summer after a short period of
 intensive
  testing.
 
   With that in mind, I really want to make a serious push to lock down
 APIs
  and wire-protocols for hadoop-2.0.5-beta. Thus, we can confidently
 support
  hadoop-2.x in a compatible manner in the future. So, it's fine to add new
  features, but please ensure that all APIs are frozen for
 hadoop-2.0.5-beta
 
   Vinod is helping out on the YARN/MR side and has tagged a number of final
  changes (including some of the final API incompatibilities) we'd like to push
  in before we call hadoop-2.x as ready to be supported (Target Version set
  to 2.0.5-beta):
   http://s.apache.org/target-hadoop-2.0.5-beta
   Thanks Vinod! (Note some of the sub-tasks of umbrella jiras may not be
  tagged, but their necessity is implied).
 
   Similarly on HDFS side, can someone please help out by tagging features,
  bug-fixes, protocol/API changes etc.? This way we can ensure HDFS APIs &
  protocols are locked down too - I'd really appreciate it!
 
  thanks,
  Arun
 
 
  --
  Arun C. Murthy
  Hortonworks Inc.
  http://hortonworks.com/
 
 
 


 --
 http://hortonworks.com/download/



[jira] [Created] (MAPREDUCE-5185) When log aggregation not enabled, message should point to NM HTTP port, not IPC port

2013-04-26 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5185:
-

 Summary: When log aggregation not enabled, message should point to 
NM HTTP port, not IPC port 
 Key: MAPREDUCE-5185
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5185
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza


When I try to get a container's logs in the JHS without log aggregation 
enabled, I get a message that looks like this:
Aggregation is not enabled. Try the nodemanager at sandy-ThinkPad-T530:33224

This could be a lot more helpful by actually pointing to the URL that would show 
the container logs on the NM.



[jira] [Created] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

2013-04-26 Thread Sangjin Lee (JIRA)
Sangjin Lee created MAPREDUCE-5186:
--

 Summary: mapreduce.job.max.split.locations causes some splits 
created by CombineFileInputFormat to fail
 Key: MAPREDUCE-5186
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee


CombineFileInputFormat can easily create splits that can come from many 
different locations (during the last pass of creating global splits). 
However, we observe that this often runs afoul of the 
mapreduce.job.max.split.locations check that's done by JobSplitWriter.

The default value for mapreduce.job.max.split.locations is 10, and with any 
decent size cluster, CombineFileInputFormat creates splits that are well above 
this limit.
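
To make the failure mode concrete, here is a toy model of the check, not the actual JobSplitWriter code. Only the property name and its default of 10 come from the report; the function names, the host naming, and the choice to raise an error are assumptions for illustration.

```python
MAX_SPLIT_LOCATIONS_DEFAULT = 10  # default of mapreduce.job.max.split.locations quoted above


def check_split_locations(locations, max_locations=MAX_SPLIT_LOCATIONS_DEFAULT):
    """Toy stand-in for the limit check: reject a split with too many locations."""
    if len(locations) > max_locations:
        raise ValueError(
            f"split has {len(locations)} locations, limit is {max_locations}"
        )
    return list(locations)


def combined_split_locations(block_hosts):
    """Union of the hosts holding any block of a combined split (hypothetical model)."""
    hosts = set()
    for replicas in block_hosts:
        hosts.update(replicas)
    return sorted(hosts)


# A combined split built from 8 blocks, each replicated on 3 different hosts.
blocks = [[f"node{3 * i + r}" for r in range(3)] for i in range(8)]
locations = combined_split_locations(blocks)
print(len(locations))  # 24 distinct hosts

try:
    check_split_locations(locations)
except ValueError as err:
    print("rejected:", err)
```

In this model even a modest combined split collects more than twice the default number of locations, which matches the behavior the report describes on any reasonably sized cluster.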



Re: Heads up - 2.0.5-beta

2013-04-26 Thread Arun C Murthy
Konstantin,

On Apr 26, 2013, at 4:34 PM, Konstantin Shvachko wrote:

 Do you think we can call the version you proposed to release
 2.1.0 or 2.1.0-beta?
 
 The proposed new features imho do not exactly conform with the idea
 of dot-dot release, but definitely qualify for a major number change.
 I am just trying to avoid rather ugly 2.0.4.1 versions, which of course
 also possible.

I'm agnostic to the schemes. 

During the long discussion we had just 2 months ago, I proposed that 2.1.x be 
the beta series initially.

The feedback and consensus was that it wasn't the right numbering scheme:
http://s.apache.org/1j4

thanks,
Arun


Re: Heads up - 2.0.5-beta

2013-04-26 Thread Arun C Murthy

On Apr 26, 2013, at 12:07 PM, Eli Collins wrote:

 Arun, Suresh,
 
 Mind reviewing the following page Karthik put together on
 compatibility?   http://wiki.apache.org/hadoop/Compatibility

Sure. Will do.

I just opened https://issues.apache.org/jira/browse/HADOOP-9517 to ensure we 
capture it for posterity.

Karthik - Would you like to take a crack at it? The wiki would be a good 
starting point.

thanks,
Arun

[jira] [Created] (MAPREDUCE-5187) Create mapreduce command scripts on Windows

2013-04-26 Thread Chuan Liu (JIRA)
Chuan Liu created MAPREDUCE-5187:


 Summary: Create mapreduce command scripts on Windows
 Key: MAPREDUCE-5187
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5187
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Chuan Liu
Assignee: Chuan Liu


We don't have mapreduce command scripts, e.g. mapred.cmd, on Windows in the trunk 
code base right now. As a result, some important functionality like the Job history 
server is not available. This JIRA is created to track this issue.



[jira] [Resolved] (MAPREDUCE-5158) Cleanup required when mapreduce.job.restart.recover is set to false

2013-04-26 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy resolved MAPREDUCE-5158.
--

   Resolution: Fixed
Fix Version/s: 1.2.0

I just committed this after running affected tests. Thanks Mayank!

 Cleanup required when mapreduce.job.restart.recover is set to false
 ---

 Key: MAPREDUCE-5158
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5158
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1.2.0
Reporter: yeshavora
Assignee: Mayank Bansal
 Fix For: 1.2.0

 Attachments: MAPREDUCE-5158-br1-1.patch, MAPREDUCE-5158-br1.patch


 When mapred.jobtracker.restart.recover is set to true and 
 mapreduce.job.restart.recover is set to false for a MR job, job cleanup 
 never happens for that job if the JT restarts while the job is running.
 .staging and the job-info file for that job remain on HDFS forever. 



[jira] [Created] (MAPREDUCE-5188) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java

2013-04-26 Thread junjin (JIRA)
junjin created MAPREDUCE-5188:
-

 Summary: error when verify FileType of RS_SOURCE in 
getCompanionBlocks  in BlockPlacementPolicyRaid.java
 Key: MAPREDUCE-5188
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5188
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 2.0.2-alpha
Reporter: junjin
Assignee: junjin
Priority: Critical
 Fix For: 2.0.2-alpha


An error occurs when verifying the FileType of RS_SOURCE in getCompanionBlocks in 
BlockPlacementPolicyRaid.java.
xorParityLength in line #379 needs to be changed to rsParityLength, since it is 
used for verifying the RS_SOURCE type.



[jira] [Created] (MAPREDUCE-5189) Basic AM changes to support preemption requests (per YARN-45)

2013-04-26 Thread Carlo Curino (JIRA)
Carlo Curino created MAPREDUCE-5189:
---

 Summary: Basic AM changes to support preemption requests (per 
YARN-45)
 Key: MAPREDUCE-5189
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5189
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino


This JIRA tracks the minimum amount of changes necessary in the mapreduce AM to 
receive preemption requests (per YARN-45) and invoke a local policy that 
manages preemption. (advanced policies and mechanisms will be tracked 
separately)
