Jenkins build is back to normal : Hadoop-Common-0.23-Build #814

2013-12-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-0.23-Build/814/changes



Re: branch development for HADOOP-9639

2013-12-06 Thread Chris Nauroth
+1 for the idea.  The branch committership clause was added for exactly
this kind of scenario.

From the phrasing in the bylaws, it looks like we'll need assistance from
PMC to get the ball rolling.  Is there a PMC member out there who could
volunteer to help start the process with Sangjin?

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Mon, Dec 2, 2013 at 11:47 AM, Sangjin Lee sj...@apache.org wrote:

 We have been having discussions on HADOOP-9639 (shared cache for jars) and
 the proposed design there for some time now. We are going to start work on
 this and have it vetted and reviewed by the community. I have just filed
 some more implementation JIRAs for this feature: YARN-1465, MAPREDUCE-5662,
 YARN-1466, YARN-1467

 Rather than working privately in our corner and sharing a big patch at the
 end, I'd like to explore the idea of developing on a branch in the public
 to foster more public feedback. Recently the Hadoop PMC has passed the
 change to the bylaws to allow for branch committers (

 http://mail-archives.apache.org/mod_mbox/hadoop-general/201307.mbox/%3CCACO5Y4y7HZnn3BS-ZyCVfv-UBcMudeQhndr2vqg%3DXqE1oBiQvQ%40mail.gmail.com%3E
 ),
 and I think it would be a good model for this development.

 I'd like to propose a branch development and a branch committer status for
 a couple of us who are going to work on this per bylaw. Could you please
 let me know what you think?

  Thanks,
 Sangjin


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: branch development for HADOOP-9639

2013-12-06 Thread Alejandro Abdelnur
Chris,

I'm already on it.

Thanks.


On Fri, Dec 6, 2013 at 9:49 AM, Chris Nauroth cnaur...@hortonworks.com wrote:

 +1 for the idea.  The branch committership clause was added for exactly
 this kind of scenario.

 From the phrasing in the bylaws, it looks like we'll need assistance from
 PMC to get the ball rolling.  Is there a PMC member out there who could
 volunteer to help start the process with Sangjin?

 Chris Nauroth
 Hortonworks
 http://hortonworks.com/



 On Mon, Dec 2, 2013 at 11:47 AM, Sangjin Lee sj...@apache.org wrote:

  We have been having discussions on HADOOP-9639 (shared cache for jars) and
  the proposed design there for some time now. We are going to start work on
  this and have it vetted and reviewed by the community. I have just filed
  some more implementation JIRAs for this feature: YARN-1465, MAPREDUCE-5662,
  YARN-1466, YARN-1467
 
  Rather than working privately in our corner and sharing a big patch at the
  end, I'd like to explore the idea of developing on a branch in the public
  to foster more public feedback. Recently the Hadoop PMC has passed the
  change to the bylaws to allow for branch committers (
  http://mail-archives.apache.org/mod_mbox/hadoop-general/201307.mbox/%3CCACO5Y4y7HZnn3BS-ZyCVfv-UBcMudeQhndr2vqg%3DXqE1oBiQvQ%40mail.gmail.com%3E
  ),
  and I think it would be a good model for this development.
 
  I'd like to propose a branch development and a branch committer status for
  a couple of us who are going to work on this per bylaw. Could you please
  let me know what you think?
 
   Thanks,
  Sangjin
 





-- 
Alejandro


Re: branch development for HADOOP-9639

2013-12-06 Thread Eli Collins
+1, good idea

Thanks for contributing Sangjin.


On Mon, Dec 2, 2013 at 11:47 AM, Sangjin Lee sj...@apache.org wrote:
 We have been having discussions on HADOOP-9639 (shared cache for jars) and
 the proposed design there for some time now. We are going to start work on
 this and have it vetted and reviewed by the community. I have just filed
 some more implementation JIRAs for this feature: YARN-1465, MAPREDUCE-5662,
 YARN-1466, YARN-1467

 Rather than working privately in our corner and sharing a big patch at the
 end, I'd like to explore the idea of developing on a branch in the public
 to foster more public feedback. Recently the Hadoop PMC has passed the
 change to the bylaws to allow for branch committers (
 http://mail-archives.apache.org/mod_mbox/hadoop-general/201307.mbox/%3CCACO5Y4y7HZnn3BS-ZyCVfv-UBcMudeQhndr2vqg%3DXqE1oBiQvQ%40mail.gmail.com%3E),
 and I think it would be a good model for this development.

 I'd like to propose a branch development and a branch committer status for
 a couple of us who are going to work on this per bylaw. Could you please
 let me know what you think?

  Thanks,
 Sangjin


[jira] [Created] (HADOOP-10147) Upgrade to commons-logging 1.1.3 to avoid potential deadlock in MiniDFSCluster

2013-12-06 Thread Eric Sirianni (JIRA)
Eric Sirianni created HADOOP-10147:
--

 Summary: Upgrade to commons-logging 1.1.3 to avoid potential 
deadlock in MiniDFSCluster
 Key: HADOOP-10147
 URL: https://issues.apache.org/jira/browse/HADOOP-10147
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 2.2.0
Reporter: Eric Sirianni
Priority: Minor


There is a deadlock in commons-logging 1.1.1 (see LOGGING-119) that can 
manifest itself while running {{MiniDFSCluster}} JUnit tests.

This deadlock has been fixed in commons-logging 1.1.2.  The latest version 
available is commons-logging 1.1.3, and Hadoop should upgrade to that in order 
to address this deadlock.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Next releases

2013-12-06 Thread Colin McCabe
If 2.4 is released in January, I think it's very unlikely to include
symlinks.  There is still a lot of work to be done before they're
usable.  You can look at the progress on HADOOP-10019.  For some of
the subtasks, it will require some community discussion before any
code can be written.

For better or worse, symlinks have not been requested by users as
often as features like NFS export, HDFS caching, ACLs, etc, so effort
has been focused on those instead.

For now, I think we should put the symlinks-disabling patches
(HADOOP-10020, etc) into branch-2, so that they will be part of the
next releases without additional effort.

I would like to see HDFS caching make it into 2.4.  The APIs and
implementation are beginning to stabilize, and around January it
should be ok to backport to a stable branch.

best,
Colin

On Thu, Dec 5, 2013 at 3:57 PM, Arun C Murthy a...@hortonworks.com wrote:
 Ok, I've updated https://wiki.apache.org/hadoop/Roadmap with an initial 
 strawman list for hadoop-2.4 which I feel we can get out in Jan.

 What else would folks like to see? Please keep timeframe in mind.

 thanks,
 Arun

 On Dec 2, 2013, at 10:55 AM, Arun C Murthy a...@hortonworks.com wrote:


 On Nov 13, 2013, at 1:55 PM, Jason Lowe jl...@yahoo-inc.com wrote:


 +1 to limiting checkins of patch releases to Blockers/Criticals.  If 
 necessary committers check into trunk/branch-2 only and defer to the patch 
 release manager for the patch release merge.  Then there should be fewer 
 surprises for everyone about what ended up in a patch release, and it is 
 less likely that the patch release becomes destabilized from the sheer 
 amount of code churn.  Maybe this won't be necessary if everyone 
 understands that the patch release isn't the only way to get a change out 
 in a timely manner.

 I've updated https://wiki.apache.org/hadoop/Roadmap to reflect that we only 
 put in Blocker/Critical bugs into Point Releases.

 Committers, from now, please exercise extreme caution when committing to a 
 point release: they should only be limited to Blocker bugs.

 thanks,
 Arun


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/





Re: Next releases

2013-12-06 Thread Arun C Murthy
Thanks Suresh & Colin.

Please update the Roadmap wiki with your proposals.

As always, we will try our best to get these in - but we can collectively 
decide to slip some of these to subsequent releases based on timelines.

Arun

On Dec 6, 2013, at 10:43 AM, Suresh Srinivas sur...@hortonworks.com wrote:

 Arun,
 
 I propose the following changes for 2.3:
 - There have been a lot of improvements related to supporting http policy.
 - There is still a discussion going on, but I would like to deprecate
 BackupNode in 2.3 as well.
 - We are currently working on rolling upgrades related change in HDFS. We
 might add a couple of changes that enable rolling upgrades from 2.3
 onwards (hopefully we can get this done by December)
 
 I propose the following for 2.4 release, if they are tested and stable:
 - Heterogeneous storage support - HDFS-2832
 - Datanode cache related change - HDFS-4949
 - HDFS ACLs - HDFS-4685
 - Rolling upgrade changes
 
 Let me know if you want me to update the wiki.
 
 Regards,
 Suresh
 

On Dec 6, 2013, at 12:27 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 If 2.4 is released in January, I think it's very unlikely to include
 symlinks.  There is still a lot of work to be done before they're
 usable.  You can look at the progress on HADOOP-10019.  For some of
 the subtasks, it will require some community discussion before any
 code can be written.
 
 For better or worse, symlinks have not been requested by users as
 often as features like NFS export, HDFS caching, ACLs, etc, so effort
 has been focused on those instead.
 
 For now, I think we should put the symlinks-disabling patches
 (HADOOP-10020, etc) into branch-2, so that they will be part of the
 next releases without additional effort.
 
 I would like to see HDFS caching make it into 2.4.  The APIs and
 implementation are beginning to stabilize, and around January it
 should be ok to backport to a stable branch.
 
 best,
 Colin
 
 
 On Thu, Nov 7, 2013 at 6:42 PM, Arun C Murthy a...@hortonworks.com wrote:
 
 Gang,
 
 Thinking through the next couple of releases here, appreciate f/b.
 
 # hadoop-2.2.1
 
 I was looking through commit logs and there is a *lot* of content here
 (81 commits as of 11/7). Some are features/improvements and some are fixes
 - it's really hard to distinguish what is important and what isn't.
 
 I propose we start with a blank slate (i.e. blow away branch-2.2 and
 start fresh from a copy of branch-2.2.0)  and then be very careful and
 meticulous about including only *blocker* fixes in branch-2.2. So, most of
 the content here comes via the next minor release (i.e. hadoop-2.3)
 
 In future, we continue to be *very* parsimonious about what gets into a
 patch release (major.minor.patch) - in general, these should be only
 *blocker* fixes or key operational issues.
 
 # hadoop-2.3
 
 I'd like to propose the following features for YARN/MR to make it into
 hadoop-2.3 and punt the rest to hadoop-2.4 and beyond:
 * Application History Server - This is happening in  a branch and is
 close; with it we can provide a reasonable experience for new frameworks
 being built on top of YARN.
 * Bug-fixes in RM Restart
 * Minimal support for long-running applications (e.g. security) via
 YARN-896
 * RM Fail-over via ZKFC
 * Anything else?
 
 HDFS???
 
 Overall, I feel like we have a decent chance of rolling hadoop-2.3 by the
 end of the year.
 
 Thoughts?
 
 thanks,
 Arun
 
 
 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/
 
 
 
 
 
 
 
 -- 
 http://hortonworks.com/download/
 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: [VOTE] Release Apache Hadoop 0.23.10

2013-12-06 Thread Mit Desai
+1 (non-binding)

Environment: Fedora17, protobuf 2.4.1, JDK 1.7.0_45

-successfully built from source
-successful native build
-deployed source build to my single-node cluster and ran multiple
instances of wordcount
-verified signature and digests



- M!T





On 12/3/13, 12:22 AM, Thomas Graves tgra...@yahoo-inc.com wrote:

Hey Everyone,

There have been lots of improvements and bug fixes that have gone into
branch-0.23 since the 0.23.9 release.  We think it's time to do a 0.23.10
so I have created a release candidate (rc0) for a Hadoop-0.23.10 release.

The RC is available at:
http://people.apache.org/~tgraves/hadoop-0.23.10-rc0/


The RC Tag in svn is here:
http://svn.apache.org/viewvc/hadoop/common/tags/release-0.23.10-rc0/


The maven artifacts are available via repository.apache.org.


Please try the release and vote; the vote will run for the usual 7 days
til December 9th.

I am +1 (binding).

thanks,
Tom Graves




Re: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk

2013-12-06 Thread Andrew Wang
Hi everyone,

I'm still getting up to speed on the changes here (my fault for not
following development more closely, other priorities etc etc), but the
branch thus far is already quite impressive. It's quite an undertaking to
turn the DN into a collection of Storages, along with the corresponding
datastructure, tracking, and other changes in the NN and DN.

Correct me if I'm wrong though, but this still leaves a substantial part of
the design doc to be implemented. Looking at the list of remaining
subtasks, it seems like we still can't specify a storage type for a file
(HDFS-5229) or write a file to a given storage type (HDFS-5391), along with
the corresponding client protocol changes. This leads me to two questions:

- If this is merged, what can I do with the new code? Without client
changes or the ability to create a file on a different storage type, I
don't know how (for example) I could hand this to our QA team to test. I'm
wondering why we want to merge now rather than when the branch is more
feature complete.
- What's the plan for the implementation of the remaining features? How
many phases? What's the timeline for these phases? Particularly, related to
the use cases presented in section 2 of the design doc.

I'm also going to post some design doc questions to the JIRA, there are a
few technical q's I'd like to get clarification on.

Thanks,
Andrew


On Wed, Dec 4, 2013 at 7:21 AM, Sirianni, Eric eric.siria...@netapp.com wrote:

 +1

 My team has been developing and testing against the HDFS-2832 branch for
 the past month.  It has proven to be quite stable.

 Eric

 -Original Message-
 From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
 Sent: Monday, December 02, 2013 7:07 PM
 To: hdfs-...@hadoop.apache.org; common-dev@hadoop.apache.org
 Subject: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk

 Hello all,

 I would like to call a vote to merge phase 1 of the Heterogeneous Storage
 feature into trunk.

 *Scope of the changes:*
 The changes allow exposing the DataNode as a collection of storages and set
 the foundation for subsequent work to present Heterogeneous Storages to
 applications. This allows DataNodes to send block and storage reports
 per-storage. In addition this change introduces the ability to add a
 'storage type' tag to the storage directories. This enables supporting
 different types of storages in addition to disk storage.

 Development of the feature is tracked in the jira
 https://issues.apache.org/jira/browse/HDFS-2832.

 *Details of development and testing:*
 Development has been done in a separate branch -
 https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2832. The
 updated design is posted at -

 https://issues.apache.org/jira/secure/attachment/12615761/20131125-HeterogeneousStorage.pdf
 .
 The changes involve ~6K changed lines of code, with a third of those
 changes being to tests.

 Please see the test plan

 https://issues.apache.org/jira/secure/attachment/12616642/20131202-HeterogeneousStorage-TestPlan.pdf
 for the details. Once the feature is
 merged into trunk, we will continue to test and fix any bugs that may be
 found on trunk as well as add further tests as outlined in the test plan.

 The bulk of the design and implementation was done by Suresh Srinivas,
 Sanjay Radia, Nicholas Sze, Junping Du and me. Also, thanks to Eric
 Sirianni, Chris Nauroth, Steve Loughran, Bikas Saha, Andrew Wang and Todd
 Lipcon for providing feedback on the Jiras and in discussions.

 This vote runs for a week and closes on 12/9/2013 at 11:59 pm PT.

 Thanks,
 Arpit




[jira] [Created] (HADOOP-10149) Create ByteBuffer-based cipher API

2013-12-06 Thread Owen O'Malley (JIRA)
Owen O'Malley created HADOOP-10149:
--

 Summary: Create ByteBuffer-based cipher API
 Key: HADOOP-10149
 URL: https://issues.apache.org/jira/browse/HADOOP-10149
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


As part of HDFS-5143, [~hitliuyi] included a ByteBuffer-based API for 
encryption and decryption. Because of the zero-copy work in particular, this 
seems like an important piece of work. 

This API should be discussed independently instead of just as part of HDFS-5143.
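
For readers unfamiliar with the idea, here is a minimal sketch of ByteBuffer-to-ByteBuffer encryption using the JDK's existing javax.crypto.Cipher. It is not the API proposed in HDFS-5143, which is not shown in this message. The class name ByteBufferCipherSketch, the helpers transform and roundTrip, the AES/CTR/NoPadding transformation, and the all-zero key and IV are assumptions chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only; not the API under discussion in HDFS-5143.
class ByteBufferCipherSketch {

    // Encrypts or decrypts everything remaining in 'in' into a new direct buffer,
    // so the caller never has to hand a byte[] to the cipher.
    static ByteBuffer transform(int mode, byte[] key, byte[] iv, ByteBuffer in)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        ByteBuffer out = ByteBuffer.allocateDirect(cipher.getOutputSize(in.remaining()));
        cipher.doFinal(in, out); // reads from 'in', writes to 'out'
        out.flip();              // make the written bytes readable
        return out;
    }

    // Encrypts and then decrypts a string through direct buffers.
    static String roundTrip(String text) throws GeneralSecurityException {
        byte[] key = new byte[16]; // all-zero demo key; never use in practice
        byte[] iv = new byte[16];  // all-zero demo IV; never reuse with a real key

        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer plain = ByteBuffer.allocateDirect(bytes.length);
        plain.put(bytes);
        plain.flip();

        ByteBuffer encrypted = transform(Cipher.ENCRYPT_MODE, key, iv, plain);
        ByteBuffer decrypted = transform(Cipher.DECRYPT_MODE, key, iv, encrypted);

        byte[] result = new byte[decrypted.remaining()];
        decrypted.get(result);
        return new String(result, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello hdfs"));
    }
}
```

The sketch uses direct buffers because that is the kind of buffer a zero-copy read path would typically supply; whether the stock JDK provider avoids internal copies for them is not something this example demonstrates.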





Re: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk

2013-12-06 Thread Arpit Agarwal
Hi Andrew,

Our plan as stated back in August was to do this work principally in two
phases.
https://issues.apache.org/jira/browse/HDFS-2832?focusedCommentId=13739041

For the second phase which includes API support, we also need quota
management. For changes of this scope, to do all the work at once while
keeping the feature branch in sync with ongoing development in trunk is
unmanageable. Hence we'd like to stick with the initial plan and develop in
phases.

Even for datanode caching the initial merge did not include the quota
management changes which are happening subsequently.

Going forward, we will stabilize the current changes in trunk in the 2.4
time frame. Next we will add quota management and API support which can
align with the 2.5 time frame, with the second merge potentially in
March/April.

Arpit


On Fri, Dec 6, 2013 at 3:15 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 Hi everyone,

 I'm still getting up to speed on the changes here (my fault for not
 following development more closely, other priorities etc etc), but the
 branch thus far is already quite impressive. It's quite an undertaking to
 turn the DN into a collection of Storages, along with the corresponding
 datastructure, tracking, and other changes in the NN and DN.

 Correct me if I'm wrong though, but this still leaves a substantial part of
 the design doc to be implemented. Looking at the list of remaining
 subtasks, it seems like we still can't specify a storage type for a file
 (HDFS-5229) or write a file to a given storage type (HDFS-5391), along with
 the corresponding client protocol changes. This leads me to two questions:

 - If this is merged, what can I do with the new code? Without client
 changes or the ability to create a file on a different storage type, I
 don't know how (for example) I could hand this to our QA team to test. I'm
 wondering why we want to merge now rather than when the branch is more
 feature complete.
 - What's the plan for the implementation of the remaining features? How
 many phases? What's the timeline for these phases? Particularly, related to
 the use cases presented in section 2 of the design doc.

 I'm also going to post some design doc questions to the JIRA, there are a
 few technical q's I'd like to get clarification on.

 Thanks,
 Andrew


 On Wed, Dec 4, 2013 at 7:21 AM, Sirianni, Eric eric.siria...@netapp.com
 wrote:

  +1
 
  My team has been developing and testing against the HDFS-2832 branch for
  the past month.  It has proven to be quite stable.
 
  Eric
 
  -Original Message-
  From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
  Sent: Monday, December 02, 2013 7:07 PM
  To: hdfs-...@hadoop.apache.org; common-dev@hadoop.apache.org
  Subject: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk
 
  Hello all,
 
  I would like to call a vote to merge phase 1 of the Heterogeneous Storage
  feature into trunk.
 
  *Scope of the changes:*
  The changes allow exposing the DataNode as a collection of storages and set
  the foundation for subsequent work to present Heterogeneous Storages to
  applications. This allows DataNodes to send block and storage reports
  per-storage. In addition this change introduces the ability to add a
  'storage type' tag to the storage directories. This enables supporting
  different types of storages in addition to disk storage.
 
  Development of the feature is tracked in the jira
  https://issues.apache.org/jira/browse/HDFS-2832.
 
  *Details of development and testing:*
  Development has been done in a separate branch -
  https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2832. The
  updated design is posted at -
 
 
 https://issues.apache.org/jira/secure/attachment/12615761/20131125-HeterogeneousStorage.pdf
  .
  The changes involve ~6K changed lines of code, with a third of those
  changes being to tests.
 
  Please see the test plan
 
 
  https://issues.apache.org/jira/secure/attachment/12616642/20131202-HeterogeneousStorage-TestPlan.pdf
  for the details. Once the feature is
  merged into trunk, we will continue to test and fix any bugs that may be
  found on trunk as well as add further tests as outlined in the test plan.
 
  The bulk of the design and implementation was done by Suresh Srinivas,
  Sanjay Radia, Nicholas Sze, Junping Du and me. Also, thanks to Eric
  Sirianni, Chris Nauroth, Steve Loughran, Bikas Saha, Andrew Wang and Todd
  Lipcon for providing feedback on the Jiras and in discussions.
 
  This vote runs for a week and closes on 12/9/2013 at 11:59 pm PT.
 
  Thanks,
  Arpit
 