Jenkins build is back to normal : Hadoop-Common-0.23-Build #814
See https://builds.apache.org/job/Hadoop-Common-0.23-Build/814/changes
Re: branch development for HADOOP-9639
+1 for the idea. The branch committership clause was added for exactly this kind of scenario. From the phrasing in the bylaws, it looks like we'll need assistance from PMC to get the ball rolling. Is there a PMC member out there who could volunteer to help start the process with Sangjin? Chris Nauroth Hortonworks http://hortonworks.com/ On Mon, Dec 2, 2013 at 11:47 AM, Sangjin Lee sj...@apache.org wrote: We have been having discussions on HADOOP-9639 (shared cache for jars) and the proposed design there for some time now. We are going to start work on this and have it vetted and reviewed by the community. I have just filed some more implementation JIRAs for this feature: YARN-1465, MAPREDUCE-5662, YARN-1466, YARN-1467 Rather than working privately in our corner and sharing a big patch at the end, I'd like to explore the idea of developing on a branch in the public to foster more public feedback. Recently the Hadoop PMC has passed the change to the bylaws to allow for branch committers ( http://mail-archives.apache.org/mod_mbox/hadoop-general/201307.mbox/%3CCACO5Y4y7HZnn3BS-ZyCVfv-UBcMudeQhndr2vqg%3DXqE1oBiQvQ%40mail.gmail.com%3E ), and I think it would be a good model for this development. I'd like to propose a branch development and a branch committer status for a couple of us who are going to work on this per bylaw. Could you please let me know what you think? Thanks, Sangjin -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: branch development for HADOOP-9639
Chris, I'm already on it. Thanks. On Fri, Dec 6, 2013 at 9:49 AM, Chris Nauroth cnaur...@hortonworks.com wrote: +1 for the idea. The branch committership clause was added for exactly this kind of scenario. From the phrasing in the bylaws, it looks like we'll need assistance from PMC to get the ball rolling. Is there a PMC member out there who could volunteer to help start the process with Sangjin? Chris Nauroth Hortonworks http://hortonworks.com/ On Mon, Dec 2, 2013 at 11:47 AM, Sangjin Lee sj...@apache.org wrote: We have been having discussions on HADOOP-9639 (shared cache for jars) and the proposed design there for some time now. We are going to start work on this and have it vetted and reviewed by the community. I have just filed some more implementation JIRAs for this feature: YARN-1465, MAPREDUCE-5662, YARN-1466, YARN-1467 Rather than working privately in our corner and sharing a big patch at the end, I'd like to explore the idea of developing on a branch in the public to foster more public feedback. Recently the Hadoop PMC has passed the change to the bylaws to allow for branch committers ( http://mail-archives.apache.org/mod_mbox/hadoop-general/201307.mbox/%3CCACO5Y4y7HZnn3BS-ZyCVfv-UBcMudeQhndr2vqg%3DXqE1oBiQvQ%40mail.gmail.com%3E ), and I think it would be a good model for this development. I'd like to propose a branch development and a branch committer status for a couple of us who are going to work on this per bylaw. Could you please let me know what you think? Thanks, Sangjin -- Alejandro
Re: branch development for HADOOP-9639
+1, good idea. Thanks for contributing, Sangjin. On Mon, Dec 2, 2013 at 11:47 AM, Sangjin Lee sj...@apache.org wrote: We have been having discussions on HADOOP-9639 (shared cache for jars) and the proposed design there for some time now. We are going to start work on this and have it vetted and reviewed by the community. I have just filed some more implementation JIRAs for this feature: YARN-1465, MAPREDUCE-5662, YARN-1466, YARN-1467 Rather than working privately in our corner and sharing a big patch at the end, I'd like to explore the idea of developing on a branch in the public to foster more public feedback. Recently the Hadoop PMC has passed the change to the bylaws to allow for branch committers ( http://mail-archives.apache.org/mod_mbox/hadoop-general/201307.mbox/%3CCACO5Y4y7HZnn3BS-ZyCVfv-UBcMudeQhndr2vqg%3DXqE1oBiQvQ%40mail.gmail.com%3E ), and I think it would be a good model for this development. I'd like to propose a branch development and a branch committer status for a couple of us who are going to work on this per bylaw. Could you please let me know what you think? Thanks, Sangjin
[jira] [Created] (HADOOP-10147) Upgrade to commons-logging 1.1.3 to avoid potential deadlock in MiniDFSCluster
Eric Sirianni created HADOOP-10147:
Summary: Upgrade to commons-logging 1.1.3 to avoid potential deadlock in MiniDFSCluster
Key: HADOOP-10147
URL: https://issues.apache.org/jira/browse/HADOOP-10147
Project: Hadoop Common
Issue Type: Bug
Components: build
Affects Versions: 2.2.0
Reporter: Eric Sirianni
Priority: Minor
There is a deadlock in commons-logging 1.1.1 (see LOGGING-119) that can manifest itself while running {{MiniDFSCluster}} JUnit tests. This deadlock has been fixed in commons-logging 1.1.2. The latest version available is commons-logging 1.1.3, and Hadoop should upgrade to that in order to address this deadlock.
-- This message was sent by Atlassian JIRA (v6.1#6144)
Re: Next releases
If 2.4 is released in January, I think it's very unlikely to include symlinks. There is still a lot of work to be done before they're usable. You can look at the progress on HADOOP-10019. For some of the subtasks, it will require some community discussion before any code can be written. For better or worse, symlinks have not been requested by users as often as features like NFS export, HDFS caching, ACLs, etc., so effort has been focused on those instead. For now, I think we should put the symlinks-disabling patches (HADOOP-10020, etc.) into branch-2, so that they will be part of the next releases without additional effort. I would like to see HDFS caching make it into 2.4. The APIs and implementation are beginning to stabilize, and around January it should be ok to backport to a stable branch. best, Colin On Thu, Dec 5, 2013 at 3:57 PM, Arun C Murthy a...@hortonworks.com wrote: Ok, I've updated https://wiki.apache.org/hadoop/Roadmap with an initial strawman list for hadoop-2.4 which I feel we can get out in Jan. What else would folks like to see? Please keep the timeframe in mind. thanks, Arun On Dec 2, 2013, at 10:55 AM, Arun C Murthy a...@hortonworks.com wrote: On Nov 13, 2013, at 1:55 PM, Jason Lowe jl...@yahoo-inc.com wrote: +1 to limiting checkins of patch releases to Blockers/Criticals. If necessary committers check into trunk/branch-2 only and defer to the patch release manager for the patch release merge. Then there should be fewer surprises for everyone about what ended up in a patch release and less likelihood that the patch release becomes destabilized from the sheer amount of code churn. Maybe this won't be necessary if everyone understands that the patch release isn't the only way to get a change out in a timely manner. I've updated https://wiki.apache.org/hadoop/Roadmap to reflect that we only put Blocker/Critical bugs into Point Releases.
Committers, from now, please exercise extreme caution when committing to a point release: they should only be limited to Blocker bugs. thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
Re: Next releases
Thanks Suresh and Colin. Please update the Roadmap wiki with your proposals. As always, we will try our best to get these in - but we can collectively decide to slip some of these to subsequent releases based on timelines. Arun On Dec 6, 2013, at 10:43 AM, Suresh Srinivas sur...@hortonworks.com wrote: Arun, I propose the following changes for 2.3: - There have been a lot of improvements related to supporting http policy. - There is still a discussion going on, but I would like to deprecate BackupNode in 2.3 as well. - We are currently working on rolling upgrades related change in HDFS. We might add a couple of changes that enable rolling upgrades from 2.3 onwards (hopefully we can get this done by December) I propose the following for 2.4 release, if they are tested and stable: - Heterogeneous storage support - HDFS-2832 - Datanode cache related change - HDFS-4949 - HDFS ACLs - HDFS-4685 - Rolling upgrade changes Let me know if you want me to update the wiki. Regards, Suresh On Dec 6, 2013, at 12:27 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote: If 2.4 is released in January, I think it's very unlikely to include symlinks. There is still a lot of work to be done before they're usable. You can look at the progress on HADOOP-10019. For some of the subtasks, it will require some community discussion before any code can be written. For better or worse, symlinks have not been requested by users as often as features like NFS export, HDFS caching, ACLs, etc, so effort has been focused on those instead. For now, I think we should put the symlinks-disabling patches (HADOOP-10020, etc) into branch-2, so that they will be part of the next releases without additional effort. I would like to see HDFS caching make it into 2.4. The APIs and implementation are beginning to stabilize, and around January it should be ok to backport to a stable branch. 
best, Colin On Thu, Nov 7, 2013 at 6:42 PM, Arun C Murthy a...@hortonworks.com wrote: Gang, Thinking through the next couple of releases here, appreciate f/b. # hadoop-2.2.1 I was looking through commit logs and there is a *lot* of content here (81 commits as of 11/7). Some are features/improvements and some are fixes - it's really hard to distinguish what is important and what isn't. I propose we start with a blank slate (i.e. blow away branch-2.2 and start fresh from a copy of branch-2.2.0) and then be very careful and meticulous about including only *blocker* fixes in branch-2.2. So, most of the content here comes via the next minor release (i.e. hadoop-2.3) In future, we continue to be *very* parsimonious about what gets into a patch release (major.minor.patch) - in general, these should be only *blocker* fixes or key operational issues. # hadoop-2.3 I'd like to propose the following features for YARN/MR to make it into hadoop-2.3 and punt the rest to hadoop-2.4 and beyond: * Application History Server - This is happening in a branch and is close; with it we can provide a reasonable experience for new frameworks being built on top of YARN. * Bug-fixes in RM Restart * Minimal support for long-running applications (e.g. security) via YARN-896 * RM Fail-over via ZKFC * Anything else? HDFS??? Overall, I feel like we have a decent chance of rolling hadoop-2.3 by the end of the year. Thoughts? thanks, Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/ -- http://hortonworks.com/download/ -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
Re: [VOTE] Release Apache Hadoop 0.23.10
+1 (non-binding) Environment: Fedora17, protobuf 2.4.1, JDK 1.7.0_45 - successfully built from source - successful native build - deployed source build to my single-node cluster and ran multiple instances of wordcount - verified signature and digests - M!T On 12/3/13, 12:22 AM, Thomas Graves tgra...@yahoo-inc.com wrote: Hey Everyone, There have been lots of improvements and bug fixes that have gone into branch-0.23 since the 0.23.9 release. We think it's time to do a 0.23.10 so I have created a release candidate (rc0) for a Hadoop-0.23.10 release. The RC is available at: http://people.apache.org/~tgraves/hadoop-0.23.10-rc0/ The RC Tag in svn is here: http://svn.apache.org/viewvc/hadoop/common/tags/release-0.23.10-rc0/ The maven artifacts are available via repository.apache.org. Please try the release and vote; the vote will run for the usual 7 days until December 9th. I am +1 (binding). thanks, Tom Graves
Re: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk
Hi everyone, I'm still getting up to speed on the changes here (my fault for not following development more closely, other priorities etc etc), but the branch thus far is already quite impressive. It's quite an undertaking to turn the DN into a collection of Storages, along with the corresponding data structure, tracking, and other changes in the NN and DN. Correct me if I'm wrong though, but this still leaves a substantial part of the design doc to be implemented. Looking at the list of remaining subtasks, it seems like we still can't specify a storage type for a file (HDFS-5229) or write a file to a given storage type (HDFS-5391), along with the corresponding client protocol changes. This leads me to two questions: - If this is merged, what can I do with the new code? Without client changes or the ability to create a file on a different storage type, I don't know how (for example) I could hand this to our QA team to test. I'm wondering why we want to merge now rather than when the branch is more feature complete. - What's the plan for the implementation of the remaining features? How many phases? What's the timeline for these phases? Particularly, related to the use cases presented in section 2 of the design doc. I'm also going to post some design doc questions to the JIRA, there are a few technical q's I'd like to get clarification on. Thanks, Andrew On Wed, Dec 4, 2013 at 7:21 AM, Sirianni, Eric eric.siria...@netapp.com wrote: +1 My team has been developing and testing against the HDFS-2832 branch for the past month. It has proven to be quite stable. Eric -Original Message- From: Arpit Agarwal [mailto:aagar...@hortonworks.com] Sent: Monday, December 02, 2013 7:07 PM To: hdfs-...@hadoop.apache.org; common-dev@hadoop.apache.org Subject: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk Hello all, I would like to call a vote to merge phase 1 of the Heterogeneous Storage feature into trunk. 
*Scope of the changes:* The changes allow exposing the DataNode as a collection of storages and set the foundation for subsequent work to present Heterogeneous Storages to applications. This allows DataNodes to send block and storage reports per-storage. In addition this change introduces the ability to add a 'storage type' tag to the storage directories. This enables supporting different types of storages in addition to disk storage. Development of the feature is tracked in the jira https://issues.apache.org/jira/browse/HDFS-2832. *Details of development and testing:* Development has been done in a separate branch - https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2832. The updated design is posted at - https://issues.apache.org/jira/secure/attachment/12615761/20131125-HeterogeneousStorage.pdf . The changes involve ~6K changed lines of code, with a third of those changes being to tests. Please see the test plan https://issues.apache.org/jira/secure/attachment/12616642/20131202-HeterogeneousStorage-TestPlan.pdf for the details. Once the feature is merged into trunk, we will continue to test and fix any bugs that may be found on trunk as well as add further tests as outlined in the test plan. The bulk of the design and implementation was done by Suresh Srinivas, Sanjay Radia, Nicholas Sze, Junping Du and me. Also, thanks to Eric Sirianni, Chris Nauroth, Steve Loughran, Bikas Saha, Andrew Wang and Todd Lipcon for providing feedback on the Jiras and in discussions. This vote runs for a week and closes on 12/9/2013 at 11:59 pm PT. Thanks, Arpit
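For readers skimming the archive, the scope described in the vote (a DataNode exposed as a collection of storages, each carrying a 'storage type' tag, with block reports sent per storage) can be pictured with a small Java sketch. Everything here is a hypothetical illustration: the class names, the DISK/SSD values and the storage ids are assumptions made for the example and are not taken from the HDFS-2832 branch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; these are not the HDFS-2832 classes.
final class StorageModelSketch {
    /** Tag attached to each storage directory (values assumed for the sketch). */
    enum StorageType { DISK, SSD }

    /** One storage directory on a DataNode, with its own block list. */
    static final class Storage {
        final String id;
        final StorageType type;
        final List<Long> blockIds = new ArrayList<>();

        Storage(String id, StorageType type) {
            this.id = id;
            this.type = type;
        }
    }

    /** A DataNode viewed as a collection of storages rather than a single unit. */
    static final class DataNode {
        private final Map<String, Storage> storages = new LinkedHashMap<>();

        Storage addStorage(String id, StorageType type) {
            Storage s = new Storage(id, type);
            storages.put(id, s);
            return s;
        }

        /** Builds one block report per storage, keyed by storage id. */
        Map<String, List<Long>> blockReports() {
            Map<String, List<Long>> reports = new LinkedHashMap<>();
            for (Storage s : storages.values()) {
                reports.put(s.id, new ArrayList<>(s.blockIds));
            }
            return reports;
        }
    }

    /** Small demo: two storages of different types holding three blocks in total. */
    static Map<String, List<Long>> demo() {
        DataNode dn = new DataNode();
        dn.addStorage("DS-1", StorageType.DISK).blockIds.add(1001L);
        Storage ssd = dn.addStorage("DS-2", StorageType.SSD);
        ssd.blockIds.add(2001L);
        ssd.blockIds.add(2002L);
        return dn.blockReports();
    }
}
```

The point of the sketch is only that reports are keyed by storage rather than by node, which is what allows later phases to attach placement decisions to a storage type.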
[jira] [Created] (HADOOP-10149) Create ByteBuffer-based cipher API
Owen O'Malley created HADOOP-10149:
Summary: Create ByteBuffer-based cipher API
Key: HADOOP-10149
URL: https://issues.apache.org/jira/browse/HADOOP-10149
Project: Hadoop Common
Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley
As part of HDFS-5143, [~hitliuyi] included a ByteBuffer-based API for encryption and decryption. Especially because of the zero-copy work, this seems like an important piece of work. This API should be discussed independently instead of just as part of HDFS-5143.
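To make the idea of a ByteBuffer-based cipher call concrete, here is a minimal sketch that uses the JDK's own `javax.crypto.Cipher`, which already offers `doFinal(ByteBuffer, ByteBuffer)`. This is not the API proposed in HDFS-5143 or HADOOP-10149 (that API is not shown in this thread); the class name, helper names, the AES/CTR/NoPadding choice and the all-zero demo key and IV are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; not the Hadoop cipher API under discussion.
final class ByteBufferCipherSketch {
    /** Runs AES/CTR over the remaining bytes of 'in' and writes the result into 'out'. */
    static void process(int mode, byte[] key, byte[] iv, ByteBuffer in, ByteBuffer out)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        // The ByteBuffer overload reads from 'in' and writes to 'out' directly,
        // so the caller does not have to stage the data in a byte[] first.
        cipher.doFinal(in, out);
    }

    /** Encrypts and then decrypts 'text' through direct buffers and returns the result. */
    static String roundTrip(String text) {
        byte[] key = new byte[16]; // all-zero key: demo only, never use in practice
        byte[] iv = new byte[16];  // all-zero IV: demo only
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);
        try {
            ByteBuffer src = ByteBuffer.allocateDirect(plain.length);
            src.put(plain);
            src.flip();

            ByteBuffer enc = ByteBuffer.allocateDirect(plain.length);
            process(Cipher.ENCRYPT_MODE, key, iv, src, enc);
            enc.flip();

            ByteBuffer dec = ByteBuffer.allocateDirect(plain.length);
            process(Cipher.DECRYPT_MODE, key, iv, enc, dec);
            dec.flip();

            byte[] result = new byte[dec.remaining()];
            dec.get(result);
            return new String(result, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher round trip failed", e);
        }
    }
}
```

Direct buffers are used here because the issue mentions zero-copy work; heap buffers from `ByteBuffer.allocate` would work with the same calls.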
Re: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk
Hi Andrew, Our plan as stated back in August was to do this work principally in two phases. https://issues.apache.org/jira/browse/HDFS-2832?focusedCommentId=13739041 For the second phase which includes API support, we also need quota management. For changes of this scope, to do all the work at once while keeping the feature branch in sync with ongoing development in trunk is unmanageable. Hence we'd like to stick with the initial plan and develop in phases. Even for datanode caching the initial merge did not include the quota management changes which are happening subsequently. Going forward, we will stabilize the current changes in trunk in the 2.4 time frame. Next we will add quota management and API support which can align with the 2.5 time frame, with the second merge potentially in March/April. Arpit On Fri, Dec 6, 2013 at 3:15 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi everyone, I'm still getting up to speed on the changes here (my fault for not following development more closely, other priorities etc etc), but the branch thus far is already quite impressive. It's quite an undertaking to turn the DN into a collection of Storages, along with the corresponding data structure, tracking, and other changes in the NN and DN. Correct me if I'm wrong though, but this still leaves a substantial part of the design doc to be implemented. Looking at the list of remaining subtasks, it seems like we still can't specify a storage type for a file (HDFS-5229) or write a file to a given storage type (HDFS-5391), along with the corresponding client protocol changes. This leads me to two questions: - If this is merged, what can I do with the new code? Without client changes or the ability to create a file on a different storage type, I don't know how (for example) I could hand this to our QA team to test. I'm wondering why we want to merge now rather than when the branch is more feature complete. - What's the plan for the implementation of the remaining features? 
How many phases? What's the timeline for these phases? Particularly, related to the use cases presented in section 2 of the design doc. I'm also going to post some design doc questions to the JIRA, there are a few technical q's I'd like to get clarification on. Thanks, Andrew On Wed, Dec 4, 2013 at 7:21 AM, Sirianni, Eric eric.siria...@netapp.com wrote: +1 My team has been developing and testing against the HDFS-2832 branch for the past month. It has proven to be quite stable. Eric -Original Message- From: Arpit Agarwal [mailto:aagar...@hortonworks.com] Sent: Monday, December 02, 2013 7:07 PM To: hdfs-...@hadoop.apache.org; common-dev@hadoop.apache.org Subject: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk Hello all, I would like to call a vote to merge phase 1 of the Heterogeneous Storage feature into trunk. *Scope of the changes:* The changes allow exposing the DataNode as a collection of storages and set the foundation for subsequent work to present Heterogeneous Storages to applications. This allows DataNodes to send block and storage reports per-storage. In addition this change introduces the ability to add a 'storage type' tag to the storage directories. This enables supporting different types of storages in addition to disk storage. Development of the feature is tracked in the jira https://issues.apache.org/jira/browse/HDFS-2832. *Details of development and testing:* Development has been done in a separate branch - https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2832. The updated design is posted at - https://issues.apache.org/jira/secure/attachment/12615761/20131125-HeterogeneousStorage.pdf . The changes involve ~6K changed lines of code, with a third of those changes being to tests. Please see the test plan https://issues.apache.org/jira/secure/attachment/12616642/20131202-HeterogeneousStorage-TestPlan.pdf for the details. 
Once the feature is merged into trunk, we will continue to test and fix any bugs that may be found on trunk as well as add further tests as outlined in the test plan. The bulk of the design and implementation was done by Suresh Srinivas, Sanjay Radia, Nicholas Sze, Junping Du and me. Also, thanks to Eric Sirianni, Chris Nauroth, Steve Loughran, Bikas Saha, Andrew Wang and Todd Lipcon for providing feedback on the Jiras and in discussions. This vote runs for a week and closes on 12/9/2013 at 11:59 pm PT. Thanks, Arpit