Re: [PROPOSAL] Accumulo for the Apache Incubator
On 04/09/11 17:39, Billie J Rinaldi wrote: Bernd, We would divide the derived code into two categories: that which we modified only slightly (for example to allow us to extend it) and that which we modified heavily. Now that we are able to interact openly, we hope to supply much of that back to the original projects. There is a detailed overview below. We identified these by searching for copyright in our code. The total count came to just over 14,000 lines. We use heavily as a qualitative assessment of how much we modified, but we could certainly come up with quantitative assessments. 5400 lines: slightly modified versions of Hadoop BCFile and related classes (our current file format extends BCFile) 4300 lines: heavily modified versions of MapFile and SequenceFile (no longer our default file format, but still included for backward compatibility) Internal compatibility or external? If internal only I'd keep that out of the public codebase. 2000 lines: heavily modified versions of HBase BlockCache and related files (Adam didn't count the tests when he said 1500 lines) +1 for more tests. 1300 lines: heavily modified versions of Hadoop BloomFilters -any plan to contribute back to hadoop-core, or are they too incompatible now? 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo 325 lines: our Value is an immutable version of Hadoop BytesWritable -any plan to contribute back to hadoop-core? 142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader classloaders scare me. If we had an ASF-certified-classloader-hacker proposal where only approved people could write CLs for ASF code I'd be +1 for it, even though I'd fail the test myself. I understand why you've forked off your own versions of some of the Hadoop and HBase core -it is not only your right, it gets the changes in on your schedule. I have been known to do this myself. 
Ideally those things have to get back to a (future) version of Hadoop, which people like Doug and Owen can help with. Having forked code in the ASF codebase is something to avoid. Again, I speak from experience. I think the proposal ought to consider how it fits in with BigTop too, so it can be part of the full Apache Hadoop stack deploy/test process. I also think that the roadmap for the system may want to think about MR-279 integration: would that architecture be a better way to run Accumulo code within a Hadoop cluster? -Steve (BTW: I'm not going to volunteer as a mentor/committer; my focus is on getting back into Hadoop core coding without distractions) - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [PROPOSAL] Accumulo for the Apache Incubator
On Tue, Sep 6, 2011 at 8:09 AM, Steve Loughran ste...@apache.org wrote: 1300 lines: heavily modified versions of Hadoop BloomFilters -any plan to contribute back to hadoop-core, or are they too incompatible now? 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo 325 lines: our Value is an immutable version of Hadoop BytesWritable -any plan to contribute back to hadoop-core? ... I understand why you've forked off your own versions of some of the Hadoop and HBase core -it is not only your right, it gets the changes in on your schedule. I have been known to do this myself. Without derailing this thread too much, just to put things in perspective: HBase has a fork of Hadoop's IPC. This makes up about 4000 lines of HBase's code. It's not a big deal. That's why we like the Apache license. Good engineers should always be evaluating the tradeoffs between staying with mainline and having to maintain a fork of a particular piece of code. Sometimes the latter makes sense, even within two closely-related projects. -Todd -- Todd Lipcon Software Engineer, Cloudera
Re: [PROPOSAL] Accumulo for the Apache Incubator
Hey Steve, We would like to be able to contribute back where appropriate. We think that our BloomFilter improvements and some of our MapFile improvements are generally useful, and those should be pretty natural contributions back to Hadoop. Other modifications may not be so obviously generally useful, such as hard-coded optimizations for Accumulo. However, it is certainly our goal to reduce unnecessary code forks. The classloader project was a challenge, and it took us several attempts to get it right. It sure is cool now that it works. We still have a number of tickets on our todo list in this area, like more convenient distribution mechanisms for user-defined functions (i.e. Iterators or Coprocessors) across a Hadoop cluster. Thanks for the pointers to BigTop and MR-279. Those certainly look promising for better integration with the Apache brand. I'm looking forward to lots of great contributions from the community to the roadmap as Accumulo moves into incubation. Cheers, Adam - Original Message - From: Steve Loughran ste...@apache.org To: general@incubator.apache.org Sent: Tue, 06 Sep 2011 15:09:44 - Subject: Re: [PROPOSAL] Accumulo for the Apache Incubator On 04/09/11 17:39, Billie J Rinaldi wrote: Bernd, We would divide the derived code into two categories: that which we modified only slightly (for example to allow us to extend it) and that which we modified heavily. Now that we are able to interact openly, we hope to supply much of that back to the original projects. There is a detailed overview below. We identified these by searching for copyright in our code. The total count came to just over 14,000 lines. We use heavily as a qualitative assessment of how much we modified, but we could certainly come up with quantitative assessments. 
5400 lines: slightly modified versions of Hadoop BCFile and related classes (our current file format extends BCFile) 4300 lines: heavily modified versions of MapFile and SequenceFile (no longer our default file format, but still included for backward compatibility) Internal compatibility or external? If internal only I'd keep that out of the public codebase. 2000 lines: heavily modified versions of HBase BlockCache and related files (Adam didn't count the tests when he said 1500 lines) +1 for more tests. 1300 lines: heavily modified versions of Hadoop BloomFilters -any plan to contribute back to hadoop-core, or are they too incompatible now? 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo 325 lines: our Value is an immutable version of Hadoop BytesWritable -any plan to contribute back to hadoop-core? 142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader classloaders scare me. If we had an ASF-certified-classloader-hacker proposal where only approved people could write CLs for ASF code I'd be +1 for it, even though I'd fail the test myself. I understand why you've forked off your own versions of some of the Hadoop and HBase core -it is not only your right, it gets the changes in on your schedule. I have been known to do this myself. Ideally those thing have to get back to a (future) version of Hadoop, which people like Doug and Owen can help with. Having forked code in the ASF codebase is something to avoid. Again, I speak from experience. I think the proposal ought to consider how they fit in with BigTop too, so it can be part of the full apache hadoop stack deploy/test process. I also think that the roadmap for the system may want to think about MR-279 integration; would that architecture be a better way to run Accumulo code within a Hadoop cluster. 
-Steve (BTW: I'm not going to volunteer as a mentor/committer, my focus is on getting back into Hadoop core coding without distractions)
Re: [PROPOSAL] Accumulo for the Apache Incubator
On Saturday, September 3, 2011, Adam P Fuchs adam.p.fu...@ugov.gov wrote: Hi Bernd, The latest stable release of Accumulo contains roughly 200,000 lines of code, of which about 85,000 are machine generated thrift code. Of the remaining code, about 15,000 lines are derived from other Apache projects, and about 1,500 of those are derived from HBase code. The code derived from HBase comprises a query caching layer (block cache, index cache, multi-level LRU logic, etc.). So, you are saying more than 10% of the non-generated code base (and you are not counting lib-style uses/JARs here, right?) is derived from other Apache code? That seems to be unusual. Just curious, could you elaborate a bit about why you did that and what kind of code that is? Thank you. Bernd
Re: [PROPOSAL] Accumulo for the Apache Incubator
+1 on the proposal On Sun, Sep 4, 2011 at 9:41 AM, Bernd Fondermann bernd.fonderm...@googlemail.com wrote: On Saturday, September 3, 2011, Adam P Fuchs adam.p.fu...@ugov.gov wrote: Hi Bernd, The latest stable release of Accumulo contains roughly 200,000 lines of code, of which about 85,000 are machine generated thrift code. Of the remaining code, about 15,000 lines are derived from other Apache projects, and about 1,500 of those are derived from HBase code. The code derived from HBase comprises a query caching layer (block cache, index cache, multi-level LRU logic, etc.). So, you are saying more than 10% of the non-generated code base (and you are not counting lib-style uses/JARs here, right?) is derived from other Apache code? That seems to be unusual. Just curious, could you elaborate a bit about why you did that and what kind of code that is? Thank you. Bernd -- Thanks - Mohammad Nour Life is like riding a bicycle. To keep your balance you must keep moving - Albert Einstein
Re: [PROPOSAL] Accumulo for the Apache Incubator
On Sep 4, 2011 3:41 AM, Bernd Fondermann bernd.fonderm...@googlemail.com wrote: ... So, you are saying more than 10% of the non-generated code base (and you are not counting lib-style uses/JARs here, right?) is derived from other Apache code? That seems to be unusual. Just curious, could you elaborate a bit about why you did that and what kind of code that is? Thank you. You make it sound like deriving from our code base is a bad thing, and should be justified. I don't get it. That is what we *want* people to do. What is your concern here? Cheers, -g
Re: [PROPOSAL] Accumulo for the Apache Incubator
Bernd, We would divide the derived code into two categories: that which we modified only slightly (for example to allow us to extend it) and that which we modified heavily. Now that we are able to interact openly, we hope to supply much of that back to the original projects. There is a detailed overview below. We identified these by searching for copyright in our code. The total count came to just over 14,000 lines. We use heavily as a qualitative assessment of how much we modified, but we could certainly come up with quantitative assessments. 5400 lines: slightly modified versions of Hadoop BCFile and related classes (our current file format extends BCFile) 4300 lines: heavily modified versions of MapFile and SequenceFile (no longer our default file format, but still included for backward compatibility) 2000 lines: heavily modified versions of HBase BlockCache and related files (Adam didn't count the tests when he said 1500 lines) 1300 lines: heavily modified versions of Hadoop BloomFilters 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo 325 lines: our Value is an immutable version of Hadoop BytesWritable 142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader Billie - Original Message - From: Bernd Fondermann bernd.fonderm...@googlemail.com To: general@incubator.apache.org Sent: Sunday, September 4, 2011 3:41:09 AM Subject: Re: [PROPOSAL] Accumulo for the Apache Incubator On Saturday, September 3, 2011, Adam P Fuchs adam.p.fu...@ugov.gov wrote: Hi Bernd, The latest stable release of Accumulo contains roughly 200,000 lines of code, of which about 85,000 are machine generated thrift code. Of the remaining code, about 15,000 lines are derived from other Apache projects, and about 1,500 of those are derived from HBase code. The code derived from HBase comprises a query caching layer (block cache, index cache, multi-level LRU logic, etc.). 
So, you are saying more than 10% of the non-generated code base (and you are not counting lib-style uses/JARs here, right?) is derived from other Apache code? That seems to be unusual. Just curious, could you elaborate a bit about why you did that and what kind of code that is? Thank you. Bernd
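As a quick check on the figures quoted in this thread, the arithmetic can be sketched in a few lines of Python. The line counts are the approximate, self-reported numbers from the messages above; the variable names are only illustrative and do not come from the Accumulo code base.

```python
# Approximate, self-reported line counts quoted in the thread.
derived = {
    "Hadoop BCFile and related classes": 5400,
    "Hadoop MapFile and SequenceFile": 4300,
    "HBase BlockCache and related files": 2000,
    "Hadoop BloomFilters": 1300,
    "Hadoop TeraSortIngest": 419,
    "Value (immutable BytesWritable)": 325,
    "ClassLoader (commons-jci based)": 142,
}

TOTAL_LINES = 200_000       # "roughly 200,000 lines of code"
GENERATED_THRIFT = 85_000   # "about 85,000 are machine generated thrift code"

derived_total = sum(derived.values())
non_generated = TOTAL_LINES - GENERATED_THRIFT
share = derived_total / non_generated

print(f"derived lines:       {derived_total}")
print(f"non-generated lines: {non_generated}")
print(f"derived share:       {share:.1%}")
```

On these figures the itemized entries sum to 13,886 lines, which is about 12% of the roughly 115,000 non-generated lines and is consistent with the "more than 10%" estimate raised in the thread.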
Re: [PROPOSAL] Accumulo for the Apache Incubator
On Sun, Sep 4, 2011 at 18:16, Greg Stein gst...@gmail.com wrote: On Sep 4, 2011 3:41 AM, Bernd Fondermann bernd.fonderm...@googlemail.com wrote: ... So, you are saying more than 10% of the non-generated code base (and you are not counting lib-style uses/JARs here, right?) is derived from other Apache code? That seems to be unusual. Just curious, could you elaborate a bit about why you did that and what kind of code that is? Thank you. You make it sound like deriving from our code base is a bad thing, and should be justified. I don't get it. That is what we *want* people to do. Of course, many do so. Especially in closed source projects we will never know about. What is your concern here? The concern would be when people would take code and re-incubate it at large scale, whatever that means. But Billie's reply below is showing that they improved Hadoop code (like I hoped) and are willing to contribute back. (If the code grant is going through at all, it sounds a little bit more complicated than usual.) Hadoop can only benefit from that. Also, I don't share the concerns discussed over at hbase-dev. How large the overlap between HBase and Accumulo really is can still be determined in Incubation. Whether or not they will become two different projects or one is something that would be decided later in Incubation. Bernd
Re: [PROPOSAL] Accumulo for the Apache Incubator
Hi Owen, I believe the answer is yes regarding the code grant, and I am currently confirming that with our lawyers. The LGPL dependencies are not core to Accumulo, and we're working on substituting other packages. We would have no problem doing this before the initial commit if necessary. Cheers, Adam On Sep 2, 2011 11:36 AM, Owen O'Malley omal...@apache.org wrote: Is the NSA going to file a code grant for the project? How deeply embedded are the LGPL dependencies? Are they optional components or mandatory? Thanks, Owen
Re: [PROPOSAL] Accumulo for the Apache Incubator
On Friday, September 2, 2011, Billie J Rinaldi billie.j.rina...@ugov.gov wrote: Greetings, I would like to propose Accumulo to be an Apache Incubator project. Accumulo is a distributed key/value store that provides expressive cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. It is based on Google's BigTable design and runs over Apache Hadoop and Zookeeper. What is the project's relation to HBase? Especially, how much code - if any - in the Accumulo code base is directly taken from HBase? Thanks, Bernd Here is a link to the proposal in the Incubator wiki: http://wiki.apache.org/incubator/AccumuloProposal I've also pasted the initial contents below. Thanks, Billie Rinaldi = Accumulo Proposal = == Abstract == Accumulo is a distributed key/value store that provides expressive, cell-level access labels. == Proposal == Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. == Background == Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, CloudStore, and Cassandra. Accumulo began its development in 2008. == Rationale == There is a need for a flexible, high performance distributed key/value store that provides expressive, fine-grained access labels. The communities we expect to be most interested in such a project are government, health care, and other industries where privacy is a concern. 
We have made much progress in developing this project over the past 3 years and believe both the project and the interested communities would benefit from this work being openly available and having open development. == Current Status == === Meritocracy === We intend to strongly encourage the community to help with and contribute to the code. We will actively seek potential committers and help them become familiar with the codebase. === Community === A strong government community has developed around Accumulo and training classes have been ongoing for about a year. Hundreds of developers use Accumulo. === Core Developers === The developers are mainly employed by the National Security Agency, but we anticipate interest developing among other companies. === Alignment === Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds with Maven. Due to the strong relationship with these Apache projects, the incubator is a good match for Accumulo. == Known Risks == === Orphaned Products === There is only a small risk of being orphaned. The community is committed to improving the codebase of the project due to its fulfilling needs not addressed by any other software. === Inexperience with Open Source === The codebase has been treated internally as an open source project since its beginning, and the initial Apache committers have been involved with the code for multiple years. While our experience with public open source is limited, we do not anticipate difficulty in operating under Apache's development process. === Homogeneous Developers === The committers have multiple employers and it is expected that committers from different companies will be recruited. === Reliance on Salaried Developers === The initial committers are all paid by their employers to work on Accumulo and we expect such employment to continue. Some of the initial committers would continue as volunteers even if no longer employed to do so. 
=== Relationships with Other Apache Products === Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang, -net, -io, -jci, -collections, -configuration, -logging, and -codec. === Relationship to HBase === Accumulo and HBase are both based on the design of Google's BigTable, so there is a danger that potential users will have difficulty distinguishing the two or that they will not see an incentive in adopting Accumulo. There are a few key areas in which Accumulo differs from HBase. Some of the desired features of Accumulo could be incorporated into HBase, however the most important of these may be unlikely to be adopted (see cell-level access labels and iterators below). It is a possibility that the codebases will ultimately converge, but the number of differences at the current time warrants a separate project for Accumulo. Access Labels Accumulo has an additional portion of its key that sorts after the column qualifier and before the timestamp. It is called column visibility and enables expressive cell-level access control. Authorizations are passed with each query to control what data is returned to the user. The column
Re: [PROPOSAL] Accumulo for the Apache Incubator
Hi Bernd, The latest stable release of Accumulo contains roughly 200,000 lines of code, of which about 85,000 are machine generated thrift code. Of the remaining code, about 15,000 lines are derived from other Apache projects, and about 1,500 of those are derived from HBase code. The code derived from HBase comprises a query caching layer (block cache, index cache, multi-level LRU logic, etc.). More broadly, there are aspects of both systems that share common design elements, while many of the advanced features of the two systems are complementary. For example, the iterator framework in Accumulo and the coprocessor framework in HBase are distinct mechanisms for server-side execution of user-defined functions that can be used to encode different types of applications. The iterator framework provides a unique capability to encode functions (e.g. filtering and aggregation) within the compaction steps that happen in the background of the tablet server/region server, but they cannot be as easily used for inter-process communication as coprocessors without introducing the possibility of deadlock. In addition to the complementary features, many of the low-level designs of the two projects, while supporting similar functionality, differ in various dimensions of performance. Some examples of this are the way we implement column family partitioning/locality groups, our file selection algorithms for compactions, tablet/region metadata handling, RPC libraries, user-level security, testing suites (which could also be considered complementary), administrative tools, methods of dealing with the java garbage collector, server-side threading models, client code threading models, file compression, Key classes, and write-ahead logs. Going forward, both projects are going to be able to adapt complementary aspects of the other (we're already doing this with the query cache, and we are investigating adapting coprocessors from HBase). 
We look at having two systems that are so similar in core functionality but differ in implementation as a great opportunity for empirical exploration of the design space that will benefit both projects. I think that having both projects hosted in Apache gives us more incentive and opportunity to improve API compatibility between the two. If/when we find that the design space exploration has settled I expect that this will also be the best avenue towards merging the two projects if that becomes the desired goal. Cheers, Adam - Original Message - From: Bernd Fondermann bernd.fonderm...@googlemail.com To: general@incubator.apache.org Sent: Sat, 03 Sep 2011 11:17:10 - Subject: Re: [PROPOSAL] Accumulo for the Apache Incubator On Friday, September 2, 2011, Billie J Rinaldi billie.j.rina...@ugov.gov wrote: Greetings, I would like to propose Accumulo to be an Apache Incubator project. Accumulo is a distributed key/value store that provides expressive cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. It is based on Google's BigTable design and runs over Apache Hadoop and Zookeeper. What is the project's relation to HBase? Especially, how much code - if any - in the Accumulo code base is directly taken from HBase? Thanks, Bernd Here is a link to the proposal in the Incubator wiki: http://wiki.apache.org/incubator/AccumuloProposal I've also pasted the initial contents below. Thanks, Billie Rinaldi = Accumulo Proposal = == Abstract == Accumulo is a distributed key/value store that provides expressive, cell-level access labels. == Proposal == Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. 
It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. == Background == Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, CloudStore, and Cassandra. Accumulo began its development in 2008. == Rationale == There is a need for a flexible, high performance distributed key/value store that provides expressive, fine-grained access labels. The communities we expect to be most interested in such a project are government, health care, and other industries where privacy is a concern. We have made much progress in developing this project over the past 3 years and believe both the project and the interested communities would benefit from this work being openly available and having open development. == Current Status == === Meritocracy === We intend to strongly encourage the community to help with and contribute to the code. We will actively seek potential committers and help them become familiar
Re: [PROPOSAL] Accumulo for the Apache Incubator
Is the NSA going to file a code grant for the project? How deeply embedded are the LGPL dependencies? Are they optional components or mandatory? Thanks, Owen
Re: [PROPOSAL] Accumulo for the Apache Incubator
Non-binding +1. Regarding Owen's concern over licenses, if I recall correctly, those concerns would block graduation from the incubator, but not acceptance to it. I am also interested in being added as a committer to this proposal. As an HBase committer (but not speaking for the project as a whole) I think having cross-pollination between the codebases will be beneficial to everyone, so I'd like to be involved. Thanks -Todd On Fri, Sep 2, 2011 at 8:45 AM, Billie J Rinaldi billie.j.rina...@ugov.gov wrote: Greetings, I would like to propose Accumulo to be an Apache Incubator project. Accumulo is a distributed key/value store that provides expressive cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. It is based on Google's BigTable design and runs over Apache Hadoop and Zookeeper. Here is a link to the proposal in the Incubator wiki: http://wiki.apache.org/incubator/AccumuloProposal I've also pasted the initial contents below. Thanks, Billie Rinaldi = Accumulo Proposal = == Abstract == Accumulo is a distributed key/value store that provides expressive, cell-level access labels. == Proposal == Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. == Background == Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, CloudStore, and Cassandra. Accumulo began its development in 2008. == Rationale == There is a need for a flexible, high performance distributed key/value store that provides expressive, fine-grained access labels. 
The communities we expect to be most interested in such a project are government, health care, and other industries where privacy is a concern. We have made much progress in developing this project over the past 3 years and believe both the project and the interested communities would benefit from this work being openly available and having open development. == Current Status == === Meritocracy === We intend to strongly encourage the community to help with and contribute to the code. We will actively seek potential committers and help them become familiar with the codebase. === Community === A strong government community has developed around Accumulo and training classes have been ongoing for about a year. Hundreds of developers use Accumulo. === Core Developers === The developers are mainly employed by the National Security Agency, but we anticipate interest developing among other companies. === Alignment === Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds with Maven. Due to the strong relationship with these Apache projects, the incubator is a good match for Accumulo. == Known Risks == === Orphaned Products === There is only a small risk of being orphaned. The community is committed to improving the codebase of the project due to its fulfilling needs not addressed by any other software. === Inexperience with Open Source === The codebase has been treated internally as an open source project since its beginning, and the initial Apache committers have been involved with the code for multiple years. While our experience with public open source is limited, we do not anticipate difficulty in operating under Apache's development process. === Homogeneous Developers === The committers have multiple employers and it is expected that committers from different companies will be recruited. === Reliance on Salaried Developers === The initial committers are all paid by their employers to work on Accumulo and we expect such employment to continue. 
Some of the initial committers would continue as volunteers even if no longer employed to do so. === Relationships with Other Apache Products === Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang, -net, -io, -jci, -collections, -configuration, -logging, and -codec. === Relationship to HBase === Accumulo and HBase are both based on the design of Google's BigTable, so there is a danger that potential users will have difficulty distinguishing the two or that they will not see an incentive in adopting Accumulo. There are a few key areas in which Accumulo differs from HBase. Some of the desired features of Accumulo could be incorporated into HBase, however the most important of these may be unlikely to be adopted (see cell-level access labels and iterators below). It is a possibility that the codebases will ultimately converge, but the number of differences at the current time
Re: [PROPOSAL] Accumulo for the Apache Incubator
No votes yet, please, except as an informal expression of (un)enthusiasm. Owen, you raise two questions. On the subject of grants, please read the IP description in the proposal again. You can't 'grant' rights to something that neither you nor anyone else owns. The proposal offers both a preferred alternative and a backstop. On the subject of LGPL, I'll leave it to the authors to answer. On Fri, Sep 2, 2011 at 5:17 PM, Todd Lipcon t...@cloudera.com wrote: Non-binding +1. Regarding Owen's concern over licenses, if I recall correctly, those concerns would block graduation from the incubator, but not acceptance to it. I am also interested in being added as a committer to this proposal. As an HBase committer (but not speaking for the project as a whole) I think having cross-pollination between the codebases will be beneficial to everyone, so I'd like to be involved. Thanks -Todd On Fri, Sep 2, 2011 at 8:45 AM, Billie J Rinaldi billie.j.rina...@ugov.gov wrote: Greetings, I would like to propose Accumulo to be an Apache Incubator project. Accumulo is a distributed key/value store that provides expressive cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. It is based on Google's BigTable design and runs over Apache Hadoop and Zookeeper. Here is a link to the proposal in the Incubator wiki: http://wiki.apache.org/incubator/AccumuloProposal I've also pasted the initial contents below. Thanks, Billie Rinaldi = Accumulo Proposal = == Abstract == Accumulo is a distributed key/value store that provides expressive, cell-level access labels. == Proposal == Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. 
It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. == Background == Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, CloudStore, and Cassandra. Accumulo began its development in 2008. == Rationale == There is a need for a flexible, high performance distributed key/value store that provides expressive, fine-grained access labels. The communities we expect to be most interested in such a project are government, health care, and other industries where privacy is a concern. We have made much progress in developing this project over the past 3 years and believe both the project and the interested communities would benefit from this work being openly available and having open development. == Current Status == === Meritocracy === We intend to strongly encourage the community to help with and contribute to the code. We will actively seek potential committers and help them become familiar with the codebase. === Community === A strong government community has developed around Accumulo and training classes have been ongoing for about a year. Hundreds of developers use Accumulo. === Core Developers === The developers are mainly employed by the National Security Agency, but we anticipate interest developing among other companies. === Alignment === Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds with Maven. Due to the strong relationship with these Apache projects, the incubator is a good match for Accumulo. == Known Risks == === Orphaned Products === There is only a small risk of being orphaned. The community is committed to improving the codebase of the project due to its fulfilling needs not addressed by any other software. 
=== Inexperience with Open Source === The codebase has been treated internally as an open source project since its beginning, and the initial Apache committers have been involved with the code for multiple years. While our experience with public open source is limited, we do not anticipate difficulty in operating under Apache's development process. === Homogeneous Developers === The committers have multiple employers and it is expected that committers from different companies will be recruited. === Reliance on Salaried Developers === The initial committers are all paid by their employers to work on Accumulo and we expect such employment to continue. Some of the initial committers would continue as volunteers even if no longer employed to do so. === Relationships with Other Apache Products === Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang, -net, -io, -jci, -collections, -configuration, -logging, and -codec. === Relationship to HBase === Accumulo and HBase are both based on the design of Google's BigTable, so there is a danger that potential users will have
Re: [PROPOSAL] Accumulo for the Apache Incubator
Owen, I believe the answer is yes regarding the code grant, and I am currently confirming that with our lawyers. We'll get you an official answer early next week. The LGPL dependencies are not core to Accumulo, and we're working on substituting other packages. We would have no problem doing this before the initial commit if necessary. Cheers, Adam - Original Message - From: Owen O'Malley omal...@apache.org To: general@incubator.apache.org Sent: Fri, 02 Sep 2011 18:36:11 - Subject: Re: [PROPOSAL] Accumulo for the Apache Incubator Is the NSA going to file a code grant for the project? How deeply embedded are the LGPL dependencies? Are they optional components or mandatory? Thanks, Owen
Re: [PROPOSAL] Accumulo for the Apache Incubator
On Fri, Sep 2, 2011 at 3:22 PM, Adam P Fuchs adam.p.fu...@ugov.gov wrote: The project looks interesting. I believe the answer is yes regarding the code grant, and I am currently confirming that with our lawyers. We'll get you an official answer early next week. Great. I know that the US government has its own rules for such things. I took part in the meetings that created the NASA Open Source Agreement. (e.g., the lawyers wouldn't let us call it an open source license...) Let us know how it goes. The LGPL dependencies are not core to Accumulo, and we're working on substituting other packages. We would have no problem doing this before the initial commit if necessary. It needs to be cleaned up before release, but the original commit is fine. -- Owen