Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-06 Thread Steve Loughran

On 04/09/11 17:39, Billie J Rinaldi wrote:
 Bernd,

 We would divide the derived code into two categories: that which we modified 
 only slightly (for example to allow us to extend it) and that which we 
 modified heavily.  Now that we are able to interact openly, we hope to supply 
 much of that back to the original projects.  There is a detailed overview 
 below.  We identified these by searching for "copyright" in our code.  The 
 total count came to just over 14,000 lines.  We use "heavily" as a 
 qualitative assessment of how much we modified, but we could certainly come 
 up with quantitative assessments.

 5400 lines: slightly modified versions of Hadoop BCFile and related classes
  (our current file format extends BCFile)
 4300 lines: heavily modified versions of MapFile and SequenceFile
  (no longer our default file format, but still included for backward 
  compatibility)

Internal compatibility or external? If internal only I'd keep that out 
of the public codebase.

 2000 lines: heavily modified versions of HBase BlockCache and related files
  (Adam didn't count the tests when he said 1500 lines)

+1 for more tests.

 1300 lines: heavily modified versions of Hadoop BloomFilters

-any plan to contribute back to hadoop-core, or are they too 
incompatible now?

 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
 325 lines: our Value is an immutable version of Hadoop BytesWritable

-any plan to contribute back to hadoop-core?

 142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader

classloaders scare me. If we had an ASF-certified-classloader-hacker 
proposal where only approved people could write CLs for ASF code I'd be 
+1 for it, even though I'd fail the test myself.


I understand why you've forked off your own versions of some of the 
Hadoop and HBase core -it is not only your right, it gets the changes in 
on your schedule. I have been known to do this myself.



Ideally those things have to get back to a (future) version of Hadoop, 
which people like Doug and Owen can help with. Having forked code in the 
ASF codebase is something to avoid. Again, I speak from experience.


I think the proposal ought to consider how they fit in with BigTop too, 
so it can be part of the full apache hadoop stack deploy/test process.


I also think that the roadmap for the system may want to think about 
MR-279 integration; would that architecture be a better way to run 
Accumulo code within a Hadoop cluster?


-Steve

(BTW: I'm not going to volunteer as a mentor/committer, my focus is on 
getting back into Hadoop core coding without distractions)


-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-06 Thread Todd Lipcon
On Tue, Sep 6, 2011 at 8:09 AM, Steve Loughran ste...@apache.org wrote:
 1300 lines: heavily modified versions of Hadoop BloomFilters

 -any plan to contribute back to hadoop-core, or are they too incompatible
 now?


 419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
 325 lines: our Value is an immutable version of Hadoop BytesWritable

 -any plan to contribute back to hadoop-core?
...
 I understand why you've forked off your own versions of some of the Hadoop
 and HBase core -it is not only your right, it gets the changes in on your
 schedule. I have been known to do this myself.


Without derailing this thread too much, just to put things in
perspective: HBase has a fork of Hadoop's IPC. This makes up about
4000 lines of HBase's code. It's not a big deal. That's why we like
the Apache license. Good engineers should always be evaluating the
tradeoffs between staying with mainline and having to maintain a fork
of a particular piece of code. Sometimes the latter makes sense, even
within two closely-related projects.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera




Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-06 Thread Adam P Fuchs
Hey Steve,

We would like to be able to contribute back where appropriate. We think that 
our BloomFilter improvements and some of our MapFile improvements are generally 
useful, and those should be pretty natural contributions back to Hadoop. Other 
modifications may not be so obviously generally useful, such as hard-coded 
optimizations for Accumulo. However, it is certainly our goal to reduce 
unnecessary code forks.

The classloader project was a challenge, and it took us several attempts to get 
it right. It sure is cool now that it works. We still have a number of tickets 
on our todo list in this area, like more convenient distribution mechanisms for 
user-defined functions (i.e. Iterators or Coprocessors) across a Hadoop cluster.

Thanks for the pointers to BigTop and MR-279. Those certainly look promising 
for better integration with the Apache brand. I'm looking forward to lots of 
great contributions from the community to the roadmap as Accumulo moves into 
incubation.

Cheers,
Adam





Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-04 Thread Bernd Fondermann
On Saturday, September 3, 2011, Adam P Fuchs adam.p.fu...@ugov.gov wrote:
 Hi Bernd,

 The latest stable release of Accumulo contains roughly 200,000 lines of
code, of which about 85,000 are machine generated thrift code. Of the
remaining code, about 15,000 lines are derived from other Apache projects,
and about 1,500 of those are derived from HBase code. The code derived from
HBase comprises a query caching layer (block cache, index cache, multi-level
LRU logic, etc.).

So, you are saying more than 10% of the non-generated code base (and you are
not counting lib-style uses/JARs here, right?) is derived from other Apache
code? That seems to be unusual. Just curious, could you elaborate a bit
about why you did that and what kind of code that is? Thank you.

 Bernd


Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-04 Thread Mohammad Nour El-Din
+1 on the proposal






-- 
Thanks
- Mohammad Nour

Life is like riding a bicycle. To keep your balance you must keep moving
- Albert Einstein




Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-04 Thread Greg Stein
On Sep 4, 2011 3:41 AM, Bernd Fondermann bernd.fonderm...@googlemail.com
wrote:
...

 So, you are saying more than 10% of the non-generated code base (and you are
 not counting lib-style uses/JARs here, right?) is derived from other Apache
 code? That seems to be unusual. Just curious, could you elaborate a bit
 about why you did that and what kind of code that is? Thank you.

You make it sound like deriving from our code base is a bad thing, and
should be justified. I don't get it. That is what we *want* people to do.

What is your concern here?

Cheers,
-g


Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-04 Thread Billie J Rinaldi
Bernd,

We would divide the derived code into two categories: that which we modified 
only slightly (for example to allow us to extend it) and that which we modified 
heavily.  Now that we are able to interact openly, we hope to supply much of 
that back to the original projects.  There is a detailed overview below.  We 
identified these by searching for "copyright" in our code.  The total count 
came to just over 14,000 lines.  We use "heavily" as a qualitative assessment 
of how much we modified, but we could certainly come up with quantitative 
assessments.

5400 lines: slightly modified versions of Hadoop BCFile and related classes
(our current file format extends BCFile)
4300 lines: heavily modified versions of MapFile and SequenceFile
(no longer our default file format, but still included for backward 
compatibility)
2000 lines: heavily modified versions of HBase BlockCache and related files
(Adam didn't count the tests when he said 1500 lines)
1300 lines: heavily modified versions of Hadoop BloomFilters
419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
325 lines: our Value is an immutable version of Hadoop BytesWritable
142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader

Billie
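
As a quick cross-check of the itemized figures above, here is a small tally sketch. The dictionary keys and the grouping into "heavily modified" items are illustrative labels taken from the wording of the list, not names from the Accumulo code base. The items as listed sum to slightly under 14,000, so the "just over 14,000" total presumably reflects rounding in the individual entries; that is an assumption, not something the message states.

```python
# Line counts exactly as itemized in the message above (labels are illustrative).
derived_lines = {
    "Hadoop BCFile and related classes": 5400,
    "MapFile and SequenceFile": 4300,
    "HBase BlockCache and related files": 2000,
    "Hadoop BloomFilters": 1300,
    "Hadoop TeraSortIngest": 419,
    "Value (from Hadoop BytesWritable)": 325,
    "ClassLoader (from commons-jci)": 142,
}

# Items the message explicitly describes as "heavily modified".
heavily_modified = [
    "MapFile and SequenceFile",
    "HBase BlockCache and related files",
    "Hadoop BloomFilters",
]

total = sum(derived_lines.values())
heavily_total = sum(derived_lines[name] for name in heavily_modified)

print(f"total derived lines: {total}")
print(f"heavily modified:    {heavily_total} ({heavily_total / total:.0%} of derived)")
```

A line-level diff against the upstream files would give the quantitative assessment the message mentions; this tally only restates the qualitative split.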





Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-04 Thread Bernd Fondermann
On Sun, Sep 4, 2011 at 18:16, Greg Stein gst...@gmail.com wrote:
 On Sep 4, 2011 3:41 AM, Bernd Fondermann bernd.fonderm...@googlemail.com
 wrote:
...

 So, you are saying more than 10% of the non-generated code base (and you are
 not counting lib-style uses/JARs here, right?) is derived from other Apache
 code? That seems to be unusual. Just curious, could you elaborate a bit
 about why you did that and what kind of code that is? Thank you.

 You make it sound like deriving from our code base is a bad thing, and
 should be justified. I don't get it. That is what we *want* people to do.

Of course, many do so. Especially in closed source projects we will
never know about.


 What is your concern here?

The concern would be when people would take code and re-incubate it
at large scale, whatever that means.

But Billie's reply below shows that they improved Hadoop code
(like I hoped) and are willing to contribute back. (If the code grant
is going through at all, it sounds like it is a little bit more
complicated than usual.) Hadoop can only benefit from that.

Also, I don't share the concerns discussed over at hbase-dev. How
large the overlap between HBase and Accumulo really is can still be
determined in Incubation. Whether or not they will become two
different projects or one is something that would be decided later in
Incubation.

  Bernd




Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-03 Thread Adam Fuchs
Hi Owen,

I believe the answer is yes regarding the code grant, and I am currently
confirming that with our lawyers.

The LGPL dependencies are not core to Accumulo, and we're working on
substituting other packages. We would have no problem doing this before the
initial commit if necessary.

Cheers,
Adam
On Sep 2, 2011 11:36 AM, Owen O'Malley omal...@apache.org wrote:
 Is the NSA going to file a code grant for the project? How deeply
 embedded are the LGPL dependencies? Are they optional components or
 mandatory?

 Thanks,
 Owen




Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-03 Thread Bernd Fondermann
On Friday, September 2, 2011, Billie J Rinaldi billie.j.rina...@ugov.gov
wrote:
 Greetings,

 I would like to propose Accumulo to be an Apache Incubator project.
 Accumulo is a distributed key/value store that provides expressive
cell-level access labels and a server-side programming mechanism that can
modify key/value pairs at various points in the data management process.  It
is based on Google's BigTable design and runs over Apache Hadoop and
Zookeeper.

How does the project relate to HBase? In particular, how much code, if any,
in the Accumulo code base is taken directly from HBase?

Thanks,

 Bernd



 Here is a link to the proposal in the Incubator wiki:
 http://wiki.apache.org/incubator/AccumuloProposal

 I've also pasted the initial contents below.

 Thanks,
 Billie Rinaldi


 = Accumulo Proposal =

 == Abstract ==
 Accumulo is a distributed key/value store that provides expressive,
cell-level access labels.

 == Proposal ==
 Accumulo is a sorted, distributed key/value store based on Google's
BigTable design.  It is built on top of Apache Hadoop, Zookeeper, and
Thrift.  It features a few novel improvements on the BigTable design in the
form of cell-level access labels and a server-side programming mechanism
that can modify key/value pairs at various points in the data management
process.

 == Background ==
 Google published the design of BigTable in 2006.  Several other open
source projects have implemented aspects of this design including HBase,
CloudStore, and Cassandra.  Accumulo began its development in 2008.

 == Rationale ==
 There is a need for a flexible, high performance distributed key/value
store that provides expressive, fine-grained access labels.  The communities
we expect to be most interested in such a project are government, health
care, and other industries where privacy is a concern.  We have made much
progress in developing this project over the past 3 years and believe both
the project and the interested communities would benefit from this work
being openly available and having open development.

 == Current Status ==

 === Meritocracy ===
 We intend to strongly encourage the community to help with and contribute
to the code.  We will actively seek potential committers and help them
become familiar with the codebase.

 === Community ===
 A strong government community has developed around Accumulo and training
classes have been ongoing for about a year.  Hundreds of developers use
Accumulo.

 === Core Developers ===
 The developers are mainly employed by the National Security Agency, but we
anticipate interest developing among other companies.

 === Alignment ===
 Accumulo is built on top of Hadoop, Zookeeper, and Thrift.  It builds with
Maven.  Due to the strong relationship with these Apache projects, the
incubator is a good match for Accumulo.

 == Known Risks ==
 === Orphaned Products ===
 There is only a small risk of being orphaned.  The community is committed
to improving the codebase of the project due to its fulfilling needs not
addressed by any other software.

 === Inexperience with Open Source ===
 The codebase has been treated internally as an open source project since
its beginning, and the initial Apache committers have been involved with the
code for multiple years.  While our experience with public open source is
limited, we do not anticipate difficulty in operating under Apache's
development process.

 === Homogeneous Developers ===
 The committers have multiple employers and it is expected that committers
from different companies will be recruited.

 === Reliance on Salaried Developers ===
 The initial committers are all paid by their employers to work on Accumulo
and we expect such employment to continue.  Some of the initial committers
would continue as volunteers even if no longer employed to do so.

 === Relationships with Other Apache Products ===
 Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang, -net,
-io, -jci, -collections, -configuration, -logging, and -codec.

 === Relationship to HBase ===
 Accumulo and HBase are both based on the design of Google's BigTable, so
there is a danger that potential users will have difficulty distinguishing
the two or that they will not see an incentive in adopting Accumulo.  There
are a few key areas in which Accumulo differs from HBase.  Some of the
desired features of Accumulo could be incorporated into HBase, however the
most important of these may be unlikely to be adopted (see cell-level access
labels and iterators below).  It is a possibility that the codebases will
ultimately converge, but the number of differences at the current time
warrants a separate project for Accumulo.

 ==== Access Labels ====
 Accumulo has an additional portion of its key that sorts after the column
qualifier and before the timestamp.  It is called column visibility and
enables expressive cell-level access control.  Authorizations are passed
with each query to control what data is returned to the user.  The column

Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-03 Thread Adam P Fuchs
Hi Bernd,

The latest stable release of Accumulo contains roughly 200,000 lines of code, 
of which about 85,000 are machine generated thrift code. Of the remaining code, 
about 15,000 lines are derived from other Apache projects, and about 1,500 of 
those are derived from HBase code. The code derived from HBase comprises a 
query caching layer (block cache, index cache, multi-level LRU logic, etc.).
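
The proportions implied by these figures can be worked out directly. The sketch below uses only the approximate numbers quoted in the paragraph above; the variable names are illustrative, and the results inherit the "roughly" and "about" of the inputs.

```python
# Approximate figures as quoted in the paragraph above.
total_lines = 200_000       # latest stable release, all code
generated_lines = 85_000    # machine-generated thrift code
derived_lines = 15_000      # derived from other Apache projects
hbase_lines = 1_500         # the part of the derived code that comes from HBase

# Everything that is not machine generated.
hand_written = total_lines - generated_lines

derived_share = derived_lines / hand_written
hbase_share = hbase_lines / hand_written

print(f"non-generated code:      {hand_written}")
print(f"derived from Apache:     {derived_share:.1%}")
print(f"derived from HBase only: {hbase_share:.1%}")
```

On these numbers about 13% of the non-generated code is derived from other Apache projects, while the HBase-derived part is closer to 1%.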

More broadly, there are aspects of both systems that share common design 
elements, while many of the advanced features of the two systems are 
complementary. For example, the iterator framework in Accumulo and the 
coprocessor framework in HBase are distinct mechanisms for server-side 
execution of user-defined functions that can be used to encode different types 
of applications. The iterator framework provides a unique capability to encode 
functions (e.g. filtering and aggregation) within the compaction steps that 
happen in the background of the tablet server/region server, but they cannot be 
as easily used for inter-process communication as coprocessors without 
introducing the possibility of deadlock.
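
To make the idea of encoding functions within compaction concrete, here is a conceptual sketch in Python. It is not Accumulo's iterator API: the function `compact`, its `keep` and `combine` parameters, and the sample data are all hypothetical, and the sketch only assumes that a compaction merges several sorted key/value runs into one.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def compact(sorted_runs, keep, combine):
    """Merge sorted (key, value) runs into one sorted run.

    `keep` filters individual entries and `combine` aggregates the
    surviving values for each key, so both run as part of the merge.
    """
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        values = [value for _, value in group if keep(key, value)]
        if values:  # a key whose entries were all filtered out disappears
            yield key, combine(values)


# Two hypothetical on-disk runs, each already sorted by key.
run_a = [("row1", 2), ("row3", 5)]
run_b = [("row1", 3), ("row2", -1), ("row3", 1)]

# Drop negative values and sum what remains per key.
result = list(compact([run_a, run_b], keep=lambda k, v: v >= 0, combine=sum))
print(result)
```

Because the filter and the aggregation are applied while the runs are being merged, the output run is already reduced when it is written, which is the property the paragraph above attributes to running such functions in background compactions.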

In addition to the complementary features, many of the low-level designs of the 
two projects, while supporting similar functionality, differ in various 
dimensions of performance. Some examples of this are the way we implement 
column family partitioning/locality groups, our file selection algorithms for 
compactions, tablet/region metadata handling, RPC libraries, user-level 
security, testing suites (which could also be considered complementary), 
administrative tools, methods of dealing with the java garbage collector, 
server-side threading models, client code threading models, file compression, 
Key classes, and write-ahead logs.

Going forward, both projects are going to be able to adapt complementary 
aspects of the other (we're already doing this with the query cache, and we are 
investigating adapting coprocessors from HBase). We look at having two systems 
that are so similar in core functionality but differ in implementation as a 
great opportunity for empirical exploration of the design space that will 
benefit both projects. I think that having both projects hosted in Apache gives 
us more incentive and opportunity to improve API compatibility between the two. 
If/when we find that the design space exploration has settled I expect that 
this will also be the best avenue towards merging the two projects if that 
becomes the desired goal.

Cheers,
Adam




Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-02 Thread Owen O'Malley
Is the NSA going to file a code grant for the project? How deeply
embedded are the LGPL dependencies? Are they optional components or
mandatory?

Thanks,
   Owen




Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-02 Thread Todd Lipcon
Non-binding +1. Regarding Owen's concern over licenses, if I recall
correctly, those concerns would block graduation from the incubator,
but not acceptance to it.

I am also interested in being added as a committer to this proposal.
As an HBase committer (but not speaking for the project as a whole) I
think having cross-pollination between the codebases will be
beneficial to everyone, so I'd like to be involved.

Thanks
-Todd


Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-02 Thread Benson Margulies
No votes yet, please, except as an informal expression of (un)enthusiasm.

Owen, you raise two questions.

On the subject of grants, please read the IP description in the
proposal again. You can't 'grant' rights to something that neither you
nor anyone else owns. The proposal offers both a preferred alternative
and a backstop.

On the subject of LGPL, I'll leave it to the authors to answer.


On Fri, Sep 2, 2011 at 5:17 PM, Todd Lipcon t...@cloudera.com wrote:
 Non-binding +1. Regarding Owen's concern over licenses, if I recall
 correctly, those concerns would block graduation from the incubator,
 but not acceptance to it.

 I am also interested in being added as a committer to this proposal.
 As an HBase committer (but not speaking for the project as a whole) I
 think having cross-pollination between the codebases will be
 beneficial to everyone, so I'd like to be involved.

 Thanks
 -Todd

 On Fri, Sep 2, 2011 at 8:45 AM, Billie J Rinaldi
 billie.j.rina...@ugov.gov wrote:
 Greetings,

 I would like to propose Accumulo to be an Apache Incubator project.  
 Accumulo is a distributed key/value store that provides expressive 
 cell-level access labels and a server-side programming mechanism that can 
 modify key/value pairs at various points in the data management process.  It 
 is based on Google's BigTable design and runs over Apache Hadoop and 
 Zookeeper.

 Here is a link to the proposal in the Incubator wiki:
 http://wiki.apache.org/incubator/AccumuloProposal

 I've also pasted the initial contents below.

 Thanks,
 Billie Rinaldi


 = Accumulo Proposal =

 == Abstract ==
 Accumulo is a distributed key/value store that provides expressive, 
 cell-level access labels.

 == Proposal ==
 Accumulo is a sorted, distributed key/value store based on Google's BigTable 
 design.  It is built on top of Apache Hadoop, Zookeeper, and Thrift.  It 
 features a few novel improvements on the BigTable design in the form of 
 cell-level access labels and a server-side programming mechanism that can 
 modify key/value pairs at various points in the data management process.

 == Background ==
 Google published the design of BigTable in 2006.  Several other open source 
 projects have implemented aspects of this design including HBase, 
 CloudStore, and Cassandra.  Accumulo began its development in 2008.

 == Rationale ==
 There is a need for a flexible, high performance distributed key/value store 
 that provides expressive, fine-grained access labels.  The communities we 
 expect to be most interested in such a project are government, health care, 
 and other industries where privacy is a concern.  We have made much progress 
 in developing this project over the past 3 years and believe both the 
 project and the interested communities would benefit from this work being 
 openly available and having open development.

 == Current Status ==

 === Meritocracy ===
 We intend to strongly encourage the community to help with and contribute to 
 the code.  We will actively seek potential committers and help them become 
 familiar with the codebase.

 === Community ===
 A strong government community has developed around Accumulo and training 
 classes have been ongoing for about a year.  Hundreds of developers use 
 Accumulo.

 === Core Developers ===
 The developers are mainly employed by the National Security Agency, but we 
 anticipate interest developing among other companies.

 === Alignment ===
 Accumulo is built on top of Hadoop, Zookeeper, and Thrift.  It builds with 
 Maven.  Due to the strong relationship with these Apache projects, the 
 incubator is a good match for Accumulo.

 == Known Risks ==
 === Orphaned Products ===
 There is only a small risk of being orphaned.  The community is committed to 
 improving the codebase of the project due to its fulfilling needs not 
 addressed by any other software.

 === Inexperience with Open Source ===
 The codebase has been treated internally as an open source project since its 
 beginning, and the initial Apache committers have been involved with the 
 code for multiple years.  While our experience with public open source is 
 limited, we do not anticipate difficulty in operating under Apache's 
 development process.

 === Homogeneous Developers ===
 The committers have multiple employers and it is expected that committers 
 from different companies will be recruited.

 === Reliance on Salaried Developers ===
 The initial committers are all paid by their employers to work on Accumulo 
 and we expect such employment to continue.  Some of the initial committers 
 would continue as volunteers even if no longer employed to do so.

 === Relationships with Other Apache Products ===
 Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang, -net, 
 -io, -jci, -collections, -configuration, -logging, and -codec.

 === Relationship to HBase ===
 Accumulo and HBase are both based on the design of Google's BigTable, so 
 there is a danger that potential users will have 

Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-02 Thread Adam P Fuchs
Owen,

I believe the answer is yes regarding the code grant, and I am currently 
confirming that with our lawyers. We'll get you an official answer early next 
week.

The LGPL dependencies are not core to Accumulo, and we're working on 
substituting other packages. We would have no problem doing this before the 
initial commit if necessary. 

Cheers,
Adam

- Original Message -
From: Owen O'Malley omal...@apache.org
To: general@incubator.apache.org
Sent: Fri, 02 Sep 2011 18:36:11 -
Subject: Re: [PROPOSAL] Accumulo for the Apache Incubator

Is the NSA going to file a code grant for the project? How deeply
embedded are the LGPL dependencies? Are they optional components or
mandatory?

Thanks,
   Owen

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org






Re: [PROPOSAL] Accumulo for the Apache Incubator

2011-09-02 Thread Owen O'Malley
On Fri, Sep 2, 2011 at 3:22 PM, Adam P Fuchs adam.p.fu...@ugov.gov wrote:

The project looks interesting.

 I believe the answer is yes regarding the code grant, and I am currently 
 confirming that with our lawyers. We'll get you an official answer early next 
 week.

Great. I know that the US government has its own rules for such
things. I took part in the meetings that created the NASA Open Source
Agreement. (e.g., the lawyers wouldn't let us call it an open source
license...) Let us know how it goes.

 The LGPL dependencies are not core to Accumulo, and we're working on 
 substituting other packages. We would have no problem doing this before the 
 initial commit if necessary.

It needs to be cleaned up before release, but the original commit is fine.

-- Owen
