date:20140514

Re: [PROPOSAL] Parquet

2014-05-14 Thread Mark Struberg

Sounds good to me.
I'm not into Hadoop, but sounds like it's useful.
The code seems to have been under ALv2 for quite some time, so I don't see
many legal issues in this respect.


LieGrue,
strub

On Tuesday, 13 May 2014, 6:09, Chris Aniszczyk caniszc...@gmail.com wrote:
 
We would like to propose Parquet as an Apache Incubator project.
https://wiki.apache.org/incubator/ParquetProposal

Feel free to comment, we'll go for a vote in a week or two or whenever
consensus has been reached on the proposal.

I've posted the text of the proposal below:

== Abstract ==
Parquet is a columnar storage format for Hadoop.

== Proposal ==

We created Parquet to make the advantages of compressed, efficient columnar
data representation available to any project in the Hadoop ecosystem,
regardless of the choice of data processing framework, data model, or
programming language.

== Background ==

Parquet is built from the ground up with complex nested data structures in
mind, and uses the repetition/definition level approach to encoding such
data structures, as popularized by Google Dremel (
https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We believe
this approach is superior to simple flattening of nested name spaces.
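
To illustrate the repetition/definition level idea, here is a minimal sketch in Python for the simplest possible case: a single repeated integer field per record. It is not taken from the Parquet codebase; the function names `shred` and `assemble` are hypothetical, and real schemas with deeper nesting need higher level values than the 0 and 1 used here.

```python
def shred(records):
    """Flatten lists of ints into (repetition, definition, value) triples."""
    column = []
    for values in records:
        if not values:
            # Empty list: emit one placeholder with definition level 0,
            # meaning the repeated field is not defined in this record.
            column.append((0, 0, None))
            continue
        for i, value in enumerate(values):
            # Repetition level 0 starts a new record; 1 continues the list.
            column.append((0 if i == 0 else 1, 1, value))
    return column


def assemble(column):
    """Rebuild the original records from the triples produced by shred()."""
    records = []
    for repetition, definition, value in column:
        if repetition == 0:
            records.append([])
        if definition == 1:
            records[-1].append(value)
    return records


records = [[10, 20], [], [30]]
column = shred(records)
restored = assemble(column)
```

The levels alone carry the record boundaries and the empty list, so the values can be stored as one flat column and the nesting is recovered on read.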

Parquet is built to support very efficient compression and encoding
schemes. Parquet allows compression schemes to be specified on a per-column
level, and is future-proofed to allow adding more encodings as they are
invented and implemented. We separate the concepts of encoding and
compression, allowing parquet consumers to implement operators that work
directly on encoded data without paying decompression and decoding penalty
when possible.
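
As a rough illustration of an operator that works directly on encoded data, the sketch below counts matching values over a run-length encoded column without expanding it. Run-length encoding is used here only as an assumed example of an encoding, and `rle_encode` and `count_equal` are hypothetical helper names, not Parquet APIs.

```python
def rle_encode(values):
    """Encode a sequence as a list of (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            # Same value as the previous run: extend it.
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, length) for value, length in runs]


def count_equal(runs, target):
    """Count occurrences of target by summing run lengths, without decoding."""
    return sum(length for value, length in runs if value == target)


runs = rle_encode(["a", "a", "b", "a"])
matches = count_equal(runs, "a")
```

The count touches one entry per run rather than one per value, which is the kind of saving the separation of encoding from compression is meant to allow.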

== Rationale ==

Parquet is built to be used by anyone. We believe that an efficient,
well-implemented columnar storage substrate should be useful to all
frameworks without the cost of extensive and difficult to set up
dependencies.

Furthermore, the rapid growth of the Parquet community is empowered by open
source. We believe the Apache Software Foundation is a great fit as the long-term
home for Parquet, as it provides an established process for
community-driven development and decision making by consensus. This is
exactly the model we want for future Parquet development.

== Initial Goals ==

* Move the existing codebase to Apache
* Integrate with the Apache development process
* Ensure all dependencies are compliant with Apache License version 2.0
* Incremental development and releases per Apache guidelines

== Current Status ==

Parquet has undergone 2 major releases of the core format
(https://github.com/Parquet/parquet-format/releases) and 22 releases of the
supporting set of Java libraries
(https://github.com/Parquet/parquet-mr/releases).

The Parquet source is currently hosted at GitHub, which will seed the
Apache git repository.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the
requirements in an open forum. Several companies have already expressed
interest in this project, and we intend to invite additional developers to
participate. We will encourage and monitor community participation so that
privileges can be extended to those that contribute.

=== Community ===

There is a large need for an advanced columnar storage format for Hadoop.
Parquet is being used in production by many organizations (see
https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md)

* Cloudera: https://twitter.com/HenryR/statuses/324222874011451392
* Criteo: https://twitter.com/julsimon/statuses/312114074911666177
* Salesforce: https://twitter.com/TwitterOSS/statuses/392734610116726784
* Stripe: https://twitter.com/avibryant/statuses/391339949250715648
* Twitter: https://twitter.com/J_/statuses/315844725611581441

By bringing Parquet into Apache, we believe that the community will grow
even bigger.

=== Core Developers ===

Parquet was initially developed as a collaboration between Twitter,
Cloudera and Criteo.

See
https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop

=== Alignment ===

We believe that having Parquet at Apache will help further the growth of
the big-data community, as it will encourage cooperation within the greater
ecosystem of projects spawned by Apache Hadoop. The alignment is also
beneficial to other Apache communities (such as Hadoop, Hive, Avro).

== Known Risks ==

=== Orphaned Products ===

The risk of the Parquet project being abandoned is minimal. There are many
organizations using Parquet in production, including Twitter, Cloudera,
Stripe, and Salesforce (
http://blog.cloudera.com/blog/2013/10/parquet-at-salesforce-com/).

=== Inexperience with Open Source ===

Parquet has existed as a healthy open source project for one year. During that
time, we have curated an open-source community successfully, attracting
over 40 contributors (see
https://github.com/Parquet/parquet-mr/graphs/contributors) from a diverse
group of companies.
Several of the core contributors to the project are deeply familiar

[RESULT] [VOTE] Release Sentry incubating version 1.3.0-rc3

2014-05-14 Thread karthik ramachandran

Hi everyone,

I'm happy to announce that the vote for releasing Apache Sentry
1.3.0-incubating rc3 has passed with 6 votes (3 binding).

Binding +1s: Patrick Hunt, Arvind Prabhakar, Justin Mclean

Non-binding +1s: Prasad Mujumdar, Sravya Tirukkovalur, Gregory Chanan

Thanks to everyone who participated! I'll continue with the rest of
the release process.


Karthik Ramachandran

-- 
Karthik Ramachandran
Mobile: 412-606-8981

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
