Re: [PROPOSAL] Kylin for Incubation

2014-11-20 Thread Ted Dunning
Sounds good.

I have started the discussion to get Jacques on IPMC.



On Thu, Nov 20, 2014 at 9:27 AM, Luke Han luke...@gmail.com wrote:

 Hi all,
  Thank you for reviewing the proposal. With the discussion winding
 down, we would like to send the VOTE email next.

 Thanks
 Luke


 2014-11-15 11:40 GMT+08:00 Ted Dunning ted.dunn...@gmail.com:

 
  Also, a Chinese-localized operating system is pretty clearly different
   from an OLAP engine.
 
   For comparison, see the recent non-issue regarding Amazon Aurora versus
   Apache Aurora.
 
  Sent from my iPhone
 
   On Nov 14, 2014, at 9:55, Henry Saputra henry.sapu...@gmail.com
 wrote:
  
    Thanks for the reminder, Ross.
    Hopefully we can follow a similar route to Apache Spark, Apache
    Storm, and Apache MetaModel, where the trademark would be used as
    'Apache Kylin'.
  
  
   - Henry
  
   On Fri, Nov 14, 2014 at 7:47 AM, Ross Gardler (MS OPEN TECH)
   ross.gard...@microsoft.com wrote:
   Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
  
   Sent from my Windows Phone
   
    From: Luke Han luke...@gmail.com
    Sent: 11/14/2014 7:38 AM
    To: general@incubator.apache.org
   Subject: [PROPOSAL] Kylin for Incubation
  
   Hi all,
    We would like to propose Kylin as an Apache Incubator project. The
    complete proposal can be found at:
    https://wiki.apache.org/incubator/KylinProposal and the text of
    the proposal is posted below.
  
   Thanks.
   Luke
  
  
   Kylin Proposal
   ==
  
   # Abstract
  
   Kylin is a distributed and scalable OLAP engine built on Hadoop to
   support extremely large datasets.
  
   # Proposal
  
   Kylin is an open source Distributed Analytics Engine that provides
   multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to
   accelerate analytics on Hadoop by allowing the use of SQL-compatible
   tools. Kylin provides a SQL interface and multi-dimensional analysis
   (MOLAP) on Hadoop to support extremely large datasets and tightly
    integrates with the Hadoop ecosystem.
  
   ## Overview of Kylin
  
    The Kylin platform has two parts: data processing and interactive querying.
    First, Kylin reads data from the source (Hive) and runs a set of tasks,
    including MapReduce jobs and shell scripts, to pre-calculate results for a
    specified data model, then saves the resulting OLAP cube into storage
    such as HBase. Once these OLAP cubes are ready, a user can submit a
    request from any SQL-based tool or third-party application to Kylin’s
   REST server. The Server calls the Query Engine to determine if the
   target dataset already exists. If so, the engine directly accesses the
   target data in the form of a predefined cube, and returns the result
   with sub-second latency. Otherwise, the engine is designed to route
   non-matching queries to whichever SQL on Hadoop tool is already
   available on a Hadoop cluster, such as Hive.
  
   Kylin platform includes:
  
   - Metadata Manager: Kylin is a metadata-driven application. The Kylin
   Metadata Manager is the key component that manages all metadata stored
   in Kylin including all cube metadata. All other components rely on the
   Metadata Manager.
  
   - Job Engine: This engine is designed to handle all of the offline
    jobs, including shell scripts, Java API calls, and MapReduce jobs. The Job
   Engine manages and coordinates all of the jobs in Kylin to make sure
   each job executes and handles failures.
  
   - Storage Engine: This engine manages the underlying storage –
   specifically, the cuboids, which are stored as key-value pairs. The
   Storage Engine uses HBase – the best solution from the Hadoop
   ecosystem for leveraging an existing K-V system. Kylin can also be
   extended to support other K-V systems, such as Redis.
  
   - Query Engine: Once the cube is ready, the Query Engine can receive
   and parse user queries. It then interacts with other components to
   return the results to the user.
  
   - REST Server: The REST Server is an entry point for applications to
   develop against Kylin. Applications can submit queries, get results,
   trigger cube build jobs, get metadata, get user privileges, and so on.
  
   - ODBC Driver: To support third-party tools and applications – such as
   Tableau – we have built and open-sourced an ODBC Driver. The goal is
   to make it easy for users to onboard.
  
   # Background
  
   The challenge we face at eBay is that our data volume is becoming
   bigger and bigger while our user base is becoming more diverse. For
    example, our business users and analysts consistently ask for minimal
   latency when visualizing data on Tableau and Excel. So, we worked
   closely with our internal analyst community and outlined the product
   requirements for Kylin:
  
   - Sub-second query latency on billions of rows
   - ANSI SQL availability for those using SQL-compatible tools
   - Full OLAP capability to offer advanced functionality
   - 

Re: [PROPOSAL] NiFi for Incubation

2014-11-20 Thread Benson Margulies
Sean,

The precedent of Accumulo is that the govt people and agencies involved are
ready and able to have their staff collaborate openly in an Apache
community. There's no need to contemplate bifurcation; we have this
proposal because the management recognizes that this collaboration produces
better stuff that solves more problems than the 'inside the tent'
alternative.

--benson


On Thu, Nov 20, 2014 at 1:50 AM, Sean Busbey bus...@cloudera.com wrote:

 I'm really excited to see NiFi come to the incubator; it'd be a great
 addition to the ASF.

 A few points in the proposal:

  == Initial Goals ==

 One of these should be to grow the community outside of the current niche,
 IMHO.

 More on this below under orphaned projects

* Determine and establish a mechanism, possibly including a
  sub-project construct, that allows for extensions to the core
  application to occur at a pace that differs from the core application
  itself.

 I don't think the proposal needs to include the e.g. with sub-projects
 part. Just noting
 that your goals in the incubator are to address the need to have different
 release cycles
 for core and extensions is sufficient.


  === Community ===
  Over the past several years, NiFi has developed a strong community of
  both developers and operators within the U.S. government.  We look
  forward to helping grow this to a broader base of industries.
  

 How much, if any, of this community do you expect to engage via the
 customary project
 lists once NiFi is established within the ASF? Will the project be able to
 leverage this
 established group?


  === Orphaned Products ===
  Risk of orphaning is minimal.  The project user and developer base is
  substantial, growing, and there is already extensive operational use
  of NiFi.

 Given that the established base is internal to the U.S. government, I'd
 encourage the
 podling to consider the risk of a bifurcated project should a substantial
 outside
 community fail to emerge or if those internal users should fail to engage
 with the
 outside community.

 You cover a related issue in your Homogenous Developers section. But I
 think
 building on the Community section of the current state to call this out
 as an
 independent issue is worthwhile.


  possible.  This environment includes widely accessible source code
  repositories, published artifacts, ticket tracking, and extensive
  documentation. We also encourage contributions and frequent debate and
  hold regular, collaborative discussions through e-mail, chat rooms,
  and in-person meet-ups.

 Do you anticipate any difficulties moving these established communication
 mechanisms to ASF public lists?

  === Documentation ===
  At this time there is no NiFi documentation on the web.  However, we
  have extensive documentation included within the application that
  details usage of the many functions.  We will be rapidly expanding the
  available documentation to cover things like installation, developer
  guide, frequently asked questions, best practices, and more.  This
  documentation will be posted to the NiFi wiki at apache.org.

 I love projects that start with documentation. :)

 I don't think the proposal needs to include that the documentation will be
 posted
 to the NiFi wiki, since that's an implementation detail. Just say this
 documentation
 will be made available via the NiFi project's use of incubator infra.

 (I'll save detail for the eventual dev@ list, but you should strongly
 consider not
 using the wiki to host this documentation.)

 -Sean

 On Wed, Nov 19, 2014 at 11:27 PM, Brock Noland br...@cloudera.com wrote:

  Hi Joe,
 
  I know you've done a tremendous amount of work to make this happen so I
 am
   extremely happy this is *finally* making its way to the incubator!
 
   I look forward to helping in any way I can.
 
  Cheers!
  Brock
 
  On Wed, Nov 19, 2014 at 8:11 PM, Mattmann, Chris A (3980) 
  chris.a.mattm...@jpl.nasa.gov wrote:
 
    This is *fan freakin' tastic*! Sounds like an awesome project and
   glad to hear a relationship to Tika! Awesome to see more government
   projects coming into the ASF!
  
    You already have a great set of mentors and I don't really have more
   time on my plate, but really happy and will try and monitor and help
   on the lists.
  
   Cheers!
  
   Chris
  
   ++
   Chris Mattmann, Ph.D.
   Chief Architect
   Instrument Software and Science Data Systems Section (398)
   NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
   Office: 168-519, Mailstop: 168-527
   Email: chris.a.mattm...@nasa.gov
   WWW:  http://sunset.usc.edu/~mattmann/
   ++
   Adjunct Associate Professor, Computer Science Department
   University of Southern California, Los Angeles, CA 90089 USA
   ++
  
  
  
  
  
  
   -Original Message-
   From: Joe Witt 

Re: [PROPOSAL] NiFi for Incubation

2014-11-20 Thread Hadrian Zbarcea

Sounds exciting. I have a couple of questions:

1. Is there a code grant? I assume so, since the proposal states that the
project has been active since 2006. What I could find [1] doesn't seem to be it.

2. What is the overlap with Apache Camel (if any)?

Cheers,
Hadrian


[1] https://github.com/Nifi


On 11/19/2014 09:02 PM, Joe Witt wrote:

Hello,


I would like to propose NiFi as an Apache Incubator Project.

In addition to the copy provided below the Wiki version of the
proposal can be found here:
http://wiki.apache.org/incubator/NiFiProposal

Thanks

Joe


= NiFi Proposal =

== Abstract ==
NiFi is a dataflow system based on the concepts of flow-based programming.

== Proposal ==
NiFi supports powerful and scalable directed graphs of data routing,
transformation, and system mediation logic.  Some of the high-level
capabilities and objectives of NiFi include:
   * Web-based user interface for seamless experience between design,
control, feedback, and monitoring of data flows
   * Highly configurable along several dimensions of quality of service
such as loss tolerant versus guaranteed delivery, low latency versus
high throughput, and priority based queuing
   * Fine-grained data provenance for all data received, forked,
joined, cloned, modified, sent, and ultimately dropped as data reaches
its configured end-state
   * Component-based extension model along well defined interfaces
enabling rapid development and effective testing

== Background ==
Reliable and effective dataflow between systems can be difficult
whether you're running scripts on a laptop or have a massive
distributed computing system operated by numerous teams and
organizations.  As the volume and rate of data grows and as the number
of systems, protocols, and formats increase and evolve so too does the
complexity and need for greater insight and agility.  These are the
dataflow challenges that NiFi was built to tackle.

NiFi is designed in a manner consistent with the core concepts
described in flow-based programming as originally documented by J.
Paul Morrison in the 1970s.  This model lends itself well to visual
diagramming, concurrency, componentization, testing, and reuse.  In
addition to staying close to the fundamentals of flow-based
programming, NiFi provides integration system specific features such
as: guaranteed delivery; back pressure; ability to gracefully handle
backlogs and data surges; and an operator interface that enables
on-the-fly data flow generation, modification, and observation.
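
To make the flow-based model and the back-pressure behaviour described
above a little more concrete, here is a minimal Python sketch. It is not
NiFi's actual API (NiFi's processors, sessions, and prioritized queues are
far richer); the class and queue names are purely illustrative.

import queue
import threading

class Processor(threading.Thread):
    """A toy flow-based component: take items from an inbound connection,
    apply a transform, and emit them on an outbound connection."""

    def __init__(self, inbound, outbound, transform):
        super().__init__(daemon=True)
        self.inbound = inbound
        self.outbound = outbound
        self.transform = transform

    def run(self):
        while True:
            item = self.inbound.get()                 # blocks until data arrives
            self.outbound.put(self.transform(item))   # blocks when the outbound
                                                      # queue is full -> back pressure

# Connections between components are bounded queues; a slow consumer
# eventually blocks its producer instead of silently dropping data.
ingest = queue.Queue(maxsize=100)
deliver = queue.Queue(maxsize=100)

Processor(ingest, deliver, str.upper).start()

ingest.put("hello dataflow")
print(deliver.get())   # -> "HELLO DATAFLOW"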

== Rationale ==
NiFi provides a reliable, scalable, manageable and accountable
platform for developers and technical staff to create and evolve
powerful data flows.  Such a system is useful in many contexts
including large-scale enterprise integration, interaction with cloud
services and frameworks, business to business, intra-departmental, and
inter-departmental flows.  NiFi fits well within the Apache Software
Foundation (ASF) family as it depends on numerous ASF projects and
integrates with several others.  We also anticipate developing
extensions for several other ASF projects such as Cassandra, Kafka,
and Storm in the near future.

== Initial Goals ==
   * Ensure all dependencies are compliant with Apache License version
2.0 and that all code and documentation artifacts have the correct
Apache licensing markings and notice.
   * Establish a formal release process and schedule, allowing for
dependable release cycles in a manner consistent with the Apache
development process.
   * Determine and establish a mechanism, possibly including a
sub-project construct, that allows for extensions to the core
application to occur at a pace that differs from the core application
itself.

== Current Status ==
=== Meritocracy ===
An integration platform is only as good as its ability to integrate
systems in a reliable, timely, and repeatable manner.  The same can be
said of its ability to attract talent and a variety of perspectives as
integration systems by their nature are always evolving.  We will
actively seek help and encourage promotion of influence in the project
through meritocracy.

=== Community ===
Over the past several years, NiFi has developed a strong community of
both developers and operators within the U.S. government.  We look
forward to helping grow this to a broader base of industries.

=== Core Developers ===
The initial core developers are employed by the National Security
Agency and defense contractors.  We will work to grow the community
among a more diverse set of developers and industries.

=== Alignment ===
 From its inception, NiFi was developed with an open source philosophy
in mind and with the hopes of eventually being truly open sourced.
The Apache way is consistent with the approach we have taken to date.
The ASF clearly provides a mature and effective environment for
successful development as is evident across the spectrum of well-known
projects.  Further, NiFi depends on numerous ASF libraries and
projects including: 

Re: [PROPOSAL] NiFi for Incubation

2014-11-20 Thread Joe Witt
Hello

Thank you for all the feedback thus far.

Sean, Jan I,

I've adjusted the proposal for the goals, community, and documentation.

Thanks
Joe

On Thu, Nov 20, 2014 at 1:50 AM, Sean Busbey bus...@cloudera.com wrote:

 I'm really excited to see NiFi come to the incubator; it'd be a great
 addition to the ASF.

 A few points in the proposal:

  == Initial Goals ==

 One of these should be to grow the community outside of the current niche,
 IMHO.

 More on this below under orphaned projects

* Determine and establish a mechanism, possibly including a
  sub-project construct, that allows for extensions to the core
  application to occur at a pace that differs from the core application
  itself.

 I don't think the proposal needs to include the e.g. with sub-projects
 part. Just noting
 that your goals in the incubator are to address the need to have different
 release cycles
 for core and extensions is sufficient.


  === Community ===
  Over the past several years, NiFi has developed a strong community of
  both developers and operators within the U.S. government.  We look
  forward to helping grow this to a broader base of industries.
  

 How much, if any, of this community do you expect to engage via the
 customary project
 lists once NiFi is established within the ASF? Will the project be able to
 leverage this
 established group?


  === Orphaned Products ===
  Risk of orphaning is minimal.  The project user and developer base is
  substantial, growing, and there is already extensive operational use
  of NiFi.

 Given that the established base is internal to the U.S. government, I'd
 encourage the
 podling to consider the risk of a bifurcated project should a substantial
 outside
 community fail to emerge or if those internal users should fail to engage
 with the
 outside community.

 You cover a related issue in your Homogenous Developers section. But I
 think
 building on the Community section of the current state to call this out
 as an
 independent issue is worthwhile.


  possible.  This environment includes widely accessible source code
  repositories, published artifacts, ticket tracking, and extensive
  documentation. We also encourage contributions and frequent debate and
  hold regular, collaborative discussions through e-mail, chat rooms,
  and in-person meet-ups.

 Do you anticipate any difficulties moving these established communication
 mechanisms to ASF public lists?

  === Documentation ===
  At this time there is no NiFi documentation on the web.  However, we
  have extensive documentation included within the application that
  details usage of the many functions.  We will be rapidly expanding the
  available documentation to cover things like installation, developer
  guide, frequently asked questions, best practices, and more.  This
  documentation will be posted to the NiFi wiki at apache.org.

 I love projects that start with documentation. :)

 I don't think the proposal needs to include that the documentation will be
 posted
 to the NiFi wiki, since that's an implementation detail. Just say this
 documentation
 will be made available via the NiFi project's use of incubator infra.

 (I'll save detail for the eventual dev@ list, but you should strongly
 consider not
 using the wiki to host this documentation.)

 -Sean

 On Wed, Nov 19, 2014 at 11:27 PM, Brock Noland br...@cloudera.com wrote:

  Hi Joe,
 
  I know you've done a tremendous amount of work to make this happen so I
 am
  extremely happy this is *finally* making its way to the incubator!
 
   I look forward to helping in any way I can.
 
  Cheers!
  Brock
 
  On Wed, Nov 19, 2014 at 8:11 PM, Mattmann, Chris A (3980) 
  chris.a.mattm...@jpl.nasa.gov wrote:
 
    This is *fan freakin' tastic*! Sounds like an awesome project and
   glad to hear a relationship to Tika! Awesome to see more government
   projects coming into the ASF!
  
    You already have a great set of mentors and I don't really have more
   time on my plate, but really happy and will try and monitor and help
   on the lists.
  
   Cheers!
  
   Chris
  
   ++
   Chris Mattmann, Ph.D.
   Chief Architect
   Instrument Software and Science Data Systems Section (398)
   NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
   Office: 168-519, Mailstop: 168-527
   Email: chris.a.mattm...@nasa.gov
   WWW:  http://sunset.usc.edu/~mattmann/
   ++
   Adjunct Associate Professor, Computer Science Department
   University of Southern California, Los Angeles, CA 90089 USA
   ++
  
  
  
  
  
  
   -Original Message-
   From: Joe Witt joe.w...@gmail.com
   Reply-To: general@incubator.apache.org general@incubator.apache.org
 
   Date: Thursday, November 20, 2014 at 3:02 AM
   To: general@incubator.apache.org general@incubator.apache.org
   Subject: [PROPOSAL] NiFi for 

Re: [PROPOSAL] NiFi for Incubation

2014-11-20 Thread Jim Jagielski
very, VERY cool!

 On Nov 19, 2014, at 9:02 PM, Joe Witt joe.w...@gmail.com wrote:
 
 Hello,
 
 
 I would like to propose NiFi as an Apache Incubator Project.
 
 In addition to the copy provided below the Wiki version of the
 proposal can be found here:
 http://wiki.apache.org/incubator/NiFiProposal
 
 Thanks
 


-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [PROPOSAL] NiFi for Incubation

2014-11-20 Thread Joe Witt
Hadrian

Yes there is a Software Grant Agreement.  NSA's tech transfer folks have
already sent that to Apache.

Given that we are coming from a closed source environment you won't find
much.  That is what this proposal is about though as we're working hard to
change that.

The github link you reference has no relationship to this project.

The relationship to Apache Camel will need to be explored further as NiFi
is often used in similar problem spaces (integration).  Camel is really
powerful in its core purpose and has an excellent community and a great
deal of maturity.  NiFi provides a complete dataflow application with a
major focus on the user experience, graphical creation and real-time
command and control of those flows.  It will be interesting as we progress
to see how we can best integrate with projects like Camel and I am looking
forward to hearing some of the thoughts and ideas the community comes up with.

Thanks
Joe

On Thu, Nov 20, 2014 at 7:45 AM, Hadrian Zbarcea hzbar...@gmail.com wrote:

 Sounds exciting. I have a couple of questions:

 1. Is there a code grant? I assume so, since the proposal states that the
 project has been active since 2006. What I could find [1] doesn't seem to be it.
 2. What is the overlap with Apache Camel (if any)?

 Cheers,
 Hadrian


 [1] https://github.com/Nifi



 On 11/19/2014 09:02 PM, Joe Witt wrote:

 Hello,


 I would like to propose NiFi as an Apache Incubator Project.

 In addition to the copy provided below the Wiki version of the
 proposal can be found here:
 http://wiki.apache.org/incubator/NiFiProposal

 Thanks

 Joe


 = NiFi Proposal =

 == Abstract ==
 NiFi is a dataflow system based on the concepts of flow-based programming.

 == Proposal ==
 NiFi supports powerful and scalable directed graphs of data routing,
 transformation, and system mediation logic.  Some of the high-level
 capabilities and objectives of NiFi include:
* Web-based user interface for seamless experience between design,
 control, feedback, and monitoring of data flows
* Highly configurable along several dimensions of quality of service
 such as loss tolerant versus guaranteed delivery, low latency versus
 high throughput, and priority based queuing
* Fine-grained data provenance for all data received, forked,
 joined, cloned, modified, sent, and ultimately dropped as data reaches
 its configured end-state
* Component-based extension model along well defined interfaces
 enabling rapid development and effective testing

 == Background ==
 Reliable and effective dataflow between systems can be difficult
 whether you're running scripts on a laptop or have a massive
 distributed computing system operated by numerous teams and
 organizations.  As the volume and rate of data grows and as the number
 of systems, protocols, and formats increase and evolve so too does the
 complexity and need for greater insight and agility.  These are the
 dataflow challenges that NiFi was built to tackle.

 NiFi is designed in a manner consistent with the core concepts
 described in flow-based programming as originally documented by J.
 Paul Morrison in the 1970s.  This model lends itself well to visual
 diagramming, concurrency, componentization, testing, and reuse.  In
 addition to staying close to the fundamentals of flow-based
 programming, NiFi provides integration system specific features such
 as: guaranteed delivery; back pressure; ability to gracefully handle
 backlogs and data surges; and an operator interface that enables
 on-the-fly data flow generation, modification, and observation.

 == Rationale ==
 NiFi provides a reliable, scalable, manageable and accountable
 platform for developers and technical staff to create and evolve
 powerful data flows.  Such a system is useful in many contexts
 including large-scale enterprise integration, interaction with cloud
 services and frameworks, business to business, intra-departmental, and
 inter-departmental flows.  NiFi fits well within the Apache Software
 Foundation (ASF) family as it depends on numerous ASF projects and
 integrates with several others.  We also anticipate developing
 extensions for several other ASF projects such as Cassandra, Kafka,
 and Storm in the near future.

 == Initial Goals ==
* Ensure all dependencies are compliant with Apache License version
 2.0 and that all code and documentation artifacts have the correct
 Apache licensing markings and notice.
* Establish a formal release process and schedule, allowing for
 dependable release cycles in a manner consistent with the Apache
 development process.
* Determine and establish a mechanism, possibly including a
 sub-project construct, that allows for extensions to the core
 application to occur at a pace that differs from the core application
 itself.

 == Current Status ==
 === Meritocracy ===
 An integration platform is only as good as its ability to integrate
 systems in a reliable, timely, and repeatable manner.  The same can be
 said of its ability to 

Re: [PROPOSAL] NiFi for Incubation

2014-11-20 Thread Tim Williams
+1, good stuff...

--tim

On Wed, Nov 19, 2014 at 9:02 PM, Joe Witt joe.w...@gmail.com wrote:
 Hello,


 I would like to propose NiFi as an Apache Incubator Project.

 In addition to the copy provided below the Wiki version of the
 proposal can be found here:
 http://wiki.apache.org/incubator/NiFiProposal

 Thanks

 Joe


 = NiFi Proposal =

 == Abstract ==
 NiFi is a dataflow system based on the concepts of flow-based programming.

 == Proposal ==
 NiFi supports powerful and scalable directed graphs of data routing,
 transformation, and system mediation logic.  Some of the high-level
 capabilities and objectives of NiFi include:
   * Web-based user interface for seamless experience between design,
 control, feedback, and monitoring of data flows
   * Highly configurable along several dimensions of quality of service
 such as loss tolerant versus guaranteed delivery, low latency versus
 high throughput, and priority based queuing
   * Fine-grained data provenance for all data received, forked,
 joined, cloned, modified, sent, and ultimately dropped as data reaches
 its configured end-state
   * Component-based extension model along well defined interfaces
 enabling rapid development and effective testing

 == Background ==
 Reliable and effective dataflow between systems can be difficult
 whether you're running scripts on a laptop or have a massive
 distributed computing system operated by numerous teams and
 organizations.  As the volume and rate of data grows and as the number
 of systems, protocols, and formats increase and evolve so too does the
 complexity and need for greater insight and agility.  These are the
 dataflow challenges that NiFi was built to tackle.

 NiFi is designed in a manner consistent with the core concepts
 described in flow-based programming as originally documented by J.
 Paul Morrison in the 1970s.  This model lends itself well to visual
 diagramming, concurrency, componentization, testing, and reuse.  In
 addition to staying close to the fundamentals of flow-based
 programming, NiFi provides integration system specific features such
 as: guaranteed delivery; back pressure; ability to gracefully handle
 backlogs and data surges; and an operator interface that enables
 on-the-fly data flow generation, modification, and observation.

 == Rationale ==
 NiFi provides a reliable, scalable, manageable and accountable
 platform for developers and technical staff to create and evolve
 powerful data flows.  Such a system is useful in many contexts
 including large-scale enterprise integration, interaction with cloud
 services and frameworks, business to business, intra-departmental, and
 inter-departmental flows.  NiFi fits well within the Apache Software
 Foundation (ASF) family as it depends on numerous ASF projects and
 integrates with several others.  We also anticipate developing
 extensions for several other ASF projects such as Cassandra, Kafka,
 and Storm in the near future.

 == Initial Goals ==
   * Ensure all dependencies are compliant with Apache License version
 2.0 and that all code and documentation artifacts have the correct
 Apache licensing markings and notice.
   * Establish a formal release process and schedule, allowing for
 dependable release cycles in a manner consistent with the Apache
 development process.
   * Determine and establish a mechanism, possibly including a
 sub-project construct, that allows for extensions to the core
 application to occur at a pace that differs from the core application
 itself.

 == Current Status ==
 === Meritocracy ===
 An integration platform is only as good as its ability to integrate
 systems in a reliable, timely, and repeatable manner.  The same can be
 said of its ability to attract talent and a variety of perspectives as
 integration systems by their nature are always evolving.  We will
 actively seek help and encourage promotion of influence in the project
 through meritocracy.

 === Community ===
 Over the past several years, NiFi has developed a strong community of
 both developers and operators within the U.S. government.  We look
 forward to helping grow this to a broader base of industries.

 === Core Developers ===
 The initial core developers are employed by the National Security
 Agency and defense contractors.  We will work to grow the community
 among a more diverse set of developers and industries.

 === Alignment ===
 From its inception, NiFi was developed with an open source philosophy
 in mind and with the hopes of eventually being truly open sourced.
 The Apache way is consistent with the approach we have taken to date.
 The ASF clearly provides a mature and effective environment for
 successful development as is evident across the spectrum of well-known
 projects.  Further, NiFi depends on numerous ASF libraries and
 projects including: ActiveMQ, Ant, Commons, Lucene, Hadoop,
 HttpClient, Jakarta and Maven.  We also anticipate extensions and
 dependencies with several more ASF projects, including 

Re: [PROPOSAL] NiFi for Incubation

2014-11-20 Thread Josh Elser

Very exciting stuff!

Not presently on IPMC, but if you'd have me, I'd be happy to volunteer 
as a mentor. If so, I'll submit an application to join the IPMC and we 
can go from there.


- Josh

Joe Witt wrote:

Hello,


I would like to propose NiFi as an Apache Incubator Project.

In addition to the copy provided below the Wiki version of the
proposal can be found here:
http://wiki.apache.org/incubator/NiFiProposal

Thanks

Joe


= NiFi Proposal =

== Abstract ==
NiFi is a dataflow system based on the concepts of flow-based programming.

== Proposal ==
NiFi supports powerful and scalable directed graphs of data routing,
transformation, and system mediation logic.  Some of the high-level
capabilities and objectives of NiFi include:
   * Web-based user interface for seamless experience between design,
control, feedback, and monitoring of data flows
   * Highly configurable along several dimensions of quality of service
such as loss tolerant versus guaranteed delivery, low latency versus
high throughput, and priority based queuing
   * Fine-grained data provenance for all data received, forked,
joined, cloned, modified, sent, and ultimately dropped as data reaches
its configured end-state
   * Component-based extension model along well defined interfaces
enabling rapid development and effective testing

== Background ==
Reliable and effective dataflow between systems can be difficult
whether you're running scripts on a laptop or have a massive
distributed computing system operated by numerous teams and
organizations.  As the volume and rate of data grows and as the number
of systems, protocols, and formats increase and evolve so too does the
complexity and need for greater insight and agility.  These are the
dataflow challenges that NiFi was built to tackle.

NiFi is designed in a manner consistent with the core concepts
described in flow-based programming as originally documented by J.
Paul Morrison in the 1970s.  This model lends itself well to visual
diagramming, concurrency, componentization, testing, and reuse.  In
addition to staying close to the fundamentals of flow-based
programming, NiFi provides integration system specific features such
as: guaranteed delivery; back pressure; ability to gracefully handle
backlogs and data surges; and an operator interface that enables
on-the-fly data flow generation, modification, and observation.

== Rationale ==
NiFi provides a reliable, scalable, manageable and accountable
platform for developers and technical staff to create and evolve
powerful data flows.  Such a system is useful in many contexts
including large-scale enterprise integration, interaction with cloud
services and frameworks, business to business, intra-departmental, and
inter-departmental flows.  NiFi fits well within the Apache Software
Foundation (ASF) family as it depends on numerous ASF projects and
integrates with several others.  We also anticipate developing
extensions for several other ASF projects such as Cassandra, Kafka,
and Storm in the near future.

== Initial Goals ==
   * Ensure all dependencies are compliant with Apache License version
2.0 and that all code and documentation artifacts have the correct
Apache licensing markings and notice.
   * Establish a formal release process and schedule, allowing for
dependable release cycles in a manner consistent with the Apache
development process.
   * Determine and establish a mechanism, possibly including a
sub-project construct, that allows for extensions to the core
application to occur at a pace that differs from the core application
itself.

== Current Status ==
=== Meritocracy ===
An integration platform is only as good as its ability to integrate
systems in a reliable, timely, and repeatable manner.  The same can be
said of its ability to attract talent and a variety of perspectives as
integration systems by their nature are always evolving.  We will
actively seek help and encourage promotion of influence in the project
through meritocracy.

=== Community ===
Over the past several years, NiFi has developed a strong community of
both developers and operators within the U.S. government.  We look
forward to helping grow this to a broader base of industries.

=== Core Developers ===
The initial core developers are employed by the National Security
Agency and defense contractors.  We will work to grow the community
among a more diverse set of developers and industries.

=== Alignment ===
 From its inception, NiFi was developed with an open source philosophy
in mind and with the hopes of eventually being truly open sourced.
The Apache way is consistent with the approach we have taken to date.
The ASF clearly provides a mature and effective environment for
successful development as is evident across the spectrum of well-known
projects.  Further, NiFi depends on numerous ASF libraries and
projects including: ActiveMQ, Ant, Commons, Lucene, Hadoop,
HttpClient, Jakarta and Maven.  We also anticipate extensions and
dependencies with 

Re: [VOTE] (new) Release Apache Metamodel incubating 4.3.0

2014-11-20 Thread Henry Saputra
+1 (binding)

On Wed, Nov 19, 2014 at 2:10 PM, Kasper Sørensen
kasper.soren...@humaninference.com wrote:
 Hi All,

 The previous vote on this subject was cancelled because of a misstep in the 
 artifact signing procedure. Now we're back with a properly signed release 
 (based on the same source code).

 Please vote on releasing the following candidate as Apache MetaModel version 
 4.3.0-incubating.

 The Git tag to be voted on is v4.3.0-incubating
 tag: 
 https://git-wip-us.apache.org/repos/asf?p=incubator-metamodel.git;a=tag;h=refs/tags/MetaModel-4.3.0-incubating
 commit: 
 https://git-wip-us.apache.org/repos/asf?p=incubator-metamodel.git;a=commit;h=eef82fb039e819b8841c55e393898260733a545b

 The source artifact to be voted on is:
 https://repository.apache.org/content/repositories/orgapachemetamodel-1004/org/apache/metamodel/MetaModel/4.3.0-incubating/MetaModel-4.3.0-incubating-source-release.zip

 Parent directory (including MD5, SHA1 hashes etc.) of the source is:
 https://repository.apache.org/content/repositories/orgapachemetamodel-1004/org/apache/metamodel/MetaModel/4.3.0-incubating

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/kaspersor.asc

 Release engineer public key id: 1FE1C2F5
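
 A quick sanity check before voting is to compare the artifact against its
 published checksum and verify the signature. A rough sketch in Python,
 assuming the usual repository.apache.org layout where the .sha1 and .asc
 files sit next to the source zip (and that the release manager's key from
 the URL above has already been imported into gpg):

import hashlib
import subprocess
import urllib.request

ARTIFACT = ("https://repository.apache.org/content/repositories/"
            "orgapachemetamodel-1004/org/apache/metamodel/MetaModel/"
            "4.3.0-incubating/MetaModel-4.3.0-incubating-source-release.zip")

def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# Compare the SHA1 of the downloaded artifact with the published .sha1 file.
data = fetch(ARTIFACT)
expected_sha1 = fetch(ARTIFACT + ".sha1").decode("ascii").split()[0]
assert hashlib.sha1(data).hexdigest() == expected_sha1, "SHA1 mismatch"

# Verify the detached GPG signature (requires the signer's public key).
with open("MetaModel-source-release.zip", "wb") as f:
    f.write(data)
with open("MetaModel-source-release.zip.asc", "wb") as f:
    f.write(fetch(ARTIFACT + ".asc"))
subprocess.run(["gpg", "--verify", "MetaModel-source-release.zip.asc",
                "MetaModel-source-release.zip"], check=True)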

 Vote thread link from d...@metamodel.incubator.apache.org mailing list:
 http://markmail.org/thread/cksfunp5oiihbag2

 Result thread link from d...@metamodel.incubator.apache.org mailing list:
 http://markmail.org/message/fc4adybhue6t2jay

 Please vote on releasing this package as Apache MetaModel 4.3.0-incubating.

 The vote is open for 72 hours, or until we get the needed number of votes (3 
 times +1).

 [ ] +1 Release this package as Apache MetaModel 4.3.0-incubating
 [ ] -1 Do not release this package because ...

 More information about the MetaModel project can be found at 
 http://metamodel.incubator.apache.org/

 Thank you in advance for participating.

 Regards,
 Kasper Sørensen

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



[VOTE] Accept Kylin into the Apache Incubator

2014-11-20 Thread Luke Han
Following the discussion earlier in the thread:

http://mail-archives.apache.org/mod_mbox/incubator-general/201411.mbox/%3ccakmqrob22+n+r++date33f3pcpyujhfoeaqrms3t-udjwk6...@mail.gmail.com%3e

I would like to call a VOTE for accepting Kylin as a new incubator project.

The proposal is available at:
https://wiki.apache.org/incubator/KylinProposal

and the text of the proposal is also posted below.

Vote is open until 24th November 2014, 23:59:00 UTC

[ ] +1 accept Kylin in the Incubator
[ ] ±0
[ ] -1 because...


Thanks
Luke


Kylin Proposal
==

# Abstract

Kylin is a distributed and scalable OLAP engine built on Hadoop to
support extremely large datasets.

# Proposal

Kylin is an open source Distributed Analytics Engine that provides
multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to
accelerate analytics on Hadoop by allowing the use of SQL-compatible
tools. Kylin provides a SQL interface and multi-dimensional analysis
(MOLAP) on Hadoop to support extremely large datasets and tightly
integrates with the Hadoop ecosystem.

## Overview of Kylin

The Kylin platform has two parts: data processing and interactive querying.
First, Kylin reads data from the source (Hive) and runs a set of tasks,
including MapReduce jobs and shell scripts, to pre-calculate results for a
specified data model, then saves the resulting OLAP cube into storage
such as HBase. Once these OLAP cubes are ready, a user can submit a
request from any SQL-based tool or third-party application to Kylin’s
REST server. The Server calls the Query Engine to determine if the
target dataset already exists. If so, the engine directly accesses the
target data in the form of a predefined cube, and returns the result
with sub-second latency. Otherwise, the engine is designed to route
non-matching queries to whichever SQL on Hadoop tool is already
available on a Hadoop cluster, such as Hive.
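
As a rough illustration of the request path just described, the sketch below
submits a SQL statement to the REST server from Python. The endpoint path,
credentials, project and table names are illustrative assumptions, not taken
from the proposal:

import base64
import json
import urllib.request

# Illustrative values; adjust host, port, path and credentials for a real deployment.
QUERY_URL = "http://kylin-host:7070/kylin/api/query"
BASIC_AUTH = "Basic " + base64.b64encode(b"ADMIN:KYLIN").decode("ascii")

def query_kylin(sql, project):
    """POST a SQL query to the Kylin REST server and return the decoded JSON.
    If the query matches a pre-built cube, the answer comes straight from the
    key-value store (e.g. HBase); otherwise Kylin can route it to Hive."""
    body = json.dumps({"sql": sql, "project": project}).encode("utf-8")
    request = urllib.request.Request(
        QUERY_URL, data=body,
        headers={"Content-Type": "application/json", "Authorization": BASIC_AUTH})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

print(query_kylin(
    "SELECT part_dt, SUM(price) FROM sales_fact GROUP BY part_dt", "demo_project"))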

Kylin platform includes:

- Metadata Manager: Kylin is a metadata-driven application. The Kylin
Metadata Manager is the key component that manages all metadata stored
in Kylin including all cube metadata. All other components rely on the
Metadata Manager.

- Job Engine: This engine is designed to handle all of the offline
jobs, including shell scripts, Java API calls, and MapReduce jobs. The Job
Engine manages and coordinates all of the jobs in Kylin to make sure
each job executes and handles failures.

- Storage Engine: This engine manages the underlying storage –
specifically, the cuboids, which are stored as key-value pairs. The
Storage Engine uses HBase – the best solution from the Hadoop
ecosystem for leveraging an existing K-V system. Kylin can also be
extended to support other K-V systems, such as Redis.

- Query Engine: Once the cube is ready, the Query Engine can receive
and parse user queries. It then interacts with other components to
return the results to the user.

- REST Server: The REST Server is an entry point for applications to
develop against Kylin. Applications can submit queries, get results,
trigger cube build jobs, get metadata, get user privileges, and so on.

- ODBC Driver: To support third-party tools and applications – such as
Tableau – we have built and open-sourced an ODBC Driver. The goal is
to make it easy for users to onboard.
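
To make the Storage Engine description a little more concrete, here is a toy
sketch of laying one pre-aggregated cuboid row out as a key-value pair.
Kylin's real encoding (dictionary-encoded dimensions, cuboid identifiers,
measure codecs) is considerably more involved, so treat this purely as an
illustration:

import struct

def cuboid_key(cuboid_id, dimension_values):
    """Build an HBase-style rowkey: cuboid id, then the encoded dimensions.
    Real systems dictionary-encode dimensions to fixed-width ids; joining raw
    strings here just keeps the example short."""
    dims = b"\x00".join(value.encode("utf-8") for value in dimension_values)
    return struct.pack(">Q", cuboid_id) + b"\x00" + dims

def cuboid_value(measures):
    """Serialize the aggregated measures (e.g. SUM, COUNT) as the cell value."""
    return struct.pack(">%dd" % len(measures), *measures)

# One row of the (date, country) cuboid: the key identifies the cell, the
# value holds the pre-computed aggregates.
row_key = cuboid_key(0b110, ["2014-11-20", "US"])
row_value = cuboid_value([123456.78, 4200.0])   # SUM(price), COUNT(*)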

# Background

The challenge we face at eBay is that our data volume is becoming
bigger and bigger while our user base is becoming more diverse. For
example, our business users and analysts consistently ask for minimal
latency when visualizing data on Tableau and Excel. So, we worked
closely with our internal analyst community and outlined the product
requirements for Kylin:

- Sub-second query latency on billions of rows
- ANSI SQL availability for those using SQL-compatible tools
- Full OLAP capability to offer advanced functionality
- Support for high cardinality and very large dimensions
- High concurrency for thousands of users
- Distributed and scale-out architecture for analysis in the TB to PB size
range

Existing SQL-on-Hadoop solutions commonly need to perform partial or
full table or file scans to compute the results of queries. The cost
of these large data scans can make many queries very slow (more than a
minute). The core idea of MOLAP (multi-dimensional OLAP) is to
pre-compute data along dimensions of interest and store resulting
aggregates as a cube. MOLAP is much faster but inflexible. We
realized that no existing product – especially in the open source
Hadoop community – met our exact requirements. To meet our emerging
business needs, we built a platform from scratch to support MOLAP for
these business requirements, and later to support others, including
ROLAP. With an excellent development team and several pilot customers,
we have been able to bring the Kylin platform into production as well
as open source it.
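
A toy example of the pre-computation idea: build every cuboid (every subset
of dimensions) over a small fact table, with SUM as the only measure. In
Kylin this step runs as MapReduce jobs over Hive data; the sketch only shows
the shape of the result:

from itertools import combinations

# A tiny fact table: (date, country, category, sales)
rows = [
    ("2014-11-20", "US", "toys",  10.0),
    ("2014-11-20", "US", "books", 25.0),
    ("2014-11-20", "CN", "toys",   7.5),
    ("2014-11-21", "CN", "books", 12.0),
]
dimensions = ("date", "country", "category")

def build_cube(rows, dimensions):
    """Pre-compute SUM(sales) for every combination of dimensions (every cuboid)."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for picked in combinations(range(len(dimensions)), k):
            cuboid = {}
            for row in rows:
                key = tuple(row[i] for i in picked)
                cuboid[key] = cuboid.get(key, 0.0) + row[-1]
            cube[tuple(dimensions[i] for i in picked)] = cuboid
    return cube

cube = build_cube(rows, dimensions)
# Answering "SUM(sales) by country" is now a dictionary lookup, not a table scan:
print(cube[("country",)])   # {('US',): 35.0, ('CN',): 19.5}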

# Rationale

When data grows to petabyte scale, pre-calculation for a query takes a
long time and requires costly, powerful hardware. However,
with the benefit of Hadoop’s 

Re: [PROPOSAL] NiFi for Incubation

2014-11-20 Thread jan i
On 20 November 2014 14:05, Joe Witt joe.w...@gmail.com wrote:

 Hadrian

 Yes there is a Software Grant Agreement.  NSA's tech transfer folks have
 already sent that to Apache.

 Given that we are coming from a closed source environment you won't find
 much.  That is what this proposal is about though as we're working hard to
 change that.

 The github link you reference has no relationship to this project.

 The relationship to Apache Camel will need to be explored further as NiFi
 is often used in similar problem spaces (integration).  Camel is really
 powerful in its core purpose and has an excellent community and a great
 deal of maturity.  NiFi provides a complete dataflow application with a
 major focus on the user experience, graphical creation and real-time
 command and control of those flows.  It will be interesting as we progress
 to see how we can best integrate with projects like Camel and I am looking
  forward to hearing some of the thoughts and ideas the community comes up with.

Thanks for the explanation, but just to be sure: similar or overlapping
projects are not a problem per se; the only real concern is whether two
communities can grow.

rgds
jan i.




 Thanks
 Joe

 On Thu, Nov 20, 2014 at 7:45 AM, Hadrian Zbarcea hzbar...@gmail.com
 wrote:

  Sounds exciting. I have a couple of questions:
 
  1. Is there a code grant? I assume so, since the proposal states that the
  project has been active since 2006. What I could find [1] doesn't seem to be
 it.
  2. What is the overlap with Apache Camel (if any)?
 
  Cheers,
  Hadrian
 
 
  [1] https://github.com/Nifi
 
 
 
  On 11/19/2014 09:02 PM, Joe Witt wrote:
 
  Hello,
 
 
  I would like to propose NiFi as an Apache Incubator Project.
 
  In addition to the copy provided below the Wiki version of the
  proposal can be found here:
  http://wiki.apache.org/incubator/NiFiProposal
 
  Thanks
 
  Joe
 
 
  = NiFi Proposal =
 
  == Abstract ==
  NiFi is a dataflow system based on the concepts of flow-based
 programming.
 
  == Proposal ==
  NiFi supports powerful and scalable directed graphs of data routing,
  transformation, and system mediation logic.  Some of the high-level
  capabilities and objectives of NiFi include:
 * Web-based user interface for seamless experience between design,
  control, feedback, and monitoring of data flows
 * Highly configurable along several dimensions of quality of service
  such as loss tolerant versus guaranteed delivery, low latency versus
  high throughput, and priority based queuing
 * Fine-grained data provenance for all data received, forked,
  joined, cloned, modified, sent, and ultimately dropped as data reaches
  its configured end-state
 * Component-based extension model along well defined interfaces
  enabling rapid development and effective testing
 
  == Background ==
  Reliable and effective dataflow between systems can be difficult
  whether you're running scripts on a laptop or have a massive
  distributed computing system operated by numerous teams and
  organizations.  As the volume and rate of data grows and as the number
  of systems, protocols, and formats increase and evolve so too does the
  complexity and need for greater insight and agility.  These are the
  dataflow challenges that NiFi was built to tackle.
 
  NiFi is designed in a manner consistent with the core concepts
  described in flow-based programming as originally documented by J.
  Paul Morrison in the 1970s.  This model lends itself well to visual
  diagramming, concurrency, componentization, testing, and reuse.  In
  addition to staying close to the fundamentals of flow-based
  programming, NiFi provides integration system specific features such
  as: guaranteed delivery; back pressure; ability to gracefully handle
  backlogs and data surges; and an operator interface that enables
  on-the-fly data flow generation, modification, and observation.
 
  == Rationale ==
  NiFi provides a reliable, scalable, manageable and accountable
  platform for developers and technical staff to create and evolve
  powerful data flows.  Such a system is useful in many contexts
  including large-scale enterprise integration, interaction with cloud
  services and frameworks, business to business, intra-departmental, and
  inter-departmental flows.  NiFi fits well within the Apache Software
  Foundation (ASF) family as it depends on numerous ASF projects and
  integrates with several others.  We also anticipate developing
  extensions for several other ASF projects such as Cassandra, Kafka,
  and Storm in the near future.
 
  == Initial Goals ==
 * Ensure all dependencies are compliant with Apache License version
  2.0 and that all code and documentation artifacts have the correct
  Apache licensing markings and notice.
 * Establish a formal release process and schedule, allowing for
  dependable release cycles in a manner consistent with the Apache
  development process.
 * Determine and establish a mechanism, possibly including a
  

Infra for podling setup

2014-11-20 Thread John D. Ament
Hi,

Since I'm new at being a mentor, I was wondering how to handle slow infra
requests for podlings?

Ideally, I'd like to help out infra with the steps required, as I know some
of the members of the podling are anxious to get things going.  The infra
terms to get things running are a bit loose - e.g. hang out with them.
Unfortunately my work blocks IRC ports so it's a pain to keep connected
during the day.

John


Re: Infra for podling setup

2014-11-20 Thread Konstantin Boudnik
I've just recently dealt with this during the incubation of Ignite, and it looks
like the following tactics work best:
 - ping Infra on your JIRA tickets once in a while
 - ping them on IRC #asfinfra channel

But in general, be patient - the folks are clearly pretty busy.

Regards,
  Cos

On Thu, Nov 20, 2014 at 10:42PM, John D. Ament wrote:
 Hi,
 
 Since I'm new at being a mentor, I was wondering how to handle slow infra
 requests for podlings?
 
 Ideally, I'd like to help out infra with the steps required, as I know some
 of the members of the podling are anxious to get things going.  The infra
 terms to get things running are a bit loose - e.g. hang out with them.
 Unfortunately my work blocks IRC ports so it's a pain to keep connected
 during the day.
 
 John

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [PROPOSAL] NiFi for Incubation

2014-11-20 Thread Joe Witt
Josh,

Really appreciate it and have updated the proposal.

Thanks
Joe

On Thu, Nov 20, 2014 at 9:35 AM, Josh Elser els...@apache.org wrote:

 Very exciting stuff!

 Not presently on IPMC, but if you'd have me, I'd be happy to volunteer as
 a mentor. If so, I'll submit an application to join the IPMC and we can go
 from there.

 - Josh


 Joe Witt wrote:

 Hello,


 I would like to propose NiFi as an Apache Incubator Project.

 In addition to the copy provided below the Wiki version of the
 proposal can be found here:
 http://wiki.apache.org/incubator/NiFiProposal

 Thanks

 Joe


 = NiFi Proposal =

 == Abstract ==
 NiFi is a dataflow system based on the concepts of flow-based programming.

 == Proposal ==
 NiFi supports powerful and scalable directed graphs of data routing,
 transformation, and system mediation logic.  Some of the high-level
 capabilities and objectives of NiFi include:
* Web-based user interface for seamless experience between design,
 control, feedback, and monitoring of data flows
* Highly configurable along several dimensions of quality of service
 such as loss tolerant versus guaranteed delivery, low latency versus
 high throughput, and priority based queuing
* Fine-grained data provenance for all data received, forked,
 joined, cloned, modified, sent, and ultimately dropped as data reaches
 its configured end-state
* Component-based extension model along well defined interfaces
 enabling rapid development and effective testing

 == Background ==
 Reliable and effective dataflow between systems can be difficult
 whether you're running scripts on a laptop or have a massive
 distributed computing system operated by numerous teams and
 organizations.  As the volume and rate of data grows and as the number
 of systems, protocols, and formats increase and evolve so too does the
 complexity and need for greater insight and agility.  These are the
 dataflow challenges that NiFi was built to tackle.

 NiFi is designed in a manner consistent with the core concepts
 described in flow-based programming as originally documented by J.
 Paul Morrison in the 1970s.  This model lends itself well to visual
 diagramming, concurrency, componentization, testing, and reuse.  In
 addition to staying close to the fundamentals of flow-based
 programming, NiFi provides integration system specific features such
 as: guaranteed delivery; back pressure; ability to gracefully handle
 backlogs and data surges; and an operator interface that enables
 on-the-fly data flow generation, modification, and observation.

 == Rationale ==
 NiFi provides a reliable, scalable, manageable and accountable
 platform for developers and technical staff to create and evolve
 powerful data flows.  Such a system is useful in many contexts
 including large-scale enterprise integration, interaction with cloud
 services and frameworks, business to business, intra-departmental, and
 inter-departmental flows.  NiFi fits well within the Apache Software
 Foundation (ASF) family as it depends on numerous ASF projects and
 integrates with several others.  We also anticipate developing
 extensions for several other ASF projects such as Cassandra, Kafka,
 and Storm in the near future.

 == Initial Goals ==
* Ensure all dependencies are compliant with Apache License version
 2.0 and that all code and documentation artifacts have the correct
 Apache licensing markings and notice.
* Establish a formal release process and schedule, allowing for
 dependable release cycles in a manner consistent with the Apache
 development process.
* Determine and establish a mechanism, possibly including a
 sub-project construct, that allows for extensions to the core
 application to occur at a pace that differs from the core application
 itself.

 == Current Status ==
 === Meritocracy ===
 An integration platform is only as good as its ability to integrate
 systems in a reliable, timely, and repeatable manner.  The same can be
 said of its ability to attract talent and a variety of perspectives as
 integration systems by their nature are always evolving.  We will
 actively seek help and encourage promotion of influence in the project
 through meritocracy.

 === Community ===
 Over the past several years, NiFi has developed a strong community of
 both developers and operators within the U.S. government.  We look
 forward to helping grow this to a broader base of industries.

 === Core Developers ===
 The initial core developers are employed by the National Security
 Agency and defense contractors.  We will work to grow the community
 among a more diverse set of developers and industries.

 === Alignment ===
  From its inception, NiFi was developed with an open source philosophy
 in mind and with the hopes of eventually being truly open sourced.
 The Apache way is consistent with the approach we have taken to date.
 The ASF clearly provides a mature and effective environment for
 successful development as is evident across the 

Re: Infra for podling setup

2014-11-20 Thread John D. Ament
Jake,

Thanks for looking.  I'll have to get onto HipChat; probably the web client
will work fine for me.
On Thu Nov 20 2014 at 9:37:13 PM Jake Farrell jfarr...@apache.org wrote:

 Hi John, what is the infra ticket you are having an issue with? We also
 moved away from using IRC to HipChat [1] for infra communication.

 -Jake


 [1]: http://www.hipchat.com/gdAiIcNyE

 On Thu, Nov 20, 2014 at 5:42 PM, John D. Ament john.d.am...@gmail.com
 wrote:

  Hi,
 
  Since I'm new at being a mentor, I was wondering how to handle slow infra
  requests for podlings?
 
  Ideally, I'd like to help out infra with the steps required, as I know
 some
  of the members of the podling are anxious to get things going.  The infra
  terms to get things running are a bit loose - e.g. hang out with them.
  Unfortunately my work blocks IRC ports so it's a pain to keep connected
  during the day.
 
  John
 



Re: [VOTE] Accept Kylin into the Apache Incubator

2014-11-20 Thread Ted Dunning
+1 (binding)



On Fri, Nov 21, 2014 at 3:37 AM, Andrew Purtell apurt...@apache.org wrote:

 +1 (binding)

 On Thu, Nov 20, 2014 at 2:31 PM, Luke Han luke...@gmail.com wrote:

  Following the discussion earlier in the thread:
 
 
 
 http://mail-archives.apache.org/mod_mbox/incubator-general/201411.mbox/%3ccakmqrob22+n+r++date33f3pcpyujhfoeaqrms3t-udjwk6...@mail.gmail.com%3e
 
  I would like to call a VOTE for accepting Kylin as a new incubator
 project.
 
  The proposal is available at:
  https://wiki.apache.org/incubator/KylinProposal
 
   and the text of the proposal is also posted below.
 
  Vote is open until 24th November 2014, 23:59:00 UTC
 
  [ ] +1 accept Kylin in the Incubator
  [ ] ±0
  [ ] -1 because...
 
 
  Thanks
  Luke
 
 
  Kylin Proposal
  ==
 
  # Abstract
 
  Kylin is a distributed and scalable OLAP engine built on Hadoop to
  support extremely large datasets.
 
  # Proposal
 
  Kylin is an open source Distributed Analytics Engine that provides
  multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to
  accelerate analytics on Hadoop by allowing the use of SQL-compatible
  tools. Kylin provides a SQL interface and multi-dimensional analysis
  (MOLAP) on Hadoop to support extremely large datasets and tightly
   integrates with the Hadoop ecosystem.
 
  ## Overview of Kylin
 
   The Kylin platform has two parts: data processing and interactive querying.
   First, Kylin reads data from the source (Hive) and runs a set of tasks,
   including MapReduce jobs and shell scripts, to pre-calculate results for a
   specified data model, then saves the resulting OLAP cube into storage
   such as HBase. Once these OLAP cubes are ready, a user can submit a
   request from any SQL-based tool or third-party application to Kylin’s
  REST server. The Server calls the Query Engine to determine if the
  target dataset already exists. If so, the engine directly accesses the
  target data in the form of a predefined cube, and returns the result
  with sub-second latency. Otherwise, the engine is designed to route
  non-matching queries to whichever SQL on Hadoop tool is already
  available on a Hadoop cluster, such as Hive.
 
  Kylin platform includes:
 
  - Metadata Manager: Kylin is a metadata-driven application. The Kylin
  Metadata Manager is the key component that manages all metadata stored
  in Kylin including all cube metadata. All other components rely on the
  Metadata Manager.
 
  - Job Engine: This engine is designed to handle all of the offline
   jobs, including shell scripts, Java API calls, and MapReduce jobs. The Job
  Engine manages and coordinates all of the jobs in Kylin to make sure
  each job executes and handles failures.
 
  - Storage Engine: This engine manages the underlying storage –
  specifically, the cuboids, which are stored as key-value pairs. The
  Storage Engine uses HBase – the best solution from the Hadoop
  ecosystem for leveraging an existing K-V system. Kylin can also be
  extended to support other K-V systems, such as Redis.
 
  - Query Engine: Once the cube is ready, the Query Engine can receive
  and parse user queries. It then interacts with other components to
  return the results to the user.
 
  - REST Server: The REST Server is an entry point for applications to
  develop against Kylin. Applications can submit queries, get results,
  trigger cube build jobs, get metadata, get user privileges, and so on.
 
  - ODBC Driver: To support third-party tools and applications – such as
  Tableau – we have built and open-sourced an ODBC Driver. The goal is
  to make it easy for users to onboard.
 
  # Background
 
  The challenge we face at eBay is that our data volume is becoming
  bigger and bigger while our user base is becoming more diverse. For
   example, our business users and analysts consistently ask for minimal
  latency when visualizing data on Tableau and Excel. So, we worked
  closely with our internal analyst community and outlined the product
  requirements for Kylin:
 
  - Sub-second query latency on billions of rows
  - ANSI SQL availability for those using SQL-compatible tools
  - Full OLAP capability to offer advanced functionality
  - Support for high cardinality and very large dimensions
  - High concurrency for thousands of users
  - Distributed and scale-out architecture for analysis in the TB to PB
 size
  range
 
  Existing SQL-on-Hadoop solutions commonly need to perform partial or
  full table or file scans to compute the results of queries. The cost
  of these large data scans can make many queries very slow (more than a
  minute). The core idea of MOLAP (multi-dimensional OLAP) is to
  pre-compute data along dimensions of interest and store resulting
  aggregates as a cube. MOLAP is much faster but inflexible. We
  realized that no existing product – especially in the open source
  Hadoop community – met our exact requirements. To meet
  our emerging business needs, we built a platform from scratch to
  support MOLAP for these business requirements and