Re: [PROPOSAL] Kylin for Incubation
Sounds good. I have started the discussion to get Jacques on IPMC. On Thu, Nov 20, 2014 at 9:27 AM, Luke Han luke...@gmail.com wrote: Hi all, Thank you for reviewing the proposal. With the discussion winding down, we would like to send the VOTE email next. Thanks Luke 2014-11-15 11:40 GMT+08:00 Ted Dunning ted.dunn...@gmail.com: Also, a Chinese localized operating system is pretty clearly different from an OLAP engine. For comparison, see the recent non-issue regarding Amazon Aurora versus Apache Aurora. Sent from my iPhone On Nov 14, 2014, at 9:55, Henry Saputra henry.sapu...@gmail.com wrote: Thanks for the reminder, Ross. Hopefully we could go the same route as Apache Spark, Apache Storm, and Apache MetaModel, where the trademark would be used as 'Apache Kylin'. - Henry On Fri, Nov 14, 2014 at 7:47 AM, Ross Gardler (MS OPEN TECH) ross.gard...@microsoft.com wrote: Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin Sent from my Windows Phone From: Luke Han luke...@gmail.com Sent: 11/14/2014 7:38 AM To: general@incubator.apache.org Subject: [PROPOSAL] Kylin for Incubation Hi all, We would like to propose Kylin as an Apache Incubator project. The complete proposal can be found at https://wiki.apache.org/incubator/KylinProposal and the text of the proposal is posted below. Thanks. Luke Kylin Proposal == # Abstract Kylin is a distributed and scalable OLAP engine built on Hadoop to support extremely large datasets. # Proposal Kylin is an open source Distributed Analytics Engine that provides multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to accelerate analytics on Hadoop by allowing the use of SQL-compatible tools. Kylin provides a SQL interface and multi-dimensional analysis (MOLAP) on Hadoop to support extremely large datasets and tightly integrates with the Hadoop ecosystem.
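The MOLAP idea at the heart of the proposal (pre-aggregating a measure across combinations of dimensions so a query becomes a lookup rather than a scan) can be sketched in a few lines. This is an illustration only; the fact table, dimension names, and cuboid layout below are invented for the example and do not reflect Kylin's actual storage format:

```python
from itertools import combinations
from collections import defaultdict

# Illustrative fact table: (region, year, product, sales)
rows = [
    ("US", 2014, "phone", 100),
    ("US", 2014, "laptop", 250),
    ("EU", 2014, "phone", 80),
    ("US", 2013, "phone", 60),
]

dims = ("region", "year", "product")  # dimension columns by position

def build_cuboids(rows):
    """Pre-aggregate sales for every subset of dimensions (the 'cube')."""
    cuboids = {}
    for r in range(len(dims) + 1):
        for subset in combinations(range(len(dims)), r):
            agg = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in subset)
                agg[key] += row[3]  # sum the sales measure
            cuboids[subset] = dict(agg)
    return cuboids

cube = build_cuboids(rows)

# A query like "SELECT SUM(sales) WHERE region='US' AND year=2014"
# becomes a constant-time lookup in the (region, year) cuboid:
print(cube[(0, 1)][("US", 2014)])  # 350
```

The trade-off this sketch makes visible is the one the proposal relies on: cube build cost is paid once offline (Kylin does this with MapReduce jobs), and every matching query afterwards is a key-value read, which is why sub-second latency on billions of rows is plausible.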
## Overview of Kylin The Kylin platform has two parts, data processing and interaction: First, Kylin reads data from a source such as Hive and runs a set of tasks, including MapReduce jobs and shell scripts, to pre-calculate results for a specified data model, then saves the resulting OLAP cube into storage such as HBase. Once these OLAP cubes are ready, a user can submit a request from any SQL-based tool or third-party application to Kylin's REST server. The server calls the Query Engine to determine if the target dataset already exists. If so, the engine directly accesses the target data in the form of a predefined cube, and returns the result with sub-second latency. Otherwise, the engine is designed to route non-matching queries to whichever SQL-on-Hadoop tool is already available on the Hadoop cluster, such as Hive. The Kylin platform includes: - Metadata Manager: Kylin is a metadata-driven application. The Kylin Metadata Manager is the key component that manages all metadata stored in Kylin, including all cube metadata. All other components rely on the Metadata Manager. - Job Engine: This engine is designed to handle all of the offline jobs, including shell scripts, Java API calls, and MapReduce jobs. The Job Engine manages and coordinates all of the jobs in Kylin to make sure each job executes and that failures are handled. - Storage Engine: This engine manages the underlying storage – specifically, the cuboids, which are stored as key-value pairs. The Storage Engine uses HBase – the best solution from the Hadoop ecosystem for leveraging an existing K-V system. Kylin can also be extended to support other K-V systems, such as Redis. - Query Engine: Once the cube is ready, the Query Engine can receive and parse user queries. It then interacts with other components to return the results to the user. - REST Server: The REST Server is an entry point for applications to develop against Kylin.
Applications can submit queries, get results, trigger cube build jobs, get metadata, get user privileges, and so on. - ODBC Driver: To support third-party tools and applications – such as Tableau – we have built and open-sourced an ODBC Driver. The goal is to make it easy for users to onboard. # Background The challenge we face at eBay is that our data volume is becoming bigger and bigger while our user base is becoming more diverse. For example, our business users and analysts consistently ask for minimal latency when visualizing data on Tableau and Excel. So, we worked closely with our internal analyst community and outlined the product requirements for Kylin: - Sub-second query latency on billions of rows - ANSI SQL availability for those using SQL-compatible tools - Full OLAP capability to offer advanced functionality -
Re: [PROPOSAL] NiFi for Incubation
Sean, The precedent of Accumulo is that the govt people and agencies involved are ready and able to have their staff collaborate openly in an Apache community. There's no need to contemplate bifurcation; we have this proposal because the management recognizes that this collaboration produces better stuff that solves more problems than the 'inside the tent' alternative. --benson On Thu, Nov 20, 2014 at 1:50 AM, Sean Busbey bus...@cloudera.com wrote: I'm really excited to see NiFi come to the incubator; it'd be a great addition to the ASF. A few points in the proposal: == Initial Goals == One of these should be to grow the community outside of the current niche, IMHO. More on this below under orphaned projects * Determine and establish a mechanism, possibly including a sub-project construct, that allows for extensions to the core application to occur at a pace that differs from the core application itself. I don't think the proposal needs to include the e.g. with sub-projects part. Just noting that your goals in the incubator are to address the need to have different release cycles for core and extensions is sufficient. === Community === Over the past several years, NiFi has developed a strong community of both developers and operators within the U.S. government. We look forward to helping grow this to a broader base of industries. How much, if any, of this community do you expect to engage via the customary project lists once NiFi is established within the ASF? Will the project be able to leverage this established group? === Orphaned Products === Risk of orphaning is minimal. The project user and developer base is substantial, growing, and there is already extensive operational use of NiFi. Given that the established base is internal to the U.S. government, I'd encourage the podling to consider the risk of a bifurcated project should a substantial outside community fail to emerge or if those internal users should fail to engage with the outside community. 
You cover a related issue in your Homogenous Developers section. But I think building on the Community section of the current state to call this out as an independent issue is worthwhile. possible. This environment includes widely accessible source code repositories, published artifacts, ticket tracking, and extensive documentation. We also encourage contributions and frequent debate and hold regular, collaborative discussions through e-mail, chat rooms, and in-person meet-ups. Do you anticipate any difficulties moving these established communication mechanisms to ASF public lists? === Documentation === At this time there is no NiFi documentation on the web. However, we have extensive documentation included within the application that details usage of the many functions. We will be rapidly expanding the available documentation to cover things like installation, developer guide, frequently asked questions, best practices, and more. This documentation will be posted to the NiFi wiki at apache.org. I love projects that start with documentation. :) I don't think the proposal needs to include that the documentation will be posted to the NiFi wiki, since that's an implementation detail. Just say this documentation will be made available via the NiFi project's use of incubator infra. (I'll save detail for the eventual dev@ list, but you should strongly consider not using the wiki to host this documentation.) -Sean On Wed, Nov 19, 2014 at 11:27 PM, Brock Noland br...@cloudera.com wrote: Hi Joe, I know you've done a tremendous amount of work to make this happen so I am extremely happy this is *finally* making its way to the incubator! I look forward to helping in any way I can. Cheers! Brock On Wed, Nov 19, 2014 at 8:11 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: This is *fan freakin' tastic* Sounds like an awesome project and glad to hear a relationship to Tika! Awesome to see more government projects coming into the ASF!
you already have a great set of mentors and I don't really have more time on my plate, but really happy and will try and monitor and help on the lists. Cheers! Chris ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -Original Message- From: Joe Witt
Re: [PROPOSAL] NiFi for Incubation
Sounds exciting. I have a couple of questions: 1. Is there a code grant? I assume so, since the proposal states that the project has been active since 2006. What I could find [1] doesn't seem to be it. 2. What is the overlap with Apache Camel (if any)? Cheers, Hadrian [1] https://github.com/Nifi On 11/19/2014 09:02 PM, Joe Witt wrote: Hello, I would like to propose NiFi as an Apache Incubator Project. In addition to the copy provided below the Wiki version of the proposal can be found here: http://wiki.apache.org/incubator/NiFiProposal Thanks Joe = NiFi Proposal = == Abstract == NiFi is a dataflow system based on the concepts of flow-based programming. == Proposal == NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Some of the high-level capabilities and objectives of NiFi include: * Web-based user interface for seamless experience between design, control, feedback, and monitoring of data flows * Highly configurable along several dimensions of quality of service such as loss tolerant versus guaranteed delivery, low latency versus high throughput, and priority based queuing * Fine-grained data provenance for all data received, forked, joined, cloned, modified, sent, and ultimately dropped as data reaches its configured end-state * Component-based extension model along well defined interfaces enabling rapid development and effective testing == Background == Reliable and effective dataflow between systems can be difficult whether you're running scripts on a laptop or have a massive distributed computing system operated by numerous teams and organizations. As the volume and rate of data grows and as the number of systems, protocols, and formats increase and evolve so too does the complexity and need for greater insight and agility. These are the dataflow challenges that NiFi was built to tackle.
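The abstract's phrase "directed graphs of data routing, transformation, and system mediation logic" is the essence of flow-based programming: independent processors connected into a graph, each transforming what it receives and forwarding the result. A minimal, hypothetical sketch of the idea follows; NiFi itself is Java and its real API is different, so the `Processor` class here is invented purely for illustration:

```python
class Processor:
    """One node in a directed flow graph: transform an item, then forward it."""
    def __init__(self, transform):
        self.transform = transform
        self.downstream = []          # outgoing edges of the graph

    def connect(self, other):
        self.downstream.append(other)

    def receive(self, item):
        out = self.transform(item)
        if not self.downstream:       # terminal node: collect the result
            results.append(out)
        for node in self.downstream:  # fan-out: every connection gets a copy
            node.receive(out)

results = []

# Wire a tiny two-stage flow: trim whitespace, then uppercase.
parse = Processor(str.strip)
upper = Processor(str.upper)
parse.connect(upper)

for line in ["  hello ", " nifi "]:
    parse.receive(line)

print(results)  # ['HELLO', 'NIFI']
```

A real dataflow system adds what this toy omits and what the proposal emphasizes: bounded queues between nodes (back pressure), persistence for guaranteed delivery, and provenance records for every item received, forked, cloned, or dropped.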
NiFi is designed in a manner consistent with the core concepts described in flow-based programming as originally documented by J. Paul Morrison in the 1970s. This model lends itself well to visual diagramming, concurrency, componentization, testing, and reuse. In addition to staying close to the fundamentals of flow-based programming, NiFi provides integration system specific features such as: guaranteed delivery; back pressure; ability to gracefully handle backlogs and data surges; and an operator interface that enables on-the-fly data flow generation, modification, and observation. == Rationale == NiFi provides a reliable, scalable, manageable and accountable platform for developers and technical staff to create and evolve powerful data flows. Such a system is useful in many contexts including large-scale enterprise integration, interaction with cloud services and frameworks, business to business, intra-departmental, and inter-departmental flows. NiFi fits well within the Apache Software Foundation (ASF) family as it depends on numerous ASF projects and integrates with several others. We also anticipate developing extensions for several other ASF projects such as Cassandra, Kafka, and Storm in the near future. == Initial Goals == * Ensure all dependencies are compliant with Apache License version 2.0 and that all code and documentation artifacts have the correct Apache licensing markings and notice. * Establish a formal release process and schedule, allowing for dependable release cycles in a manner consistent with the Apache development process. * Determine and establish a mechanism, possibly including a sub-project construct, that allows for extensions to the core application to occur at a pace that differs from the core application itself. == Current Status == === Meritocracy === An integration platform is only as good as its ability to integrate systems in a reliable, timely, and repeatable manner.
The same can be said of its ability to attract talent and a variety of perspectives as integration systems by their nature are always evolving. We will actively seek help and encourage promotion of influence in the project through meritocracy. === Community === Over the past several years, NiFi has developed a strong community of both developers and operators within the U.S. government. We look forward to helping grow this to a broader base of industries. === Core Developers === The initial core developers are employed by the National Security Agency and defense contractors. We will work to grow the community among a more diverse set of developers and industries. === Alignment === From its inception, NiFi was developed with an open source philosophy in mind and with the hopes of eventually being truly open sourced. The Apache way is consistent with the approach we have taken to date. The ASF clearly provides a mature and effective environment for successful development as is evident across the spectrum of well-known projects. Further, NiFi depends on numerous ASF libraries and projects including:
Re: [PROPOSAL] NiFi for Incubation
Hello Thank you for all the feedback thus far. Sean, Jan I, I've adjusted the proposal for the goals, community, and documentation. Thanks Joe On Thu, Nov 20, 2014 at 1:50 AM, Sean Busbey bus...@cloudera.com wrote: I'm really excited to see NiFi come to the incubator; it'd be a great addition to the ASF. A few points in the proposal: == Initial Goals == One of these should be to grow the community outside of the current niche, IMHO. More on this below under orphaned projects * Determine and establish a mechanism, possibly including a sub-project construct, that allows for extensions to the core application to occur at a pace that differs from the core application itself. I don't think the proposal needs to include the e.g. with sub-projects part. Just noting that your goals in the incubator are to address the need to have different release cycles for core and extensions is sufficient. === Community === Over the past several years, NiFi has developed a strong community of both developers and operators within the U.S. government. We look forward to helping grow this to a broader base of industries. How much, if any, of this community do you expect to engage via the customary project lists once NiFi is established within the ASF? Will the project be able to leverage this established group? === Orphaned Products === Risk of orphaning is minimal. The project user and developer base is substantial, growing, and there is already extensive operational use of NiFi. Given that the established base is internal to the U.S. government, I'd encourage the podling to consider the risk of a bifurcated project should a substantial outside community fail to emerge or if those internal users should fail to engage with the outside community. You cover a related issue in your Homogenous Developers section. But I think building on the Community section of the current state to call this out as an independent issue is worthwhile. possible. 
This environment includes widely accessible source code repositories, published artifacts, ticket tracking, and extensive documentation. We also encourage contributions and frequent debate and hold regular, collaborative discussions through e-mail, chat rooms, and in-person meet-ups. Do you anticipate any difficulties moving these established communication mechanisms to ASF public lists? === Documentation === At this time there is no NiFi documentation on the web. However, we have extensive documentation included within the application that details usage of the many functions. We will be rapidly expanding the available documentation to cover things like installation, developer guide, frequently asked questions, best practices, and more. This documentation will be posted to the NiFi wiki at apache.org. I love projects that start with documentation. :) I don't think the proposal needs to include that the documentation will be posted to the NiFi wiki, since that's an implementation detail. Just say this documentation will be made available via the NiFi project's use of incubator infra. (I'll save detail for the eventual dev@ list, but you should strongly consider not using the wiki to host this documentation.) -Sean On Wed, Nov 19, 2014 at 11:27 PM, Brock Noland br...@cloudera.com wrote: Hi Joe, I know you've done a tremendous amount of work to make this happen so I am extremely happy this is *finally* making its way to the incubator! I look forward to helping in any way I can. Cheers! Brock On Wed, Nov 19, 2014 at 8:11 PM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: This is *fan freakin' tastic* Sounds like an awesome project and glad to hear a relationship to Tika! Awesome to see more government projects coming into the ASF! you already have a great set of mentors and I don't really have more time on my plate, but really happy and will try and monitor and help on the lists. Cheers! Chris ++ Chris Mattmann, Ph.D.
Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -Original Message- From: Joe Witt joe.w...@gmail.com Reply-To: general@incubator.apache.org general@incubator.apache.org Date: Thursday, November 20, 2014 at 3:02 AM To: general@incubator.apache.org general@incubator.apache.org Subject: [PROPOSAL] NiFi for
Re: [PROPOSAL] NiFi for Incubation
very, VERY cool! On Nov 19, 2014, at 9:02 PM, Joe Witt joe.w...@gmail.com wrote: Hello, I would like to propose NiFi as an Apache Incubator Project. In addition to the copy provided below the Wiki version of the proposal can be found here: http://wiki.apache.org/incubator/NiFiProposal Thanks - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [PROPOSAL] NiFi for Incubation
Hadrian Yes there is a Software Grant Agreement. NSA's tech transfer folks have already sent that to Apache. Given that we are coming from a closed source environment you won't find much. That is what this proposal is about though as we're working hard to change that. The github link you reference has no relationship to this project. The relationship to Apache Camel will need to be explored further as NiFi is often used in similar problem spaces (integration). Camel is really powerful in its core purpose and has an excellent community and a great deal of maturity. NiFi provides a complete dataflow application with a major focus on the user experience, graphical creation and real-time command and control of those flows. It will be interesting as we progress to see how we can best integrate with projects like Camel and I am looking forward to hearing some of the thoughts and ideas the community comes up with. Thanks Joe On Thu, Nov 20, 2014 at 7:45 AM, Hadrian Zbarcea hzbar...@gmail.com wrote: Sounds exciting. I have a couple of questions: 1. Is there a code grant? I assume so, the proposal states that the project is active since 2006. What I could find [1] doesn't seem to be it. 2. What is the overlap with Apache Camel (if any)? Cheers, Hadrian [1] https://github.com/Nifi On 11/19/2014 09:02 PM, Joe Witt wrote: Hello, I would like to propose NiFi as an Apache Incubator Project. In addition to the copy provided below the Wiki version of the proposal can be found here: http://wiki.apache.org/incubator/NiFiProposal Thanks Joe = NiFi Proposal = == Abstract == NiFi is a dataflow system based on the concepts of flow-based programming. == Proposal == NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
Some of the high-level capabilities and objectives of NiFi include: * Web-based user interface for seamless experience between design, control, feedback, and monitoring of data flows * Highly configurable along several dimensions of quality of service such as loss tolerant versus guaranteed delivery, low latency versus high throughput, and priority based queuing * Fine-grained data provenance for all data received, forked, joined, cloned, modified, sent, and ultimately dropped as data reaches its configured end-state * Component-based extension model along well defined interfaces enabling rapid development and effective testing == Background == Reliable and effective dataflow between systems can be difficult whether you're running scripts on a laptop or have a massive distributed computing system operated by numerous teams and organizations. As the volume and rate of data grows and as the number of systems, protocols, and formats increase and evolve so too does the complexity and need for greater insight and agility. These are the dataflow challenges that NiFi was built to tackle. NiFi is designed in a manner consistent with the core concepts described in flow-based programming as originally documented by J. Paul Morrison in the 1970s. This model lends itself well to visual diagramming, concurrency, componentization, testing, and reuse. In addition to staying close to the fundamentals of flow-based programming, NiFi provides integration system specific features such as: guaranteed delivery; back pressure; ability to gracefully handle backlogs and data surges; and an operator interface that enables on-the-fly data flow generation, modification, and observation. == Rationale == NiFi provides a reliable, scalable, manageable and accountable platform for developers and technical staff to create and evolve powerful data flows. 
Such a system is useful in many contexts including large-scale enterprise integration, interaction with cloud services and frameworks, business to business, intra-departmental, and inter-departmental flows. NiFi fits well within the Apache Software Foundation (ASF) family as it depends on numerous ASF projects and integrates with several others. We also anticipate developing extensions for several other ASF projects such as Cassandra, Kafka, and Storm in the near future. == Initial Goals == * Ensure all dependencies are compliant with Apache License version 2.0 and that all code and documentation artifacts have the correct Apache licensing markings and notice. * Establish a formal release process and schedule, allowing for dependable release cycles in a manner consistent with the Apache development process. * Determine and establish a mechanism, possibly including a sub-project construct, that allows for extensions to the core application to occur at a pace that differs from the core application itself. == Current Status == === Meritocracy === An integration platform is only as good as its ability to integrate systems in a reliable, timely, and repeatable manner. The same can be said of its ability to
Re: [PROPOSAL] NiFi for Incubation
+1, good stuff... --tim On Wed, Nov 19, 2014 at 9:02 PM, Joe Witt joe.w...@gmail.com wrote: Hello, I would like to propose NiFi as an Apache Incubator Project. In addition to the copy provided below the Wiki version of the proposal can be found here: http://wiki.apache.org/incubator/NiFiProposal Thanks Joe = NiFi Proposal = == Abstract == NiFi is a dataflow system based on the concepts of flow-based programming. == Proposal == NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Some of the high-level capabilities and objectives of NiFi include: * Web-based user interface for seamless experience between design, control, feedback, and monitoring of data flows * Highly configurable along several dimensions of quality of service such as loss tolerant versus guaranteed delivery, low latency versus high throughput, and priority based queuing * Fine-grained data provenance for all data received, forked, joined, cloned, modified, sent, and ultimately dropped as data reaches its configured end-state * Component-based extension model along well defined interfaces enabling rapid development and effective testing == Background == Reliable and effective dataflow between systems can be difficult whether you're running scripts on a laptop or have a massive distributed computing system operated by numerous teams and organizations. As the volume and rate of data grows and as the number of systems, protocols, and formats increase and evolve so too does the complexity and need for greater insight and agility. These are the dataflow challenges that NiFi was built to tackle. NiFi is designed in a manner consistent with the core concepts described in flow-based programming as originally documented by J. Paul Morrison in the 1970s. This model lends itself well to visual diagramming, concurrency, componentization, testing, and reuse. 
In addition to staying close to the fundamentals of flow-based programming, NiFi provides integration system specific features such as: guaranteed delivery; back pressure; ability to gracefully handle backlogs and data surges; and an operator interface that enables on-the-fly data flow generation, modification, and observation. == Rationale == NiFi provides a reliable, scalable, manageable and accountable platform for developers and technical staff to create and evolve powerful data flows. Such a system is useful in many contexts including large-scale enterprise integration, interaction with cloud services and frameworks, business to business, intra-departmental, and inter-departmental flows. NiFi fits well within the Apache Software Foundation (ASF) family as it depends on numerous ASF projects and integrates with several others. We also anticipate developing extensions for several other ASF projects such as Cassandra, Kafka, and Storm in the near future. == Initial Goals == * Ensure all dependencies are compliant with Apache License version 2.0 and that all code and documentation artifacts have the correct Apache licensing markings and notice. * Establish a formal release process and schedule, allowing for dependable release cycles in a manner consistent with the Apache development process. * Determine and establish a mechanism, possibly including a sub-project construct, that allows for extensions to the core application to occur at a pace that differs from the core application itself. == Current Status == === Meritocracy === An integration platform is only as good as its ability to integrate systems in a reliable, timely, and repeatable manner. The same can be said of its ability to attract talent and a variety of perspectives as integration systems by their nature are always evolving. We will actively seek help and encourage promotion of influence in the project through meritocracy.
=== Community === Over the past several years, NiFi has developed a strong community of both developers and operators within the U.S. government. We look forward to helping grow this to a broader base of industries. === Core Developers === The initial core developers are employed by the National Security Agency and defense contractors. We will work to grow the community among a more diverse set of developers and industries. === Alignment === From its inception, NiFi was developed with an open source philosophy in mind and with the hopes of eventually being truly open sourced. The Apache way is consistent with the approach we have taken to date. The ASF clearly provides a mature and effective environment for successful development as is evident across the spectrum of well-known projects. Further, NiFi depends on numerous ASF libraries and projects including: ActiveMQ, Ant, Commons, Lucene, Hadoop, HttpClient, Jakarta and Maven. We also anticipate extensions and dependencies with several more ASF projects, including
Re: [PROPOSAL] NiFi for Incubation
Very exciting stuff! Not presently on IPMC, but if you'd have me, I'd be happy to volunteer as a mentor. If so, I'll submit an application to join the IPMC and we can go from there. - Josh Joe Witt wrote: Hello, I would like to propose NiFi as an Apache Incubator Project. In addition to the copy provided below the Wiki version of the proposal can be found here: http://wiki.apache.org/incubator/NiFiProposal Thanks Joe = NiFi Proposal = == Abstract == NiFi is a dataflow system based on the concepts of flow-based programming. == Proposal == NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Some of the high-level capabilities and objectives of NiFi include: * Web-based user interface for seamless experience between design, control, feedback, and monitoring of data flows * Highly configurable along several dimensions of quality of service such as loss tolerant versus guaranteed delivery, low latency versus high throughput, and priority based queuing * Fine-grained data provenance for all data received, forked, joined, cloned, modified, sent, and ultimately dropped as data reaches its configured end-state * Component-based extension model along well defined interfaces enabling rapid development and effective testing == Background == Reliable and effective dataflow between systems can be difficult whether you're running scripts on a laptop or have a massive distributed computing system operated by numerous teams and organizations. As the volume and rate of data grows and as the number of systems, protocols, and formats increase and evolve so too does the complexity and need for greater insight and agility. These are the dataflow challenges that NiFi was built to tackle. NiFi is designed in a manner consistent with the core concepts described in flow-based programming as originally documented by J. Paul Morrison in the 1970s. 
This model lends itself well to visual diagramming, concurrency, componentization, testing, and reuse. In addition to staying close to the fundamentals of flow-based programming, NiFi provides integration system specific features such as: guaranteed delivery; back pressure; ability to gracefully handle backlogs and data surges; and an operator interface that enables on-the-fly data flow generation, modification, and observation. == Rationale == NiFi provides a reliable, scalable, manageable and accountable platform for developers and technical staff to create and evolve powerful data flows. Such a system is useful in many contexts including large-scale enterprise integration, interaction with cloud services and frameworks, business to business, intra-departmental, and inter-departmental flows. NiFi fits well within the Apache Software Foundation (ASF) family as it depends on numerous ASF projects and integrates with several others. We also anticipate developing extensions for several other ASF projects such as Cassandra, Kafka, and Storm in the near future. == Initial Goals == * Ensure all dependencies are compliant with Apache License version 2.0 and that all code and documentation artifacts have the correct Apache licensing markings and notice. * Establish a formal release process and schedule, allowing for dependable release cycles in a manner consistent with the Apache development process. * Determine and establish a mechanism, possibly including a sub-project construct, that allows for extensions to the core application to occur at a pace that differs from the core application itself. == Current Status == === Meritocracy === An integration platform is only as good as its ability to integrate systems in a reliable, timely, and repeatable manner. The same can be said of its ability to attract talent and a variety of perspectives as integration systems by their nature are always evolving.
We will actively seek help and encourage promotion of influence in the project through meritocracy. === Community === Over the past several years, NiFi has developed a strong community of both developers and operators within the U.S. government. We look forward to helping grow this to a broader base of industries. === Core Developers === The initial core developers are employed by the National Security Agency and defense contractors. We will work to grow the community among a more diverse set of developers and industries. === Alignment === From its inception, NiFi was developed with an open source philosophy in mind and with the hope of eventually being truly open sourced. The Apache way is consistent with the approach we have taken to date. The ASF clearly provides a mature and effective environment for successful development, as is evident across the spectrum of well-known projects. Further, NiFi depends on numerous ASF libraries and projects, including: ActiveMQ, Ant, Commons, Lucene, Hadoop, HttpClient, Jakarta, and Maven. We also anticipate extensions and dependencies with
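The back pressure described in the proposal above can be illustrated with a minimal sketch. This is not NiFi's actual API; the component functions and queue are invented for illustration. Two components are joined by a bounded queue, so when the downstream consumer lags, the full queue blocks the producer instead of letting the backlog grow without bound.

```python
# Hedged sketch of back pressure between two dataflow components
# (hypothetical code, not NiFi's implementation).
import queue
import threading

conn = queue.Queue(maxsize=4)  # bounded connection = back pressure
consumed = []

def producer():
    # Emits 10 items; put() blocks whenever the queue is full,
    # which is exactly the back-pressure effect.
    for i in range(10):
        conn.put(i)

def consumer():
    # Drains the connection; get() blocks when the queue is empty.
    for _ in range(10):
        consumed.append(conn.get())
        conn.task_done()

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print("delivered in order:", consumed)
```

The bounded queue gives guaranteed, in-order delivery for this single producer/consumer pair; a real dataflow engine layers persistence and prioritization on top of the same idea.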
Re: [VOTE] (new) Release Apache Metamodel incubating 4.3.0
+1 (binding) On Wed, Nov 19, 2014 at 2:10 PM, Kasper Sørensen kasper.soren...@humaninference.com wrote: Hi All, The previous vote on this subject was cancelled because of a misstep in the artifact signing procedure. Now we're back with a properly signed release (based on the same source code). Please vote on releasing the following candidate as Apache MetaModel version 4.3.0-incubating. The Git tag to be voted on is the v4.3.0-incubating tag: https://git-wip-us.apache.org/repos/asf?p=incubator-metamodel.git;a=tag;h=refs/tags/MetaModel-4.3.0-incubating commit: https://git-wip-us.apache.org/repos/asf?p=incubator-metamodel.git;a=commit;h=eef82fb039e819b8841c55e393898260733a545b The source artifact to be voted on is: https://repository.apache.org/content/repositories/orgapachemetamodel-1004/org/apache/metamodel/MetaModel/4.3.0-incubating/MetaModel-4.3.0-incubating-source-release.zip The parent directory (including MD5, SHA1 hashes etc.) of the source is: https://repository.apache.org/content/repositories/orgapachemetamodel-1004/org/apache/metamodel/MetaModel/4.3.0-incubating Release artifacts are signed with the following key: https://people.apache.org/keys/committer/kaspersor.asc Release engineer public key id: 1FE1C2F5 Vote thread link from the d...@metamodel.incubator.apache.org mailing list: http://markmail.org/thread/cksfunp5oiihbag2 Result thread link from the d...@metamodel.incubator.apache.org mailing list: http://markmail.org/message/fc4adybhue6t2jay Please vote on releasing this package as Apache MetaModel 4.3.0-incubating. The vote is open for 72 hours, or until we get the needed number of votes (3 times +1). [ ] +1 Release this package as Apache MetaModel 4.3.0-incubating [ ] -1 Do not release this package because ... More information about the MetaModel project can be found at http://metamodel.incubator.apache.org/ Thank you in advance for participating. 
Regards, Kasper Sørensen - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
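The hash check asked of voters above can be sketched as follows. This is a minimal illustration using a stand-in file; the real artifact and its published .sha1/.md5 files live at the staging-repository URLs in the vote email, and the signature would additionally be checked with gpg --verify against the release engineer's key.

```python
# Hedged sketch: verifying a release artifact against its published SHA1.
# "artifact.zip" and its hash are stand-ins, not the real MetaModel artifact.
import hashlib

def sha1_of(path):
    """Stream a file through SHA1, as you would for a large release zip."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in for the downloaded source release artifact:
with open("artifact.zip", "wb") as f:
    f.write(b"stand-in release artifact contents\n")

# Normally this value is read from the .sha1 file published next to the
# artifact on the staging repository:
published_sha1 = sha1_of("artifact.zip")

if sha1_of("artifact.zip") == published_sha1:
    print("SHA1 OK")
else:
    print("SHA1 MISMATCH - do not vote +1")
```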
[VOTE] Accept Kylin into the Apache Incubator
Following the discussion earlier in the thread: http://mail-archives.apache.org/mod_mbox/incubator-general/201411.mbox/%3ccakmqrob22+n+r++date33f3pcpyujhfoeaqrms3t-udjwk6...@mail.gmail.com%3e I would like to call a VOTE for accepting Kylin as a new incubator project. The proposal is available at: https://wiki.apache.org/incubator/KylinProposal and the text of the proposal is also posted below. The vote is open until 24th November 2014, 23:59:00 UTC [ ] +1 accept Kylin in the Incubator [ ] ±0 [ ] -1 because... Thanks Luke Kylin Proposal == # Abstract Kylin is a distributed and scalable OLAP engine built on Hadoop to support extremely large datasets. # Proposal Kylin is an open source Distributed Analytics Engine that provides multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to accelerate analytics on Hadoop by allowing the use of SQL-compatible tools. Kylin provides a SQL interface and multi-dimensional analysis (MOLAP) on Hadoop to support extremely large datasets, and it tightly integrates with the Hadoop ecosystem. ## Overview of Kylin The Kylin platform has two parts: data processing and interactive querying. First, Kylin reads data from the source (Hive) and runs a set of tasks, including MapReduce jobs and shell scripts, to pre-calculate results for a specified data model, then saves the resulting OLAP cube into storage such as HBase. Once these OLAP cubes are ready, a user can submit a request from any SQL-based tool or third-party application to Kylin’s REST server. The Server calls the Query Engine to determine if the target dataset already exists. If so, the engine directly accesses the target data in the form of a predefined cube, and returns the result with sub-second latency. Otherwise, the engine is designed to route non-matching queries to whichever SQL-on-Hadoop tool is already available on the Hadoop cluster, such as Hive. The Kylin platform includes: - Metadata Manager: Kylin is a metadata-driven application. 
The Kylin Metadata Manager is the key component that manages all metadata stored in Kylin, including all cube metadata. All other components rely on the Metadata Manager. - Job Engine: This engine is designed to handle all of the offline jobs, including shell scripts, Java API calls, and MapReduce jobs. The Job Engine manages and coordinates all of the jobs in Kylin to make sure each job executes and that failures are handled. - Storage Engine: This engine manages the underlying storage – specifically, the cuboids, which are stored as key-value pairs. The Storage Engine uses HBase – the best solution from the Hadoop ecosystem for leveraging an existing K-V system. Kylin can also be extended to support other K-V systems, such as Redis. - Query Engine: Once the cube is ready, the Query Engine can receive and parse user queries. It then interacts with other components to return the results to the user. - REST Server: The REST Server is an entry point for applications to develop against Kylin. Applications can submit queries, get results, trigger cube build jobs, get metadata, get user privileges, and so on. - ODBC Driver: To support third-party tools and applications – such as Tableau – we have built and open-sourced an ODBC Driver. The goal is to make it easy for users to onboard. # Background The challenge we face at eBay is that our data volume is growing ever larger while our user base is becoming more diverse. For example, our business users and analysts consistently ask for minimal latency when visualizing data in Tableau and Excel. 
So, we worked closely with our internal analyst community and outlined the product requirements for Kylin: - Sub-second query latency on billions of rows - ANSI SQL availability for those using SQL-compatible tools - Full OLAP capability to offer advanced functionality - Support for high cardinality and very large dimensions - High concurrency for thousands of users - Distributed and scale-out architecture for analysis in the TB to PB size range Existing SQL-on-Hadoop solutions commonly need to perform partial or full table or file scans to compute the results of queries. The cost of these large data scans can make many queries very slow (more than a minute). The core idea of MOLAP (multi-dimensional OLAP) is to pre-compute data along dimensions of interest and store the resulting aggregates as a cube. MOLAP is much faster but is inflexible. We realized that no existing external product – especially in the open source Hadoop community – met our exact requirements. To meet our emerging business needs, we built a platform from scratch to support MOLAP for these business requirements, with plans to later support other models, including ROLAP. With an excellent development team and several pilot customers, we have been able to bring the Kylin platform into production as well as open source it. # Rationale When data grows to petabyte scale, the pre-calculation process for a query takes a long time and requires costly, powerful hardware. However, with the benefit of Hadoop’s
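The MOLAP pre-computation described in the proposal can be sketched in a few lines. This is a toy illustration, not Kylin code; the data, dimension names, and measure are invented. A measure is aggregated over every combination of dimensions, so each combination (a cuboid) can later answer matching queries with a lookup instead of a table scan.

```python
# Hedged sketch of building a MOLAP cube: pre-aggregate a measure over
# every subset of the dimensions (each subset is one "cuboid").
from itertools import combinations
from collections import defaultdict

rows = [
    {"country": "US", "category": "books", "sales": 10},
    {"country": "US", "category": "music", "sales": 5},
    {"country": "DE", "category": "books", "sales": 7},
]
dimensions = ["country", "category"]

# cube maps each cuboid (a tuple of dimension names) to its aggregates,
# keyed by the corresponding dimension values.
cube = {}
for r in range(len(dimensions) + 1):
    for dims in combinations(dimensions, r):
        agg = defaultdict(int)
        for row in rows:
            key = tuple(row[d] for d in dims)
            agg[key] += row["sales"]
        cube[dims] = dict(agg)

# A query on a pre-computed cuboid is now a dictionary lookup,
# not a scan over the raw rows:
print(cube[("country",)][("US",)])  # total US sales
```

In a real deployment these cuboid cells would be serialized as key-value pairs into HBase, and the number of cuboids grows exponentially with the dimension count, which is why high-cardinality dimension support is called out as a requirement.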
Re: [PROPOSAL] NiFi for Incubation
On 20 November 2014 14:05, Joe Witt joe.w...@gmail.com wrote: Hadrian, Yes, there is a Software Grant Agreement. NSA's tech transfer folks have already sent that to Apache. Given that we are coming from a closed source environment you won't find much. That is what this proposal is about though, as we're working hard to change that. The github link you reference has no relationship to this project. The relationship to Apache Camel will need to be explored further, as NiFi is often used in similar problem spaces (integration). Camel is really powerful in its core purpose and has an excellent community and a great deal of maturity. NiFi provides a complete dataflow application with a major focus on the user experience, graphical creation, and real-time command and control of those flows. It will be interesting as we progress to see how we can best integrate with projects like Camel, and I am looking forward to hearing some of the thoughts and ideas the community comes up with. Thanks for the explanation, but just to be sure, similar/overlapping projects are not a problem per se; the only real concern is whether the two communities can grow. rgds jan i. Thanks Joe On Thu, Nov 20, 2014 at 7:45 AM, Hadrian Zbarcea hzbar...@gmail.com wrote: Sounds exciting. I have a couple of questions: 1. Is there a code grant? I assume so, the proposal states that the project is active since 2006. What I could find [1] doesn't seem to be it. 2. What is the overlap with Apache Camel (if any)? Cheers, Hadrian [1] https://github.com/Nifi On 11/19/2014 09:02 PM, Joe Witt wrote: Hello, I would like to propose NiFi as an Apache Incubator Project. In addition to the copy provided below the Wiki version of the proposal can be found here: http://wiki.apache.org/incubator/NiFiProposal Thanks Joe = NiFi Proposal = == Abstract == NiFi is a dataflow system based on the concepts of flow-based programming. 
== Proposal == NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Some of the high-level capabilities and objectives of NiFi include: * Web-based user interface for seamless experience between design, control, feedback, and monitoring of data flows * Highly configurable along several dimensions of quality of service such as loss tolerant versus guaranteed delivery, low latency versus high throughput, and priority based queuing * Fine-grained data provenance for all data received, forked, joined, cloned, modified, sent, and ultimately dropped as data reaches its configured end-state * Component-based extension model along well defined interfaces enabling rapid development and effective testing == Background == Reliable and effective dataflow between systems can be difficult whether you're running scripts on a laptop or have a massive distributed computing system operated by numerous teams and organizations. As the volume and rate of data grows and as the number of systems, protocols, and formats increase and evolve so too does the complexity and need for greater insight and agility. These are the dataflow challenges that NiFi was built to tackle. NiFi is designed in a manner consistent with the core concepts described in flow-based programming as originally documented by J. Paul Morrison in the 1970s. This model lends itself well to visual diagramming, concurrency, componentization, testing, and reuse. In addition to staying close to the fundamentals of flow-based programming, NiFi provides integration system specific features such as: guaranteed delivery; back pressure; ability to gracefully handle backlogs and data surges; and an operator interface that enables on-the-fly data flow generation, modification, and observation. == Rationale == NiFi provides a reliable, scalable, manageable and accountable platform for developers and technical staff to create and evolve powerful data flows. 
Such a system is useful in many contexts including large-scale enterprise integration, interaction with cloud services and frameworks, business to business, intra-departmental, and inter-departmental flows. NiFi fits well within the Apache Software Foundation (ASF) family as it depends on numerous ASF projects and integrates with several others. We also anticipate developing extensions for several other ASF projects such as Cassandra, Kafka, and Storm in the near future. == Initial Goals == * Ensure all dependencies are compliant with Apache License version 2.0 and that all code and documentation artifacts have the correct Apache licensing markings and notices. * Establish a formal release process and schedule, allowing for dependable release cycles in a manner consistent with the Apache development process. * Determine and establish a mechanism, possibly including a
Infra for podling setup
Hi, Since I'm new at being a mentor, I was wondering how to handle slow infra requests for podlings? Ideally, I'd like to help infra with the steps required, as I know some members of the podling are anxious to get things going. The infra guidance for getting things running is a bit loose - e.g., "hang out with them." Unfortunately my work blocks IRC ports, so it's a pain to stay connected during the day. John
Re: Infra for podling setup
I've just recently dealt with this during the incubation of Ignite, and it looks like the following tactics work best: - ping Infra on your JIRA tickets once in a while - ping them on the IRC #asfinfra channel But in general, be patient - the folks are clearly pretty busy. Regards, Cos On Thu, Nov 20, 2014 at 10:42 PM, John D. Ament wrote: Hi, Since I'm new at being a mentor, I was wondering how to handle slow infra requests for podlings? Ideally, I'd like to help out infra with the steps required, as I know some of the members of the podling are anxious to get things going. The infra terms to get things running are a bit loose - e.g. hang out with them. Unfortunately my work blocks IRC ports so it's a pain to keep connected during the day. John
Re: [PROPOSAL] NiFi for Incubation
Josh, Really appreciate it and have updated the proposal. Thanks Joe On Thu, Nov 20, 2014 at 9:35 AM, Josh Elser els...@apache.org wrote: Very exciting stuff! Not presently on IPMC, but if you'd have me, I'd be happy to volunteer as a mentor. If so, I'll submit an application to join the IPMC and we can go from there. - Josh Joe Witt wrote: Hello, I would like to propose NiFi as an Apache Incubator Project. In addition to the copy provided below the Wiki version of the proposal can be found here: http://wiki.apache.org/incubator/NiFiProposal Thanks Joe = NiFi Proposal = == Abstract == NiFi is a dataflow system based on the concepts of flow-based programming. == Proposal == NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Some of the high-level capabilities and objectives of NiFi include: * Web-based user interface for seamless experience between design, control, feedback, and monitoring of data flows * Highly configurable along several dimensions of quality of service such as loss tolerant versus guaranteed delivery, low latency versus high throughput, and priority based queuing * Fine-grained data provenance for all data received, forked, joined, cloned, modified, sent, and ultimately dropped as data reaches its configured end-state * Component-based extension model along well defined interfaces enabling rapid development and effective testing == Background == Reliable and effective dataflow between systems can be difficult whether you're running scripts on a laptop or have a massive distributed computing system operated by numerous teams and organizations. As the volume and rate of data grows and as the number of systems, protocols, and formats increase and evolve so too does the complexity and need for greater insight and agility. These are the dataflow challenges that NiFi was built to tackle. 
NiFi is designed in a manner consistent with the core concepts described in flow-based programming as originally documented by J. Paul Morrison in the 1970s. This model lends itself well to visual diagramming, concurrency, componentization, testing, and reuse. In addition to staying close to the fundamentals of flow-based programming, NiFi provides integration-system-specific features such as: guaranteed delivery; back pressure; the ability to gracefully handle backlogs and data surges; and an operator interface that enables on-the-fly data flow generation, modification, and observation. == Rationale == NiFi provides a reliable, scalable, manageable and accountable platform for developers and technical staff to create and evolve powerful data flows. Such a system is useful in many contexts including large-scale enterprise integration, interaction with cloud services and frameworks, business to business, intra-departmental, and inter-departmental flows. NiFi fits well within the Apache Software Foundation (ASF) family as it depends on numerous ASF projects and integrates with several others. We also anticipate developing extensions for several other ASF projects such as Cassandra, Kafka, and Storm in the near future. == Initial Goals == * Ensure all dependencies are compliant with Apache License version 2.0 and that all code and documentation artifacts have the correct Apache licensing markings and notices. * Establish a formal release process and schedule, allowing for dependable release cycles in a manner consistent with the Apache development process. * Determine and establish a mechanism, possibly including a sub-project construct, that allows for extensions to the core application to occur at a pace that differs from the core application itself. == Current Status == === Meritocracy === An integration platform is only as good as its ability to integrate systems in a reliable, timely, and repeatable manner. 
The same can be said of its ability to attract talent and a variety of perspectives as integration systems by their nature are always evolving. We will actively seek help and encourage promotion of influence in the project through meritocracy. === Community === Over the past several years, NiFi has developed a strong community of both developers and operators within the U.S. government. We look forward to helping grow this to a broader base of industries. === Core Developers === The initial core developers are employed by the National Security Agency and defense contractors. We will work to grow the community among a more diverse set of developers and industries. === Alignment === From its inception, NiFi was developed with an open source philosophy in mind and with the hopes of eventually being truly open sourced. The Apache way is consistent with the approach we have taken to date. The ASF clearly provides a mature and effective environment for successful development as is evident across the
Re: Infra for podling setup
Jake, Thanks for looking. I'll have to get onto HipChat; the web client will probably work fine for me. On Thu Nov 20 2014 at 9:37:13 PM Jake Farrell jfarr...@apache.org wrote: Hi John, what is the infra ticket you are having an issue with? We also moved away from using IRC to HipChat [1] for infra communication -Jake [1]: http://www.hipchat.com/gdAiIcNyE On Thu, Nov 20, 2014 at 5:42 PM, John D. Ament john.d.am...@gmail.com wrote: Hi, Since I'm new at being a mentor, I was wondering how to handle slow infra requests for podlings? Ideally, I'd like to help out infra with the steps required, as I know some of the members of the podling are anxious to get things going. The infra terms to get things running are a bit loose - e.g. hang out with them. Unfortunately my work blocks IRC ports so it's a pain to keep connected during the day. John
Re: [VOTE] Accept Kylin into the Apache Incubator
+1 (binding) On Fri, Nov 21, 2014 at 3:37 AM, Andrew Purtell apurt...@apache.org wrote: +1 (binding) On Thu, Nov 20, 2014 at 2:31 PM, Luke Han luke...@gmail.com wrote: Following the discussion earlier in the thread: http://mail-archives.apache.org/mod_mbox/incubator-general/201411.mbox/%3ccakmqrob22+n+r++date33f3pcpyujhfoeaqrms3t-udjwk6...@mail.gmail.com%3e I would like to call a VOTE for accepting Kylin as a new incubator project. The proposal is available at: https://wiki.apache.org/incubator/KylinProposal and the text of the proposal is also posted below. The vote is open until 24th November 2014, 23:59:00 UTC [ ] +1 accept Kylin in the Incubator [ ] ±0 [ ] -1 because... Thanks Luke Kylin Proposal == # Abstract Kylin is a distributed and scalable OLAP engine built on Hadoop to support extremely large datasets. # Proposal Kylin is an open source Distributed Analytics Engine that provides multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to accelerate analytics on Hadoop by allowing the use of SQL-compatible tools. Kylin provides a SQL interface and multi-dimensional analysis (MOLAP) on Hadoop to support extremely large datasets, and it tightly integrates with the Hadoop ecosystem. ## Overview of Kylin The Kylin platform has two parts: data processing and interactive querying. First, Kylin reads data from the source (Hive) and runs a set of tasks, including MapReduce jobs and shell scripts, to pre-calculate results for a specified data model, then saves the resulting OLAP cube into storage such as HBase. Once these OLAP cubes are ready, a user can submit a request from any SQL-based tool or third-party application to Kylin’s REST server. The Server calls the Query Engine to determine if the target dataset already exists. If so, the engine directly accesses the target data in the form of a predefined cube, and returns the result with sub-second latency. 
Otherwise, the engine is designed to route non-matching queries to whichever SQL-on-Hadoop tool is already available on the Hadoop cluster, such as Hive. The Kylin platform includes: - Metadata Manager: Kylin is a metadata-driven application. The Kylin Metadata Manager is the key component that manages all metadata stored in Kylin, including all cube metadata. All other components rely on the Metadata Manager. - Job Engine: This engine is designed to handle all of the offline jobs, including shell scripts, Java API calls, and MapReduce jobs. The Job Engine manages and coordinates all of the jobs in Kylin to make sure each job executes and that failures are handled. - Storage Engine: This engine manages the underlying storage – specifically, the cuboids, which are stored as key-value pairs. The Storage Engine uses HBase – the best solution from the Hadoop ecosystem for leveraging an existing K-V system. Kylin can also be extended to support other K-V systems, such as Redis. - Query Engine: Once the cube is ready, the Query Engine can receive and parse user queries. It then interacts with other components to return the results to the user. - REST Server: The REST Server is an entry point for applications to develop against Kylin. Applications can submit queries, get results, trigger cube build jobs, get metadata, get user privileges, and so on. - ODBC Driver: To support third-party tools and applications – such as Tableau – we have built and open-sourced an ODBC Driver. The goal is to make it easy for users to onboard. # Background The challenge we face at eBay is that our data volume is growing ever larger while our user base is becoming more diverse. For example, our business users and analysts consistently ask for minimal latency when visualizing data in Tableau and Excel. 
So, we worked closely with our internal analyst community and outlined the product requirements for Kylin: - Sub-second query latency on billions of rows - ANSI SQL availability for those using SQL-compatible tools - Full OLAP capability to offer advanced functionality - Support for high cardinality and very large dimensions - High concurrency for thousands of users - Distributed and scale-out architecture for analysis in the TB to PB size range Existing SQL-on-Hadoop solutions commonly need to perform partial or full table or file scans to compute the results of queries. The cost of these large data scans can make many queries very slow (more than a minute). The core idea of MOLAP (multi-dimensional OLAP) is to pre-compute data along dimensions of interest and store the resulting aggregates as a cube. MOLAP is much faster but is inflexible. We realized that no existing external product – especially in the open source Hadoop community – met our exact requirements. To meet our emerging business needs, we built a platform from scratch to support MOLAP for these business requirements and
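The query routing behavior described in the Kylin overview (serve a query from a pre-built cube when the queried dimensions are covered, otherwise fall back to a SQL-on-Hadoop engine such as Hive) can be sketched as follows. This is hypothetical illustration code, not Kylin's actual routing logic; the cube definitions and dimension names are invented.

```python
# Hedged sketch of cube-vs-Hive query routing (not Kylin's real code).
# Each pre-built cube is identified by the set of dimensions it covers.
prebuilt_cubes = {frozenset(["country", "category"]), frozenset(["country"])}

def route(query_dims):
    """Return which engine should serve a query over the given dimensions."""
    needed = frozenset(query_dims)
    for cube_dims in prebuilt_cubes:
        if needed <= cube_dims:  # some cube covers every queried dimension
            return "cube"        # pre-aggregated, sub-second answer
    return "hive"                # route the non-matching query to Hive

print(route(["country"]))            # covered by a cube
print(route(["country", "seller"]))  # "seller" not pre-computed, goes to Hive
```

The key design point is that the fallback keeps the SQL interface uniform: clients always query the same endpoint, and only latency reveals whether a pre-computed cuboid or a full scan answered the query.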