Re: [VOTE] Accept Blur into the Apache Incubator
On Fri, Jul 20, 2012 at 6:42 PM, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator... +1 -Bertrand
Re: [VOTE] Accept Blur into the Apache Incubator
+1 (non-binding)

On 20.07.2012 at 20:43, Aaron McCurry amccu...@gmail.com wrote:

I would like to call a vote for accepting Blur for incubation in the Apache Incubator. The full proposal is available below. Please cast your vote:

[ ] +1, bring Blur into Incubator
[ ] +0, I don't care either way,
[ ] -1, do not bring Blur into Incubator, because...

This vote will be open for 72 hours and only votes from the Incubator PMC are binding.

Thank you for your consideration!
Aaron

http://wiki.apache.org/incubator/BlurProposal

= Blur Proposal =

== Abstract ==
Blur is a search platform capable of searching massive amounts of data in a cloud computing environment. Blur leverages several existing Apache projects, including Apache Lucene, Apache Hadoop, Apache !ZooKeeper and Apache Thrift. Both bulk and near real time (NRT) updates are possible with Blur. Bulk updates are accomplished using Hadoop Map/Reduce, and NRT updates are performed through direct Thrift calls.

== Proposal ==
Blur is an open source search platform capable of querying massive amounts of data at incredible speeds. Rather than using the flat, document-like data model used by most search solutions, Blur allows you to build rich data models and search them in a semi-relational manner, similar to using joins when querying a relational database. Using Blur, you can get precise search results against terabytes of data at Google-like speeds. Blur leverages multiple open source projects including Hadoop, Lucene, Thrift and !ZooKeeper to create an environment where structured data can be transformed into an index that runs on a Hadoop cluster. Blur uses the power of Map/Reduce for bulk indexing into Blur. Server failures are handled automatically by using !ZooKeeper for cluster state and HDFS for index storage.

== Background ==
Blur was created by Aaron !McCurry in 2010.
Blur was developed to solve the challenges of searching huge quantities of data that traditional RDBMS solutions could not cope with, while still providing JOIN-like capabilities to query the data. Several other open source projects have implemented aspects of this design, including elasticsearch, Katta and Apache Solr.

== Rationale ==
There is a need for a distributed search capability within the Hadoop ecosystem. Currently, there are no other search solutions that natively leverage HDFS and the failover features of Hadoop in the same manner as the Blur project. The communities we expect to be most interested in such a project are government, health care, and other industries where scalability is a concern. We have made much progress in developing this project over the past 2 years and believe both the project and the interested communities would benefit from this work being openly available and having open development. In future versions of Blur the API will more closely follow the APIs provided in Lucene so that systems that already use Lucene can more easily scale with Blur. Blur can be viewed as a query execution engine that Lucene-based solutions can utilize when scale becomes an issue.

== Initial Goals ==
The initial goals of the project are:
 * To migrate the Blur codebase, issue tracking and wiki from github.com and integrate the project with the ASF infrastructure.
 * Add new committers to the project and grow the community in The Apache Way.

== Current Status ==

=== Meritocracy ===
Blur was initially developed by Aaron !McCurry in June 2010. Since then Blur has continued to evolve with the support of a small development team at Near Infinity. As a part of the Apache Software Foundation, the Apache Blur team intends to strongly encourage the community to help with and contribute to the project. Apache Blur will actively seek potential committers and help them become familiar with the codebase.
=== Community ===
A small community has developed around Blur and several project teams are currently using Blur for their big data search capability. The source code is currently available on GitHub and there is a dedicated website (blur.io) that provides an overview of the project. Blur has been shared with several members of the Apache community and has been presented at the Bay Area HUG (see http://www.meetup.com/hadoop/events/20109471/).

=== Core Developers ===
The current developers are employed by Near Infinity Corporation, but we anticipate interest developing among other companies.

=== Alignment ===
Blur is built on top of a number of Apache projects: Hadoop, Lucene, !ZooKeeper, and Thrift. It builds with Maven. During the course of Blur development, a couple of patches have been committed back to the Lucene project, including LUCENE-2205 and LUCENE-2215. Due to the strong relationship with the aforementioned Apache projects, the incubator is a good match for Blur.

== Known Risks ==

=== Orphaned Products ===
There is only a small risk of being orphaned.
Re: [VOTE] Accept Blur into the Apache Incubator
+1 Tom On Fri, Jul 20, 2012 at 12:42 PM, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
[X] +1, bring Blur into Incubator Eric On 07/20/2012 06:42 PM, Aaron McCurry wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1 Tommaso 2012/7/20 Aaron McCurry amccu...@gmail.com I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1 Doug On Jul 20, 2012 9:43 AM, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1 On 22 July 2012 14:40, Doug Cutting cutt...@gmail.com wrote: +1 Doug On Jul 20, 2012 9:43 AM, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1 On Sun, Jul 22, 2012 at 4:40 PM, Sajeevan Achuthan achuthan.sajee...@gmail.com wrote: +1 On 22 July 2012 14:40, Doug Cutting cutt...@gmail.com wrote: +1 Doug On Jul 20, 2012 9:43 AM, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1 (binding) -- Olivier On 20 Jul 2012 18:43, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1 (non-binding) On Sat, Jul 21, 2012 at 4:07 PM, Olivier Lamy ol...@apache.org wrote: +1 (binding) -- Olivier On 20 Jul 2012 18:43, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1! - Binding. On Jul 20, 2012, at 9:42 AM, Aaron McCurry wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1 (non-binding) On Fri, Jul 20, 2012 at 9:48 AM, Dave Fisher dave2w...@comcast.net wrote: +1! - Binding. On Jul 20, 2012, at 9:42 AM, Aaron McCurry wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1 (binding) :) On Friday, July 20, 2012, Aaron McCurry wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator. The full proposal is available below. Please cast your vote: [ ] +1, bring Blur into Incubator [ ] +0, I don't care either way, [ ] -1, do not bring Blur into Incubator, because... This vote will be open for 72 hours and only votes from the Incubator PMC are binding. Thank you for your consideration! Aaron http://wiki.apache.org/incubator/BlurProposal = Blur Proposal = == Abstract == Blur is a search platform capable of searching massive amounts of data in a cloud computing environment. Blur leverages several existing Apache projects, including Apache Lucene, Apache Hadoop, Apache !ZooKeeper and Apache Thrift. Both bulk and near real time (NRT) updates are possible with Blur. Bulk updates are accomplished using Hadoop Map/Reduce and NRT are performed through direct Thrift calls. == Proposal == Blur is an open source search platform capable of querying massive amounts of data at incredible speeds. Rather than using the flat, document-like data model used by most search solutions, Blur allows you to build rich data models and search them in a semi-relational manner similar to joins while querying a relational database. Using Blur, you can get precise search results against terabytes of data at Google-like speeds. Blur leverages multiple open source projects including Hadoop, Lucene, Thrift and !ZooKeeper to create an environment where structured data can be transformed into an index that runs on a Hadoop cluster. Blur uses the power of Map/Reduce for bulk indexing into Blur. Server failures are handled automatically by using !ZooKeeper for cluster state and HDFS for index storage. == Background == Blur was created by Aaron !McCurry in 2010. 
Blur was developed to solve the challenges of searching huge quantities of data that the traditional RDBMS solutions could not cope with while still providing JOIN-like capabilities to query the data. Several other open source projects have implemented aspects of this design including elasticsearch, Katta and Apache Solr. == Rationale == There is a need for a distributed search capability within the Hadoop ecosystem. Currently, there are no other search solutions that natively leverage HDFS and the failover features of Hadoop in the same manner as the Blur project. The communities we expect to be most interested in such a project are government, health care, and other industries where scalability is a concern. We have made much progress in developing this project over the past 2 years and believe both the project and the interested communities would benefit from this work being openly available and having open development. In future versions of Blur the API will more closely follow the APIs provided in Lucene so that systems that already use Lucene can more easily scale with Blur. Blur can be viewed as a query execution engine that Lucene-based solutions can utilize when scale becomes an issue. == Initial Goals == The initial goals of the project are: * To migrate the Blur codebase, issue tracking and wiki from github.com and integrate the project with the ASF infrastructure. * To add new committers to the project and grow the community in The Apache Way. == Current Status == === Meritocracy === Blur was initially developed by Aaron !McCurry in June 2010. Since then Blur has continued to evolve with the support of a small development team at Near Infinity. As a part of the Apache Software Foundation, the Apache Blur team intends to strongly encourage the community to help with and contribute to the project. Apache Blur will actively seek potential committers and help them become familiar with the codebase. 
=== Community === A small community has developed around Blur and several project teams are currently using Blur for their big data search capability. The source code is currently available on GitHub and there is a dedicated website (blur.io) that provides an overview of the project. Blur has been shared with several members of the Apache community and has been presented at the Bay Area HUG (see http://www.meetup.com/hadoop/events/20109471/). === Core Developers === The current developers are employed by Near Infinity Corporation, but we anticipate interest developing among other companies. === Alignment === Blur is built on top of a number of Apache projects: Hadoop, Lucene, !ZooKeeper, and Thrift. It builds with Maven. During the course of Blur development, a couple of patches have been committed back to the Lucene project, including LUCENE-2205 and LUCENE-2215. Due to the strong relationship with the aforementioned Apache projects, the incubator is a good match for Blur. == Known Risks == === Orphaned Products === There is only a small risk of being orphaned. The customers that
Re: [VOTE] Accept Blur into the Apache Incubator
Hi, On Fri, Jul 20, 2012 at 7:42 PM, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator. The full proposal is available below. [x] +1, bring Blur into Incubator BR, Jukka Zitting
Re: [VOTE] Accept Blur into the Apache Incubator
+1, bring Blur into Incubator On Fri, Jul 20, 2012 at 9:42 AM, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...
Re: [VOTE] Accept Blur into the Apache Incubator
+1 (non-binding) On Fri, Jul 20, 2012 at 10:12 PM, Aaron McCurry amccu...@gmail.com wrote: I would like to call a vote for accepting Blur for incubation in the Apache Incubator...