Announcement of Project Panthera: Better Analytics with SQL, MapReduce and HBase

Dai, Jason Mon, 17 Sep 2012 06:56:13 -0700

Hi,

I'd like to announce Project Panthera, our open source effort that showcases 
better data analytics capabilities on Hadoop/HBase (through both software and 
hardware improvements), available at https://github.com/intel-hadoop/project-panthera.


Over the last several years, we have been working closely with many users and 
customers on their next-generation data analytics platforms built on the Hadoop 
stack, specifically using HBase for semi-real-time analytics. While the 
Hadoop/HBase stack has laid a solid foundation for these systems, we still had 
to implement many new capabilities to build a flexible and efficient data 
analytics platform (e.g., better integration with existing infrastructure using 
SQL, better query processing on HBase, and efficient use of new hardware 
platform technologies).

Project Panthera is our open source effort to contribute the new capabilities 
we have built to the Apache Hadoop community. Under Project Panthera, we will 
make our implementations available in the project repository to showcase these 
capabilities; in addition, we will collaborate with the Hadoop community 
(through the standard Apache open source process) to have some of these ideas 
reviewed and, hopefully, incorporated into the related Apache projects.

In today's first release of Project Panthera, two new capabilities are 
available for better analytical query support:

1) An analytical SQL engine for MapReduce (built on top of Hive)

   Under Project Panthera, we will gradually make our implementation of the SQL 
engine available as an extension to Hive 
(https://github.com/intel-hadoop/hive-0.9-panthera). Specifically, today's 
release supports many common SQL constructs used by our users and customers, 
including some important features that Hive does not support today (e.g., 
sub-queries in WHERE clauses and multiple-table SELECT statements). Going 
forward, we will also use HIVE-3472 
(https://issues.apache.org/jira/browse/HIVE-3472) as the umbrella JIRA to track 
our efforts to get the SQL engine idea reviewed and, hopefully, incorporated 
into Apache Hive.

2) A document store (built on top of HBase) for better query processing

   Under Project Panthera, we will gradually make our implementation of the 
document store available as an extension to HBase 
(https://github.com/intel-hadoop/hbase-0.94-panthera). Specifically, today's 
release adds document store support to HBase using coprocessors, which brings 
up to a 3x reduction in storage usage and up to a 1.8x speedup in query 
processing. Going forward, we will also use HBASE-6800 
(https://issues.apache.org/jira/browse/HBASE-6800) as the umbrella JIRA to 
track our efforts to get the document store idea reviewed and, hopefully, 
incorporated into Apache HBase.
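To make the sub-query feature in item 1 more concrete: one common way a SQL 
engine on MapReduce can evaluate an IN (sub-query) predicate in a WHERE clause 
is to rewrite it as a left semi join, a form Hive 0.9 already accepts as LEFT 
SEMI JOIN. The Python sketch below is a hypothetical, in-memory illustration of 
that equivalence; the table names (orders, customers) and the functions 
in_subquery and left_semi_join are made up for this example, it ignores SQL NULL 
semantics, and it does not describe how the Panthera engine actually implements 
the feature.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def in_subquery(outer: List[Row], inner: List[Row], outer_key: str, inner_key: str) -> List[Row]:
    """Direct reading of: SELECT * FROM outer WHERE outer_key IN (SELECT inner_key FROM inner).
    Every outer row re-scans the sub-query result (NULL semantics ignored)."""
    return [o for o in outer if any(o[outer_key] == i[inner_key] for i in inner)]


def left_semi_join(outer: List[Row], inner: List[Row], outer_key: str, inner_key: str) -> List[Row]:
    """Rewritten as: SELECT outer.* FROM outer LEFT SEMI JOIN inner ON (outer_key = inner_key).
    The inner keys are collected once; each outer row is kept at most once,
    even when the inner side contains duplicate keys."""
    inner_keys = {i[inner_key] for i in inner}
    return [o for o in outer if o[outer_key] in inner_keys]


# Hypothetical sample tables.
customers = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3},
    {"order_id": 13, "customer_id": 1},
]

# The sub-query: SELECT id FROM customers WHERE region = 'EU'
eu_customers = [c for c in customers if c["region"] == "EU"]

direct = in_subquery(orders, eu_customers, "customer_id", "id")
rewritten = left_semi_join(orders, eu_customers, "customer_id", "id")
print([o["order_id"] for o in rewritten])
```

A plain inner join would instead emit an order once per matching inner row, 
which is why the semi join, rather than an ordinary join, is the form that 
preserves the IN semantics.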
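One way to see where storage savings from a document store on HBase can come 
from (item 2): HBase stores every column value as a separate KeyValue that 
repeats the row key, column family, qualifier and timestamp. The Python sketch 
below estimates sizes from the HBase 0.94 KeyValue layout, comparing one cell 
per column with a single cell holding a packed document. The packing format 
(pack_document), the qualifier b"d", and the sample row and fields are 
hypothetical and are not Panthera's actual encoding; the estimate also ignores 
HFile-level details such as block encoding, compression and indexes, so it is 
an illustration rather than a reproduction of the measured 3x figure.

```python
import struct
from typing import Dict

# Fixed bytes per KeyValue in the HBase 0.94 layout:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1).
KV_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # 20 bytes


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size of one KeyValue."""
    return KV_FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


def cell_per_column_size(row: bytes, family: bytes, fields: Dict[bytes, bytes]) -> int:
    """Conventional layout: one KeyValue per column, so the row key,
    family and timestamp are repeated for every field."""
    return sum(keyvalue_size(row, family, q, v) for q, v in fields.items())


def pack_document(fields: Dict[bytes, bytes]) -> bytes:
    """Hypothetical document encoding: per field, a 1-byte name length,
    the name, a 2-byte big-endian value length, then the value."""
    out = bytearray()
    for name, value in fields.items():
        out += struct.pack(">B", len(name)) + name
        out += struct.pack(">H", len(value)) + value
    return bytes(out)


def document_size(row: bytes, family: bytes, fields: Dict[bytes, bytes]) -> int:
    """Document layout: the whole row is packed into a single KeyValue
    under a hypothetical one-byte qualifier b"d"."""
    return keyvalue_size(row, family, b"d", pack_document(fields))


# Hypothetical sample row: 15-byte row key, ten columns with 4-byte values.
ROW = b"user#0000000042"
FAMILY = b"cf"
FIELDS = {f"c{i}".encode(): b"1234" for i in range(10)}

per_column = cell_per_column_size(ROW, FAMILY, FIELDS)
packed = document_size(ROW, FAMILY, FIELDS)
print(per_column, packed, round(per_column / packed, 2))
```

With a 15-byte row key and ten 4-byte values, this estimate gives 430 bytes for 
the per-column layout versus 128 bytes for the packed document; the ratio 
shrinks as values grow relative to the repeated key material.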

Please refer to our project GitHub repository 
(https://github.com/intel-hadoop/project-panthera) for more details on Project 
Panthera.
