Phoenix will succeed if HBase succeeds. Phoenix just makes it easier to drive HBase to its maximum capability. IMHO, if HBase is to make further gains in the OLAP space, scans need to be faster and new, more compressed columnar-store block formats need to be developed.

Running inside HBase is what gives Phoenix most of its performance advantage. Have you seen our numbers against Impala (https://github.com/forcedotcom/phoenix/wiki/Performance)? Drill will need something to efficiently execute a query plan against HBase, and Phoenix is a good fit here.
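To make that concrete, here is a minimal sketch of querying HBase through Phoenix's JDBC driver (the ZooKeeper quorum, table, and column names are placeholders, not from this thread); the GROUP BY is aggregated server-side by coprocessors in the region servers, which is where the speed comes from:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PhoenixQuery {
        public static void main(String[] args) throws Exception {
            // Phoenix exposes HBase through plain JDBC; "localhost" stands
            // in for the real ZooKeeper quorum, and the Phoenix client jar
            // must be on the classpath to register the driver.
            Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
            try {
                Statement stmt = conn.createStatement();
                // Hypothetical table and column; the aggregation runs
                // inside the region servers, not in the client.
                ResultSet rs = stmt.executeQuery(
                    "SELECT host, COUNT(*) FROM metrics GROUP BY host");
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            } finally {
                conn.close();
            }
        }
    }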

Thanks,

James

On 04/29/2013 10:54 PM, Asaf Mesika wrote:
I think for Phoenix to truly succeed, it needs HBase to break the JVM heap barrier of 12 GB that I saw mentioned in a couple of posts. Lots of analytics queries utilize memory, and since that memory is shared with HBase, there's only so much you can do in a 12 GB heap. On the other hand, if Phoenix were implemented outside HBase on the same machine (as Drill and Impala are doing), you could have 60 GB for that process, running many OLAP queries in parallel against the same data set.



On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell <[email protected]> wrote:

"HBase is not really intended for heavy data crunching"

Yes it is. This is why we have first-class MapReduce integration and optimized scanners.
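As a sketch of that integration (table name and tuning values are placeholders, assuming the 0.94-era Java client), TableMapReduceUtil wires a Scan into a mapper so the crunching runs as an ordinary MapReduce job over the table:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class HBaseRowCount {
        static class CountMapper extends TableMapper<NullWritable, NullWritable> {
            @Override
            protected void map(ImmutableBytesWritable row, Result value, Context ctx)
                    throws IOException, InterruptedException {
                // Count rows via a job counter; no map output is emitted.
                ctx.getCounter("hbase", "rows").increment(1);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "hbase-row-count");
            job.setJarByClass(HBaseRowCount.class);

            Scan scan = new Scan();
            scan.setCaching(500);        // fetch rows in larger batches per RPC
            scan.setCacheBlocks(false);  // keep full scans out of the block cache

            // "mytable" is a placeholder; this wires the scan into the mapper.
            TableMapReduceUtil.initTableMapperJob("mytable", scan,
                    CountMapper.class, NullWritable.class, NullWritable.class, job);
            job.setNumReduceTasks(0);
            job.setOutputFormatClass(NullOutputFormat.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }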

Recent versions, like 0.94, also do pretty well with the 'O' part of OLAP.

Urban Airship's Datacube is an example of a successful OLAP project
implemented on HBase: http://github.com/urbanairship/datacube

"Urban Airship uses the datacube project to support its analytics stack for
mobile apps. We handle about ~10K events per second per node."


Also there is Adobe's SaasBase:
http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe

Etc.

Where an HBase OLAP application will differ tremendously from a traditional data warehouse is, of course, in the interface to the datastore. You have to design for and speak in the language of the HBase API, though Phoenix (https://github.com/forcedotcom/phoenix) is changing that.
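For a feel of the difference, here is a sketch of what a one-line SQL WHERE clause looks like in raw 0.94-style HBase client code (table, family, and qualifier names are invented for the example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RawScan {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "metrics");  // placeholder table
            try {
                Scan scan = new Scan();
                scan.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"));
                // Hand-built equivalent of: WHERE status = 'ERROR'
                SingleColumnValueFilter f = new SingleColumnValueFilter(
                        Bytes.toBytes("d"), Bytes.toBytes("status"),
                        CompareOp.EQUAL, Bytes.toBytes("ERROR"));
                f.setFilterIfMissing(true);  // skip rows lacking the column,
                                             // matching SQL WHERE semantics
                scan.setFilter(f);
                ResultScanner scanner = table.getScanner(scan);
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
                scanner.close();
            } finally {
                table.close();
            }
        }
    }

With Phoenix, all of that collapses to something like SELECT host FROM metrics WHERE status = 'ERROR'.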


On Sun, Apr 28, 2013 at 10:21 PM, anil gupta <[email protected]>
wrote:

Hi Kiran,

In HBase the data is denormalized, but at its core HBase is a KeyValue-based database meant for lookups or queries that expect a response in milliseconds. OLAP, i.e. data warehousing, usually involves heavy data crunching, and HBase is not really intended for that. If you just want to store denormalized data and run simple queries, HBase is good. You can make HBase work for OLAP-style loads, but IMO you will be better off using Hive for data warehousing.
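For example (a minimal sketch with made-up table and column names), this kind of single-row lookup by key is what HBase answers in milliseconds, as opposed to a full analytical scan:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PointLookup {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "users");  // placeholder table
            try {
                // Fetch one column of one row, addressed directly by key.
                Get get = new Get(Bytes.toBytes("user#42"));
                get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"));
                Result result = table.get(get);
                byte[] email = result.getValue(Bytes.toBytes("info"),
                                               Bytes.toBytes("email"));
                System.out.println(email == null ? "(no value)"
                                                 : Bytes.toString(email));
            } finally {
                table.close();
            }
        }
    }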

HTH,
Anil Gupta


On Sun, Apr 28, 2013 at 8:39 PM, Kiran <[email protected]> wrote:

But in HBase, data can be said to be in a denormalized state, since the storage methodology is a flexible (column family:column) based schema. Also, from Google's Bigtable paper it is evident that HBase is capable of doing OLAP. So where does the difference lie?
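To show what I mean by a flexible schema (a sketch with invented table, family, and qualifier names), two rows in the same table can carry completely different columns, since only the column family is fixed at table creation:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FlexibleSchema {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "orders");  // placeholder table
            try {
                // Row 1 carries an item column; row 2 carries a coupon
                // column instead. Qualifiers can differ row by row, which
                // is how sparse, denormalized "wide" rows are modeled.
                Put p1 = new Put(Bytes.toBytes("order#1"));
                p1.add(Bytes.toBytes("d"), Bytes.toBytes("item:sku123"),
                       Bytes.toBytes("2"));
                Put p2 = new Put(Bytes.toBytes("order#2"));
                p2.add(Bytes.toBytes("d"), Bytes.toBytes("coupon"),
                       Bytes.toBytes("SAVE10"));
                table.put(p1);
                table.put(p2);
            } finally {
                table.close();
            }
        }
    }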




--
Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

