What I’ve been looking for is a low-cost, high-performance distributed 
relational database. I’ve looked at in-memory databases, but they all seem 
to be optimized for transactional use cases. I work in a world where I want to 
deliver real-time analytics: I want to be able to hammer the warehouse with 
writes without disturbing reads. There is one buzz term I didn’t see in here: 
multi-version concurrency control. 

In the early years of my career, I would design databases without enforcing 
referential integrity, leaving that up to the application. Having worked for 
years and seen what people do to databases, I would be concerned about 
implementing something where a check on users has been removed.

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Lei Chang 
Sent: Thursday, November 12, 2015 2:25 AM
To: [email protected] 
Subject: Re: what is Hawq?


Hi Bob, 

Apache HAWQ is a Hadoop-native SQL query engine that combines the key 
technological advantages of an MPP database with the scalability and 
convenience of Hadoop. HAWQ reads data from and writes data to HDFS natively. 
HAWQ delivers industry-leading performance and linear scalability. It gives 
users the tools to interact confidently and successfully with petabyte-range 
data sets, and it provides a complete, standards-compliant SQL interface. More 
specifically, HAWQ has the following features:

  a. On-premises or cloud deployment 
  b. Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extension 
  c. Extremely high performance: many times faster than other Hadoop SQL 
     engines 
  d. World-class parallel optimizer 
  e. Full transaction capability and consistency guarantee: ACID 
  f. Dynamic data flow engine through a high-speed UDP-based interconnect 
  g. Elastic execution engine based on virtual segments and data locality 
  h. Support for multi-level partitioning and list/range-based partitioned 
     tables 
  i. Multiple compression methods: snappy, gzip, quicklz, RLE 
  j. Multi-language user-defined function support: Python, Perl, Java, C/C++, 
     R 
  k. Advanced machine learning and data mining functionality through MADlib 
  l. Dynamic node expansion: in seconds 
  m. Most advanced three-level resource management: integrates with YARN and 
     hierarchical resource queues 
  n. Easy access to all HDFS data and external system data (for example, 
     HBase) 
  o. Hadoop native: from storage (HDFS) and resource management (YARN) to 
     deployment (Ambari) 
  p. Authentication and granular authorization: Kerberos, SSL and role-based 
     access 
  q. Advanced C/C++ access libraries for HDFS and YARN: libhdfs3 and libYARN 
  r. Support for most third-party tools: Tableau, SAS et al. 
  s. Standard connectivity: JDBC/ODBC

The following link gives more information about HAWQ: 
https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ 



Please also see the inline answers to your specific questions:

On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA 
<[email protected]> wrote:

  Silly question, right? Thing is, I’ve read a bit and watched some YouTube 
videos and I’m still not quite sure what I can and can’t do with HAWQ. Is it a 
true database or is it like Hive, where I need to use HCatalog? 

It is a true database. You can think of it as a parallel Postgres with 
much more functionality that works natively in the Hadoop world. HCatalog is 
not necessary, but you can read data registered in HCatalog with the new 
"hcatalog integration" feature.

  Can I write data intensive applications against it using ODBC? Does it 
enforce referential integrity? Does it have stored procedures?

ODBC: yes, both JDBC and ODBC are supported.
Referential integrity: currently not supported.
Stored procedures: yes.
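
Since referential integrity is not enforced by the database, any such check 
has to live in the application or the load process. A minimal sketch of what 
that check could look like is below. It is not HAWQ-specific: it uses Python's 
built-in sqlite3 module as a stand-in for a real ODBC/JDBC connection, and the 
table and column names (customers, orders, customer_id) and the helper 
find_orphans are hypothetical.

```python
import sqlite3


def find_orphans(conn, child, fk, parent, pk):
    """Return foreign-key values in `child` that have no matching row in `parent`."""
    # Identifiers are interpolated into the SQL text, so they must come from
    # trusted code, never from user input.
    query = (
        f"SELECT c.{fk} FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL "
        f"ORDER BY c.{fk}"
    )
    return [row[0] for row in conn.execute(query)]


def demo():
    # In-memory SQLite database standing in for the real warehouse (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER)")
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,)])
    # Order 12 points at customer 3, which does not exist.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 3)]
    )
    return find_orphans(conn, "orders", "customer_id", "customers", "id")


if __name__ == "__main__":
    print(demo())
```

A check like this could run after each load and reject or quarantine the 
batch when the returned list is not empty.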

  B.


Please let us know if you have any other questions.

Cheers
Lei
