I see the advantage of HAWQ compared to other Hadoop SQL engines. It looks like the most mature solution on Hadoop thanks to its PostgreSQL-based engine.
But why wouldn't I use Greenplum instead of HAWQ? It has even better performance and it supports updates.

Cheers

2015-11-13 7:45 GMT+01:00 Atri Sharma <[email protected]>:

> +1 for transactions.
>
> I think a major plus point is that HAWQ supports transactions, and this
> enables a lot of critical workloads to be done on HAWQ.
> On 13 Nov 2015 12:13, "Lei Chang" <[email protected]> wrote:
>
>>
>> As Bob said, HAWQ is a complete database and Drill is just a query
>> engine.
>>
>> HAWQ also has a lot of other benefits over Drill, for example:
>>
>> 1. SQL completeness: HAWQ is the most complete of the SQL-on-Hadoop
>>    engines and can run all TPC-DS queries without any changes. It also
>>    supports almost all third-party tools, such as Tableau.
>> 2. Performance: proven the best in the Hadoop world.
>> 3. Scalability: highly scalable via a high-speed UDP-based interconnect.
>> 4. Transactions: as far as I know, Drill does not support transactions.
>>    It is a nightmare for end users to keep consistency.
>> 5. Advanced resource management: HAWQ has the most advanced resource
>>    management. It natively supports YARN and easy-to-use hierarchical
>>    resource queues. Resources can be managed and enforced at the query
>>    and operator level.
>>
>> Cheers
>> Lei
>>
>>
>> On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA <
>> [email protected]> wrote:
>>
>>> There are a lot of tools that do a lot of things. Believe me, it’s a
>>> full-time job keeping track of what is going on in the Apache world. As I
>>> understand it, Drill is just a query engine while HAWQ is an actual
>>> database... somewhat, anyway.
>>>
>>> Adaryl "Bob" Wakefield, MBA
>>> Principal
>>> Mass Street Analytics, LLC
>>> 913.938.6685
>>> www.linkedin.com/in/bobwakefieldmba
>>> Twitter: @BobLovesData
>>>
>>> *From:* Will Wagner <[email protected]>
>>> *Sent:* Thursday, November 12, 2015 7:42 AM
>>> *To:* [email protected]
>>> *Subject:* Re: what is Hawq?
>>>
>>>
>>> Hi Lei,
>>>
>>> Great answer.
>>>
>>> I have a follow-up question.
>>> Everything HAWQ is capable of doing is already covered by Apache Drill.
>>> Why do we need another tool?
>>>
>>> Thank you,
>>> Will W
>>> On Nov 12, 2015 12:25 AM, "Lei Chang" <[email protected]> wrote:
>>>
>>>>
>>>> Hi Bob,
>>>>
>>>>
>>>> Apache HAWQ is a Hadoop-native SQL query engine that combines the key
>>>> technological advantages of an MPP database with the scalability and
>>>> convenience of Hadoop. HAWQ reads data from and writes data to HDFS
>>>> natively. HAWQ delivers industry-leading performance and linear
>>>> scalability. It provides users the tools to confidently and successfully
>>>> interact with petabyte-range data sets. HAWQ provides users with a
>>>> complete, standards-compliant SQL interface. More specifically, HAWQ has
>>>> the following features:
>>>>
>>>>    - On-premise or cloud deployment
>>>>    - Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP
>>>>      extension
>>>>    - Extremely high performance: many times faster than other Hadoop
>>>>      SQL engines
>>>>    - World-class parallel optimizer
>>>>    - Full transaction capability and consistency guarantee: ACID
>>>>    - Dynamic data flow engine through a high-speed UDP-based
>>>>      interconnect
>>>>    - Elastic execution engine based on virtual segments & data locality
>>>>    - Multi-level partitioning and list/range-partitioned tables
>>>>    - Multiple compression method support: snappy, gzip, quicklz, RLE
>>>>    - Multi-language user-defined function support: Python, Perl, Java,
>>>>      C/C++, R
>>>>    - Advanced machine learning and data mining functionality through
>>>>      MADlib
>>>>    - Dynamic node expansion: in seconds
>>>>    - Most advanced three-level resource management: integrates with
>>>>      YARN and hierarchical resource queues
>>>>    - Easy access to all HDFS data and external system data (for
>>>>      example, HBase)
>>>>    - Hadoop native: from storage (HDFS) and resource management (YARN)
>>>>      to deployment (Ambari)
>>>>    - Authentication & granular authorization: Kerberos, SSL and
>>>>      role-based access
>>>>    - Advanced C/C++ access libraries for HDFS and YARN: libhdfs3 & libYARN
>>>>    - Support for most third-party tools: Tableau, SAS, and others
>>>>    - Standard connectivity: JDBC/ODBC
>>>>
>>>>
>>>> This link gives more information about HAWQ:
>>>> https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ
>>>>
>>>>
>>>> Please also see the answers inline to your specific questions:
>>>>
>>>> On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA <
>>>> [email protected]> wrote:
>>>>
>>>>> Silly question, right? Thing is, I’ve read a bit and watched some
>>>>> YouTube videos and I’m still not quite sure what I can and can’t do with
>>>>> HAWQ. Is it a true database or is it like Hive where I need to use
>>>>> HCatalog?
>>>>>
>>>>
>>>> It is a true database. You can think of it as a parallel PostgreSQL, but
>>>> with much more functionality, and it works natively in the Hadoop world.
>>>> HCatalog is not necessary, but you can read data registered in HCatalog
>>>> with the new "hcatalog integration" feature.
>>>>
>>>>
>>>>> Can I write data-intensive applications against it using ODBC? Does it
>>>>> enforce referential integrity? Does it have stored procedures?
>>>>>
>>>>
>>>> ODBC: yes, both JDBC and ODBC are supported.
>>>> Referential integrity: currently not supported.
>>>> Stored procedures: yes.
>>>>
>>>>
>>>>> B.
>>>>>
>>>>
>>>>
>>>> Please let us know if you have any other questions.
>>>>
>>>> Cheers
>>>> Lei
