Hello All –

Hive 0.14 supports ACID transactions, and Spark supports Hive 
queries (HQL).

Has anyone compared HAWQ with Spark SQL or with Hive (HQL) on Spark?

Thanks.

Supriyo Biswas
Architect – CPS Service Delivery
The Nielsen Company
Office (516) 682-6021/NETS 249-6021
Cell     (516) 353-6795
www.nielsen.com

From: Atri Sharma [mailto:[email protected]]
Sent: Friday, November 13, 2015 3:53 AM
To: [email protected]
Subject: Re: what is Hawq?


Greenplum is open sourced.

The main difference between the two engines is that HAWQ is aimed at 
Hadoop-based systems, whereas Greenplum is aimed at regular file systems. This 
is a very high-level summary and the detailed differences go further, but that 
is the one-line version.
On 13 Nov 2015 14:20, "Adaryl "Bob" Wakefield, MBA" 
<[email protected]> wrote:
Is Greenplum free? I heard they open sourced it but I haven’t found anything 
but a community edition.

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: dortmont <[email protected]>
Sent: Friday, November 13, 2015 2:42 AM
To: [email protected]
Subject: Re: what is Hawq?

I see the advantage of HAWQ compared to other Hadoop SQL engines. It looks like 
the most mature solution on Hadoop, thanks to its PostgreSQL-based engine.

But why wouldn't I use Greenplum instead of HAWQ? It has even better 
performance and it supports updates.

Cheers

2015-11-13 7:45 GMT+01:00 Atri Sharma <[email protected]>:

+1 for transactions.

I think a major plus point is that HAWQ supports transactions, which 
enables a lot of critical workloads to run on HAWQ.
On 13 Nov 2015 12:13, "Lei Chang" 
<[email protected]> wrote:

As Bob said, HAWQ is a complete database, while Drill is just a query 
engine.

HAWQ also has many other advantages over Drill, for example:

1. SQL completeness: HAWQ is the most complete of the SQL-on-Hadoop engines and 
can run all TPC-DS queries without any changes. It also supports almost all 
third-party tools, such as Tableau.
2. Performance: proven to be the best in the Hadoop world.
3. Scalability: highly scalable thanks to its high-speed UDP-based interconnect.
4. Transactions: as far as I know, Drill does not support transactions, which 
makes it a nightmare for end users to maintain consistency.
5. Advanced resource management: HAWQ has the most advanced resource 
management. It natively supports YARN and easy-to-use hierarchical resource 
queues. Resources can be managed and enforced at the query and operator level.
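
As a rough illustration of point 4, here is a minimal sketch of what 
all-or-nothing loading looks like from the application side. It uses Python's 
built-in sqlite3 module purely as a stand-in so that it runs anywhere; it is 
not HAWQ-specific code, and the orders table and the load_batch and 
count_orders helpers are hypothetical names made up for the example.

```python
import sqlite3


def load_batch(conn, rows):
    """Insert every (order_id, amount) row, or none of them.

    Returns True if the batch was committed, False if it was rolled back.
    """
    try:
        # Used as a context manager, the connection commits when the block
        # succeeds and rolls back if an exception escapes from it.
        with conn:
            for order_id, amount in rows:
                if amount < 0:
                    raise ValueError(f"negative amount for order {order_id}")
                conn.execute(
                    "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                    (order_id, amount),
                )
        return True
    except ValueError:
        return False


def count_orders(conn):
    """Return the number of rows currently visible in the orders table."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


if __name__ == "__main__":
    # sqlite3 in-memory database: a stand-in, not a HAWQ connection.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.commit()

    print(load_batch(conn, [(1, 10), (2, 20)]), count_orders(conn))
    # The second row is invalid, so the first row of this batch is rolled back too.
    print(load_batch(conn, [(3, 30), (4, -5)]), count_orders(conn))
```

Without transactions, the application would have to find and delete the 
partially loaded rows itself after a failure.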

Cheers
Lei


On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA 
<[email protected]> wrote:
There are a lot of tools that do a lot of things. Believe me, it’s a full-time 
job keeping track of what is going on in the Apache world. As I understand it, 
Drill is just a query engine while Hawq is an actual database... somewhat, 
anyway.

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData

From: Will Wagner <[email protected]>
Sent: Thursday, November 12, 2015 7:42 AM
To: [email protected]
Subject: Re: what is Hawq?


Hi Lei,

Great answer.

I have a follow up question.
Everything HAWQ is capable of doing is already covered by Apache Drill.  Why do 
we need another tool?

Thank you,
Will W
On Nov 12, 2015 12:25 AM, "Lei Chang" 
<[email protected]> wrote:

Hi Bob,


Apache HAWQ is a Hadoop-native SQL query engine that combines the key 
technological advantages of an MPP database with the scalability and 
convenience of Hadoop. HAWQ reads data from and writes data to HDFS natively. 
HAWQ delivers industry-leading performance and linear scalability. It provides 
users with the tools to confidently and successfully interact with 
petabyte-range data sets. HAWQ provides users with a complete, 
standards-compliant SQL interface. More specifically, HAWQ has the following 
features:
·         On-premises or cloud deployment
·         Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extensions
·         Extremely high performance: many times faster than other Hadoop SQL 
engines
·         World-class parallel optimizer
·         Full transaction capability and consistency guarantees: ACID
·         Dynamic data flow engine through a high-speed UDP-based interconnect
·         Elastic execution engine based on virtual segments and data locality
·         Support for multi-level partitioning and list/range-based partitioned 
tables
·         Support for multiple compression methods: snappy, gzip, quicklz, RLE
·         Multi-language user-defined function support: Python, Perl, Java, 
C/C++, R
·         Advanced machine learning and data mining functionality through 
MADlib
·         Dynamic node expansion: in seconds
·         Most advanced three-level resource management: integrates with YARN 
and hierarchical resource queues
·         Easy access to all HDFS data and external system data (for example, 
HBase)
·         Hadoop native: from storage (HDFS) and resource management (YARN) to 
deployment (Ambari)
·         Authentication and granular authorization: Kerberos, SSL and 
role-based access
·         Advanced C/C++ access libraries for HDFS and YARN: libhdfs3 and libYARN
·         Support for most third-party tools: Tableau, SAS, and others
·         Standard connectivity: JDBC/ODBC

More information about HAWQ is available here: 
https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ


Please also see my inline answers to your specific questions:

On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA 
<[email protected]> wrote:
Silly question, right? The thing is, I’ve read a bit and watched some YouTube 
videos and I’m still not quite sure what I can and can’t do with Hawq. Is it a 
true database, or is it like Hive, where I need to use HCatalog?

It is a true database. You can think of it as a parallel Postgres with 
much more functionality that works natively in the Hadoop world. HCatalog is 
not necessary, but you can read data registered in HCatalog with the new 
"HCatalog integration" feature.

Can I write data intensive applications against it using ODBC? Does it enforce 
referential integrity? Does it have stored procedures?

ODBC: yes, both JDBC and ODBC are supported.
Referential integrity: currently not supported.
Stored procedures: yes.
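
Since referential integrity is not enforced, an application that needs it 
would have to check for orphaned rows itself. Below is a minimal sketch of 
such a check, assuming hypothetical customers and orders tables. It uses 
Python's built-in sqlite3 module only as a stand-in database so that it runs 
anywhere; the anti-join query itself is plain SQL, but it has not been tested 
against HAWQ here.

```python
import sqlite3

# Anti-join: child rows whose customer_id has no matching parent row.
# Table and column names are hypothetical, chosen for the example.
ORPHAN_QUERY = """
SELECT o.order_id, o.customer_id
FROM orders AS o
LEFT JOIN customers AS c ON c.customer_id = o.customer_id
WHERE c.customer_id IS NULL
ORDER BY o.order_id
"""


def find_orphans(conn):
    """Return (order_id, customer_id) pairs that reference a missing customer."""
    return conn.execute(ORPHAN_QUERY).fetchall()


def make_demo_db():
    """Build a small in-memory database with one deliberately orphaned order."""
    conn = sqlite3.connect(":memory:")  # stand-in, not a HAWQ connection
    conn.execute("CREATE TABLE customers (customer_id INTEGER)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,)])
    # Order 11 points at customer 3, which does not exist.
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 3), (12, 2)])
    conn.commit()
    return conn


if __name__ == "__main__":
    for order_id, customer_id in find_orphans(make_demo_db()):
        print(f"order {order_id} references missing customer {customer_id}")
```

A check like this could run after each load, so that violations are caught 
before downstream queries depend on the data.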

B.


Please let us know if you have any other questions.

Cheers
Lei



