This patch is standard in HDFS 2.7. Pivotal HD and HDP are both based on HDFS 2.6 with the truncate patch from 2.7 backported.
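For reference, the HDFS-3107 patch referenced below surfaces in the Hadoop 2.7+ client API as FileSystem.truncate(Path, long). A minimal sketch of calling it from Java; the file path and target length here are hypothetical:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TruncateExample {
        public static void main(String[] args) throws IOException {
            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            try (FileSystem fs = FileSystem.get(conf)) {
                Path file = new Path("/data/example.dat"); // hypothetical path
                // Truncate to 1 KiB. Returns true if the file is immediately at
                // the new length; false means block recovery is still running
                // and the file cannot be reopened for append until it finishes.
                boolean done = fs.truncate(file, 1024L);
                System.out.println("truncate finished synchronously: " + done);
            }
        }
    }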
> On Nov 13, 2015, at 4:45 PM, Dan Baskette <[email protected]> wrote:
>
> No, truncate was added to Apache Hadoop:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/hdfs-3107
>
> Sent from my iPhone
>
>> On Nov 13, 2015, at 7:39 PM, Bob Marshall <[email protected]> wrote:
>>
>> I stand corrected. But I had a question:
>>
>> In Pivotal Hadoop HDFS, we added truncate to support transactions. The
>> signature of truncate is as follows: void truncate(Path src, long length)
>> throws IOException; The truncate() function truncates the file to a size
>> less than or equal to the current file length. If the file is smaller
>> than the target length, an IOException is thrown. This differs from
>> POSIX truncate semantics; the rationale is that HDFS does not support
>> overwriting at arbitrary positions.
>>
>> Does this mean I need to run a modified HDFS to run HAWQ?
>>
>> Robert L Marshall
>> Senior Consultant | Avalon Consulting, LLC
>> c: (210) 853-7041
>>
>>> On Fri, Nov 13, 2015 at 7:16 PM, Dan Baskette <[email protected]> wrote:
>>> But HAWQ does manage its own storage on HDFS. You can use the native
>>> HAWQ format or Parquet, and its PXF functions allow querying files in
>>> other formats. So, by your (and my) definition, it is indeed a database.
>>>
>>> Sent from my iPhone
>>>
>>>> On Nov 13, 2015, at 7:08 PM, Bob Marshall <[email protected]> wrote:
>>>>
>>>> Chhavi Joshi is right on the money. A database is both a query
>>>> execution tool and a data storage backend. HAWQ executes against
>>>> native Hadoop storage, e.g. HBase, HDFS, etc.
>>>>
>>>> Robert L Marshall
>>>> Senior Consultant | Avalon Consulting, LLC
>>>> c: (210) 853-7041
>>>>
>>>>> On Fri, Nov 13, 2015 at 10:41 AM, Chhavi Joshi
>>>>> <[email protected]> wrote:
>>>>> If you have the HAWQ-Greenplum integration, you can create external
>>>>> tables in Greenplum just as you would in Hive. To load data into a
>>>>> table, you only need to put the files into HDFS (the same as with
>>>>> external tables in Hive).
>>>>>
>>>>> I still believe HAWQ is only a SQL query engine, not a database.
>>>>>
>>>>> Chhavi
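To make the external-table workflow described above concrete, the sketch below registers CSV files already sitting in HDFS as a queryable HAWQ table through PXF, over JDBC (HAWQ speaks the PostgreSQL wire protocol, so the stock PostgreSQL driver works). The host, port, HDFS path, credentials, and PXF profile are assumptions and vary across HAWQ/PXF versions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExternalTableExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical HAWQ master; requires the PostgreSQL JDBC driver.
            String url = "jdbc:postgresql://hawq-master:5432/postgres";
            try (Connection conn = DriverManager.getConnection(url, "gpadmin", "");
                 Statement st = conn.createStatement()) {
                // Register CSV files under /data/sales in HDFS as a readable table.
                st.execute(
                    "CREATE EXTERNAL TABLE ext_sales (id int, amount float8) " +
                    "LOCATION ('pxf://hawq-master:51200/data/sales?PROFILE=HdfsTextSimple') " +
                    "FORMAT 'TEXT' (DELIMITER ',')");
                // New files dropped into /data/sales become visible on the next scan.
                try (ResultSet rs = st.executeQuery("SELECT count(*) FROM ext_sales")) {
                    rs.next();
                    System.out.println("rows: " + rs.getLong(1));
                }
            }
        }
    }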
>>>>> From: Atri Sharma [mailto:[email protected]]
>>>>> Sent: Friday, November 13, 2015 3:53 AM
>>>>> To: [email protected]
>>>>> Subject: Re: what is Hawq?
>>>>>
>>>>> Greenplum is open sourced.
>>>>>
>>>>> The main difference between the two engines is that HAWQ is aimed at
>>>>> Hadoop-based systems, whereas Greenplum is aimed at a regular
>>>>> filesystem. That is a very high-level distinction, and the detailed
>>>>> differences run deeper, but as a one-line summary it holds.
>>>>>
>>>>> On 13 Nov 2015 14:20, "Adaryl "Bob" Wakefield, MBA"
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Is Greenplum free? I heard they open sourced it, but I haven't found
>>>>> anything but a community edition.
>>>>>
>>>>> Adaryl "Bob" Wakefield, MBA
>>>>> Principal
>>>>> Mass Street Analytics, LLC
>>>>> 913.938.6685
>>>>> www.linkedin.com/in/bobwakefieldmba
>>>>> Twitter: @BobLovesData
>>>>>
>>>>> From: dortmont
>>>>> Sent: Friday, November 13, 2015 2:42 AM
>>>>> To: [email protected]
>>>>> Subject: Re: what is Hawq?
>>>>>
>>>>> I see the advantage of HAWQ compared to other Hadoop SQL engines: it
>>>>> looks like the most mature solution on Hadoop, thanks to its
>>>>> PostgreSQL-based engine.
>>>>>
>>>>> But why wouldn't I use Greenplum instead of HAWQ? It has even better
>>>>> performance, and it supports updates.
>>>>>
>>>>> Cheers
>>>>>
>>>>> 2015-11-13 7:45 GMT+01:00 Atri Sharma <[email protected]>:
>>>>>
>>>>> +1 for transactions.
>>>>>
>>>>> I think a major plus point is that HAWQ supports transactions, and
>>>>> this enables a lot of critical workloads to be run on HAWQ.
>>>>>
>>>>> On 13 Nov 2015 12:13, "Lei Chang" <[email protected]> wrote:
>>>>>
>>>>> Like what Bob said, HAWQ is a complete database and Drill is just a
>>>>> query engine.
>>>>>
>>>>> And HAWQ also has a lot of other benefits over Drill, for example:
>>>>>
>>>>> 1. SQL completeness: HAWQ has the most complete SQL of the
>>>>> SQL-on-Hadoop engines; it can run all TPC-DS queries without any
>>>>> changes, and it supports almost all third-party tools, such as Tableau.
>>>>> 2. Performance: proven the best in the Hadoop world.
>>>>> 3. Scalability: highly scalable via a high-speed UDP-based interconnect.
>>>>> 4. Transactions: as far as I know, Drill does not support
>>>>> transactions, and keeping data consistent without them is a nightmare
>>>>> for end users.
>>>>> 5. Advanced resource management: HAWQ has the most advanced resource
>>>>> management. It natively supports YARN and easy-to-use hierarchical
>>>>> resource queues, and resources can be managed and enforced at the
>>>>> query and operator level.
>>>>>
>>>>> Cheers
>>>>> Lei
>>>>>
>>>>> On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA
>>>>> <[email protected]> wrote:
>>>>>
>>>>> There are a lot of tools that do a lot of things. Believe me, it's a
>>>>> full-time job keeping track of what is going on in the Apache world.
>>>>> As I understand it, Drill is just a query engine while HAWQ is an
>>>>> actual database... somewhat, anyway.
>>>>>
>>>>> Adaryl "Bob" Wakefield, MBA
>>>>> Principal
>>>>> Mass Street Analytics, LLC
>>>>> 913.938.6685
>>>>> www.linkedin.com/in/bobwakefieldmba
>>>>> Twitter: @BobLovesData
>>>>>
>>>>> From: Will Wagner
>>>>> Sent: Thursday, November 12, 2015 7:42 AM
>>>>> To: [email protected]
>>>>> Subject: Re: what is Hawq?
>>>>>
>>>>> Hi Lei,
>>>>>
>>>>> Great answer.
>>>>>
>>>>> I have a follow-up question. Everything HAWQ is capable of doing is
>>>>> already covered by Apache Drill. Why do we need another tool?
>>>>>
>>>>> Thank you,
>>>>> Will W
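Since transactions are the recurring differentiator in this thread, here is a minimal JDBC sketch of what they buy in practice: a batch of inserts that becomes visible atomically or not at all. The connection details and table are assumptions, and because HAWQ storage is append-oriented, the example sticks to INSERTs:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TransactionExample {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:postgresql://hawq-master:5432/postgres"; // hypothetical
            try (Connection conn = DriverManager.getConnection(url, "gpadmin", "")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE TABLE events (id int, payload text)");
                }
                conn.setAutoCommit(false); // start an explicit transaction
                try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO events VALUES (?, ?)")) {
                    for (int i = 0; i < 1000; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "event-" + i);
                        ps.addBatch();
                    }
                    ps.executeBatch();
                    conn.commit();   // all 1000 rows become visible atomically
                } catch (SQLException e) {
                    conn.rollback(); // on failure, none of the rows are visible
                    throw e;
                }
            }
        }
    }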
>>>>> On Nov 12, 2015 12:25 AM, "Lei Chang" <[email protected]> wrote:
>>>>>
>>>>> Hi Bob,
>>>>>
>>>>> Apache HAWQ is a Hadoop-native SQL query engine that combines the key
>>>>> technological advantages of an MPP database with the scalability and
>>>>> convenience of Hadoop. HAWQ reads data from and writes data to HDFS
>>>>> natively. HAWQ delivers industry-leading performance and linear
>>>>> scalability, and it provides users the tools to confidently and
>>>>> successfully interact with petabyte-range data sets. HAWQ provides
>>>>> users with a complete, standards-compliant SQL interface. More
>>>>> specifically, HAWQ has the following features:
>>>>>
>>>>> · On-premise or cloud deployment
>>>>> · Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extension
>>>>> · Extremely high performance: many times faster than other Hadoop SQL engines
>>>>> · World-class parallel optimizer
>>>>> · Full transaction capability and consistency guarantee: ACID
>>>>> · Dynamic data-flow engine through a high-speed UDP-based interconnect
>>>>> · Elastic execution engine based on virtual segments and data locality
>>>>> · Multi-level partitioning and list/range-partitioned tables
>>>>> · Multiple compression methods: snappy, gzip, quicklz, RLE
>>>>> · Multi-language user-defined function support: Python, Perl, Java, C/C++, R
>>>>> · Advanced machine learning and data mining functionality through MADlib
>>>>> · Dynamic node expansion: in seconds
>>>>> · Most advanced three-level resource management: integrates with YARN
>>>>> and provides hierarchical resource queues
>>>>> · Easy access to all HDFS data and external system data (for example, HBase)
>>>>> · Hadoop-native: from storage (HDFS) and resource management (YARN)
>>>>> to deployment (Ambari)
>>>>> · Authentication and granular authorization: Kerberos, SSL, and
>>>>> role-based access
>>>>> · Advanced C/C++ access libraries for HDFS and YARN: libhdfs3 and libYARN
>>>>> · Support for most third-party tools: Tableau, SAS, et al.
>>>>> · Standard connectivity: JDBC/ODBC
>>>>>
>>>>> And the link here can give you more information about HAWQ:
>>>>> https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ
>>>>>
>>>>> Please also see the answers inline to your specific questions:
>>>>>
>>>>> On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Silly question, right? Thing is, I've read a bit and watched some
>>>>> YouTube videos, and I'm still not quite sure what I can and can't do
>>>>> with HAWQ. Is it a true database, or is it like Hive, where I need to
>>>>> use HCatalog?
>>>>>
>>>>> It is a true database; you can think of it as a parallel PostgreSQL,
>>>>> but with much more functionality, and it works natively in the Hadoop
>>>>> world. HCatalog is not necessary, but you can read data registered in
>>>>> HCatalog with the new "HCatalog integration" feature.
>>>>>
>>>>> Can I write data-intensive applications against it using ODBC? Does
>>>>> it enforce referential integrity? Does it have stored procedures?
>>>>>
>>>>> ODBC: yes, both JDBC and ODBC are supported.
>>>>> Referential integrity: currently not supported.
>>>>> Stored procedures: yes.
>>>>>
>>>>> B.
>>>>> Please let us know if you have any other questions.
>>>>>
>>>>> Cheers
>>>>> Lei
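To ground the JDBC and stored-procedure answers above: HAWQ inherits PostgreSQL-style user-defined functions, so a "stored procedure" is typically a UDF (PL/pgSQL below; Lei's feature list also mentions Python, Perl, Java, C/C++, and R). The function body, connection URL, and credentials are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class StoredFunctionExample {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://hawq-master:5432/postgres"; // hypothetical
            try (Connection conn = DriverManager.getConnection(url, "gpadmin", "");
                 Statement st = conn.createStatement()) {
                // Define a simple PL/pgSQL function on the server.
                st.execute(
                    "CREATE OR REPLACE FUNCTION add_tax(amount float8) " +
                    "RETURNS float8 AS $$ " +
                    "BEGIN RETURN amount * 1.0825; END; " +
                    "$$ LANGUAGE plpgsql");
                // Invoke it like any scalar function.
                try (ResultSet rs = st.executeQuery("SELECT add_tax(100.0)")) {
                    rs.next();
                    System.out.println("with tax: " + rs.getDouble(1));
                }
            }
        }
    }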
