No, truncate was added to Apache Hadoop

https://issues.apache.org/jira/plugins/servlet/mobile#issue/hdfs-3107

> On Nov 13, 2015, at 7:39 PM, Bob Marshall <[email protected]> wrote:
> 
> I stand corrected. But I had a question:
> 
> In Pivotal Hadoop HDFS, we added truncate to support transactions. The 
> signature of truncate is as follows: void truncate(Path src, long length) 
> throws IOException; The truncate() function truncates the file to a size 
> that is less than or equal to the current file length. If the file is 
> smaller than the target length, an IOException is thrown. This differs from 
> POSIX truncate semantics; the rationale is that HDFS does not support 
> overwriting at arbitrary positions.
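For illustration, the shrink-only contract described above can be sketched in plain Python against a local file. This is only a simulation of the stated semantics, not the real HDFS client API (which, per HDFS-3107, is a Java method on FileSystem); the function name and IOError here stand in for the Java signature and IOException.

```python
import os
import tempfile

def truncate(path, length):
    """Simulate the shrink-only truncate() described above: cut the
    file down to `length` bytes, or fail if `length` is larger than
    the current file size (where HDFS would throw IOException)."""
    size = os.path.getsize(path)
    if length > size:
        # POSIX truncate would zero-extend the file here instead;
        # HDFS cannot, because it does not overwrite at arbitrary positions.
        raise IOError("target length %d exceeds file size %d" % (length, size))
    with open(path, "r+b") as f:
        f.truncate(length)

# Demo: a 10-byte file can be cut to 4 bytes, but never grown.
demo = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(demo, "wb") as f:
    f.write(b"0123456789")
truncate(demo, 4)
print(os.path.getsize(demo))  # 4
```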
> 
> Does this mean I need to run a modified HDFS to run HAWQ?
> 
> Robert L Marshall
> Senior Consultant | Avalon Consulting, LLC
> c: (210) 853-7041
> LinkedIn | Google+ | Twitter
> -------------------------------------------------------------------------------------------------------------
> This message (including any attachments) contains confidential information 
> intended for a specific individual and purpose, and is protected by law. If 
> you are not the intended recipient, you should delete this message. Any 
> disclosure, copying, or distribution of this message, or the taking of any 
> action based on it, is strictly prohibited.
> 
>> On Fri, Nov 13, 2015 at 7:16 PM, Dan Baskette <[email protected]> wrote:
>> But HAWQ does manage its own storage on HDFS. You can use the native HAWQ 
>> format or Parquet. Its PXF functionality allows querying files in other 
>> formats. So, by your (and my) definition, it is indeed a database.
>> 
>>> On Nov 13, 2015, at 7:08 PM, Bob Marshall <[email protected]> 
>>> wrote:
>>> 
>>> Chhavi Joshi is right on the money. A database is both a query execution 
>>> tool and a data storage backend. HAWQ executes against native Hadoop 
>>> storage, i.e. HBase, HDFS, etc.
>>> 
>>> 
>>>> On Fri, Nov 13, 2015 at 10:41 AM, Chhavi Joshi 
>>>> <[email protected]> wrote:
>>>> If you have the HAWQ/Greenplum integration, you can create external tables 
>>>> in Greenplum, much like in Hive.
>>>> 
>>>> To load data into those tables, you just need to put the files into HDFS 
>>>> (the same as with external tables in Hive).
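As a toy illustration of the external-table idea above (the table is just a schema over files in a directory, so "loading" data is nothing more than dropping files there), here is a sketch over a local directory standing in for an HDFS path. This is conceptual only; it is not the HAWQ, PXF, or Hive API.

```python
import csv
import glob
import os

def read_external_table(directory, columns):
    """Toy 'external table': every CSV file in `directory` is scanned
    at query time, so adding a file is all it takes to 'load' data."""
    rows = []
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        with open(path) as f:
            for rec in csv.reader(f):
                rows.append(dict(zip(columns, rec)))
    return rows
```

Because files are scanned at read time, a newly dropped file shows up in the next "query" with no load step, which is the property the external-table workflow relies on.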
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>> I still believe HAWQ is only a SQL query engine, not a database.
>>>> 
>>>>  
>>>> 
>>>> Chhavi
>>>> 
>>>> From: Atri Sharma [mailto:[email protected]] 
>>>> Sent: Friday, November 13, 2015 3:53 AM
>>>> 
>>>> 
>>>> To: [email protected]
>>>> Subject: Re: what is Hawq?
>>>>  
>>>> 
>>>> Greenplum is open sourced.
>>>> 
>>>> The main difference between the two engines is that HAWQ targets 
>>>> Hadoop-based systems, whereas Greenplum targets a regular filesystem. This 
>>>> is a very high-level distinction and the differences run deeper, but as a 
>>>> one-line summary it holds.
>>>> 
>>>> On 13 Nov 2015 14:20, "Adaryl "Bob" Wakefield, MBA" 
>>>> <[email protected]> wrote:
>>>> 
>>>> Is Greenplum free? I heard they open sourced it but I haven’t found 
>>>> anything but a community edition.
>>>> 
>>>>  
>>>> 
>>>> Adaryl "Bob" Wakefield, MBA
>>>> Principal
>>>> Mass Street Analytics, LLC
>>>> 913.938.6685
>>>> www.linkedin.com/in/bobwakefieldmba
>>>> Twitter: @BobLovesData
>>>> 
>>>>  
>>>> 
>>>> From: dortmont
>>>> 
>>>> Sent: Friday, November 13, 2015 2:42 AM
>>>> 
>>>> To: [email protected]
>>>> 
>>>> Subject: Re: what is Hawq?
>>>> 
>>>>  
>>>> 
>>>> I see the advantage of HAWQ compared to other Hadoop SQL engines. It looks 
>>>> like the most mature SQL solution on Hadoop, thanks to its 
>>>> PostgreSQL-based engine.
>>>> 
>>>>  
>>>> 
>>>> But why wouldn't I use Greenplum instead of HAWQ? It has even better 
>>>> performance, and it supports updates.
>>>> 
>>>> 
>>>> Cheers
>>>> 
>>>>  
>>>> 
>>>> 2015-11-13 7:45 GMT+01:00 Atri Sharma <[email protected]>:
>>>> 
>>>> +1 for transactions.
>>>> 
>>>> I think a major plus point is that HAWQ supports transactions, and this 
>>>> enables a lot of critical workloads to run on HAWQ.
>>>> 
>>>> On 13 Nov 2015 12:13, "Lei Chang" <[email protected]> wrote:
>>>> 
>>>>  
>>>> 
>>>> Like what Bob said, HAWQ is a complete database and Drill is just a query 
>>>> engine.
>>>> 
>>>>  
>>>> 
>>>> And HAWQ also has a lot of other benefits over Drill, for example:
>>>> 
>>>>  
>>>> 
>>>> 1. SQL completeness: HAWQ has the most complete SQL support among the 
>>>> SQL-on-Hadoop engines; it can run all TPC-DS queries without any changes, 
>>>> and it supports almost all third-party tools, such as Tableau et al.
>>>> 
>>>> 2. Performance: proven among the best in the Hadoop world.
>>>> 
>>>> 3. Scalability: highly scalable via a high-speed UDP-based interconnect.
>>>> 
>>>> 4. Transactions: as far as I know, Drill does not support transactions, 
>>>> and maintaining consistency without them is a nightmare for end users.
>>>> 
>>>> 5. Advanced resource management: HAWQ has the most advanced resource 
>>>> management; it natively supports YARN and easy-to-use hierarchical 
>>>> resource queues. Resources can be managed and enforced at the query and 
>>>> operator level.
>>>> 
>>>>  
>>>> 
>>>> Cheers
>>>> 
>>>> Lei
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>> On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA 
>>>> <[email protected]> wrote:
>>>> 
>>>> There are a lot of tools that do a lot of things. Believe me, it’s a 
>>>> full-time job keeping track of what is going on in the Apache world. As I 
>>>> understand it, Drill is just a query engine while HAWQ is an actual 
>>>> database... somewhat, anyway.
>>>> 
>>>>  
>>>> 
>>>> 
>>>>  
>>>> 
>>>> From: Will Wagner
>>>> 
>>>> Sent: Thursday, November 12, 2015 7:42 AM
>>>> 
>>>> To: [email protected]
>>>> 
>>>> Subject: Re: what is Hawq?
>>>> 
>>>>  
>>>> 
>>>> Hi Lei,
>>>> 
>>>> Great answer.
>>>> 
>>>> I have a follow up question. 
>>>> Everything HAWQ is capable of doing is already covered by Apache Drill.  
>>>> Why do we need another tool?
>>>> 
>>>> Thank you, 
>>>> Will W
>>>> 
>>>> On Nov 12, 2015 12:25 AM, "Lei Chang" <[email protected]> wrote:
>>>> 
>>>>  
>>>> 
>>>> Hi Bob,
>>>> 
>>>>  
>>>> 
>>>> Apache HAWQ is a Hadoop-native SQL query engine that combines the key 
>>>> technological advantages of an MPP database with the scalability and 
>>>> convenience of Hadoop. HAWQ reads data from and writes data to HDFS 
>>>> natively. It delivers industry-leading performance and linear 
>>>> scalability, and provides users with the tools to confidently and 
>>>> successfully interact with petabyte-range data sets through a complete, 
>>>> standards-compliant SQL interface. More specifically, HAWQ has the 
>>>> following features:
>>>> ·         On-premise or cloud deployment
>>>> 
>>>> ·         Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP 
>>>> extension
>>>> 
>>>> ·         Extremely high performance: many times faster than other Hadoop 
>>>> SQL engines
>>>> 
>>>> ·         World-class parallel optimizer
>>>> 
>>>> ·         Full transaction capability and consistency guarantee: ACID
>>>> 
>>>> ·         Dynamic data flow engine through a high-speed UDP-based 
>>>> interconnect
>>>> 
>>>> ·         Elastic execution engine based on virtual segment & data locality
>>>> 
>>>> ·         Support for multi-level partitioning and List/Range-based 
>>>> partitioned tables
>>>> 
>>>> ·         Multiple compression methods supported: snappy, gzip, quicklz, RLE
>>>> 
>>>> ·         Multi-language user-defined function support: Python, Perl, 
>>>> Java, C/C++, R
>>>> 
>>>> ·         Advanced machine learning and data mining functionalities 
>>>> through MADLib
>>>> 
>>>> ·         Dynamic node expansion: in seconds
>>>> 
>>>> ·         Most advanced three-level resource management: integrates with 
>>>> YARN and supports hierarchical resource queues
>>>> 
>>>> ·         Easy access of all HDFS data and external system data (for 
>>>> example, HBase)
>>>> 
>>>> ·         Hadoop native: from storage (HDFS) and resource management 
>>>> (YARN) to deployment (Ambari)
>>>> 
>>>> ·         Authentication & granular authorization: Kerberos, SSL, and 
>>>> role-based access
>>>> 
>>>> ·         Advanced C/C++ access library to HDFS and YARN: libhdfs3 & 
>>>> libYARN
>>>> 
>>>> ·         Support most third party tools: Tableau, SAS et al.
>>>> 
>>>> ·         Standard connectivity: JDBC/ODBC
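On the JDBC/ODBC bullet: since HAWQ is derived from PostgreSQL, standard Postgres drivers are typically used to reach the HAWQ master. Below is a minimal, hedged sketch in Python with psycopg2; the host name, default port, database, and user are placeholders, not values from this thread.

```python
# Hedged sketch: connecting to a HAWQ master with a standard
# PostgreSQL driver. Host/database/user values are placeholders.
def hawq_dsn(host, port=5432, dbname="gpadmin", user="gpadmin"):
    """Build a libpq-style connection string usable by psycopg2."""
    return "host=%s port=%d dbname=%s user=%s" % (host, port, dbname, user)

def run_query(dsn, sql):
    """Run a query against a live cluster via the Postgres wire protocol."""
    import psycopg2  # standard PostgreSQL driver
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()

# Usage (against a live cluster):
#   rows = run_query(hawq_dsn("hawq-master.example.com"), "SELECT version()")
```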
>>>> 
>>>>  
>>>> 
>>>> And this link can give you more information about HAWQ: 
>>>> https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ 
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>> And please also see the answers inline to your specific questions:
>>>> 
>>>>  
>>>> 
>>>> On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA 
>>>> <[email protected]> wrote:
>>>> 
>>>> Silly question, right? Thing is, I’ve read a bit and watched some YouTube 
>>>> videos and I’m still not quite sure what I can and can’t do with HAWQ. Is 
>>>> it a true database, or is it like Hive, where I need to use HCatalog?
>>>> 
>>>>  
>>>> 
>>>> It is a true database: you can think of it as a parallel Postgres, but 
>>>> with many more capabilities, and it works natively in the Hadoop world. 
>>>> HCatalog is not necessary, but you can read data registered in HCatalog 
>>>> via the new "HCatalog integration" feature.
>>>> 
>>>>  
>>>> 
>>>> Can I write data-intensive applications against it using ODBC? Does it 
>>>> enforce referential integrity? Does it have stored procedures?
>>>> 
>>>>  
>>>> 
>>>> ODBC: yes, both JDBC and ODBC are supported.
>>>> 
>>>> Referential integrity: currently not supported.
>>>> 
>>>> Stored procedures: yes.
>>>> 
>>>>  
>>>> 
>>>> B.
>>>> 
>>>>  
>>>> 
>>>>  
>>>> 
>>>> Please let us know if you have any other questions.
>>>> 
>>>>  
>>>> 
>>>> Cheers
>>>> 
>>>> Lei
>>>> 
>>>> ============================================================================================================================
>>>> Disclaimer: This message and the information contained herein is 
>>>> proprietary and confidential and subject to the Tech Mahindra policy 
>>>> statement, you may review the policy at 
>>>> http://www.techmahindra.com/Disclaimer.html externally 
>>>> http://tim.techmahindra.com/tim/disclaimer.html internally within 
>>>> TechMahindra.
>>>> ============================================================================================================================
> 
