On Fri, Nov 13, 2015 at 07:39PM, Bob Marshall wrote:
> I stand corrected. But I had a question:
> 
> In Pivotal Hadoop HDFS, we added truncate to support transaction. The

Not to be picky, but truncate was added to standard HDFS starting
with 2.7 (HDFS-3107).  Perhaps it was backported by Pivotal later on? :)
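
For context, here is a minimal sketch (not from this thread) of how the stock
truncate API is called from the Hadoop client library. It assumes a Hadoop
2.7+ client on the classpath; the class name, file path and target length are
made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TruncateExample {
        public static void main(String[] args) throws Exception {
            // Default FileSystem from core-site.xml (HDFS in a real cluster)
            FileSystem fs = FileSystem.get(new Configuration());

            Path file = new Path("/tmp/example.dat");  // hypothetical file
            long newLength = 1024L;                    // must be <= current length

            // Returns true if the file is already at newLength; false means
            // block recovery is still in progress. Asking for a length larger
            // than the current file size is an error, much like the Pivotal
            // semantics quoted below.
            boolean done = fs.truncate(file, newLength);
            System.out.println("truncate finished immediately: " + done);
        }
    }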

> signature of the truncate is as follows: void truncate(Path src, long
> length) throws IOException; The truncate() function truncates the file to
> a size that is less than or equal to the file length. If the size of the file
> is smaller than the target length, an IOException is thrown. This is
> different from POSIX truncate semantics. The rationale is that HDFS does
> not support overwriting at arbitrary positions.
> 
> Does this mean I need to run a modified HDFS to run HAWQ?
> 
> Robert L Marshall
> Senior Consultant | Avalon Consulting, LLC <http://www.avalonconsult.com/>
> c: (210) 853-7041
> LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
> <http://www.google.com/+AvalonConsultingLLC> | Twitter
> <https://twitter.com/avalonconsult>
> 
> On Fri, Nov 13, 2015 at 7:16 PM, Dan Baskette <[email protected]> wrote:
> 
> > But HAWQ does manage its own storage on HDFS. You can leverage the native
> > HAWQ format or Parquet. Its PXF functions allow the querying of files in
> > other formats. So, by your (and my) definition it is indeed a database.
> >
> > Sent from my iPhone
> >
> > On Nov 13, 2015, at 7:08 PM, Bob Marshall <[email protected]>
> > wrote:
> >
> > Chhavi Joshi is right on the money. A database is both a query execution
> > tool and a data storage backend. HAWQ is executing against native Hadoop
> > storage, e.g. HBase, HDFS, etc.
> >
> > Robert L Marshall
> > Senior Consultant | Avalon Consulting, LLC <http://www.avalonconsult.com/>
> > c: (210) 853-7041
> > LinkedIn <http://www.linkedin.com/company/avalon-consulting-llc> | Google+
> > <http://www.google.com/+AvalonConsultingLLC> | Twitter
> > <https://twitter.com/avalonconsult>
> >
> >
> > On Fri, Nov 13, 2015 at 10:41 AM, Chhavi Joshi <
> > [email protected]> wrote:
> >
> >> If you have HAWQ-Greenplum integration, you can create external tables
> >> in Greenplum, much as in Hive.
> >>
> >> To load data into those tables, you just need to put the file into
> >> HDFS (the same as with external tables in Hive).
> >>
> >>
> >>
> >>
> >>
> >> I still believe HAWQ is only a SQL query engine, not a database.
> >>
> >>
> >>
> >> Chhavi
> >>
> >> *From:* Atri Sharma [mailto:[email protected]]
> >> *Sent:* Friday, November 13, 2015 3:53 AM
> >>
> >> *To:* [email protected]
> >> *Subject:* Re: what is Hawq?
> >>
> >>
> >>
> >> Greenplum is open sourced.
> >>
> >> The main difference between the two engines is that HAWQ is geared toward
> >> Hadoop-based systems, whereas Greenplum targets a regular file system. This
> >> is a very high-level distinction and the detailed differences run deeper,
> >> but as a one-line summary, that is it.
> >>
> >> On 13 Nov 2015 14:20, "Adaryl "Bob" Wakefield, MBA" <
> >> [email protected]> wrote:
> >>
> >> Is Greenplum free? I heard they open sourced it but I haven’t found
> >> anything but a community edition.
> >>
> >>
> >>
> >> Adaryl "Bob" Wakefield, MBA
> >> Principal
> >> Mass Street Analytics, LLC
> >> 913.938.6685
> >> www.linkedin.com/in/bobwakefieldmba
> >> Twitter: @BobLovesData
> >>
> >>
> >>
> >> *From:* dortmont <[email protected]>
> >>
> >> *Sent:* Friday, November 13, 2015 2:42 AM
> >>
> >> *To:* [email protected]
> >>
> >> *Subject:* Re: what is Hawq?
> >>
> >>
> >>
> >> I see the advantage of HAWQ compared to other Hadoop SQL engines. It
> >> looks like the most mature solution on Hadoop thanks to its PostgreSQL-based
> >> engine.
> >>
> >>
> >>
> >> But why wouldn't I use Greenplum instead of HAWQ? It has even better
> >> performance and it supports updates.
> >>
> >>
> >> Cheers
> >>
> >>
> >>
> >> 2015-11-13 7:45 GMT+01:00 Atri Sharma <[email protected]>:
> >>
> >> +1 for transactions.
> >>
> >> I think a major plus point is that HAWQ supports transactions, and this
> >> enables a lot of critical workloads to run on HAWQ.
> >>
> >> On 13 Nov 2015 12:13, "Lei Chang" <[email protected]> wrote:
> >>
> >>
> >>
> >> As Bob said, HAWQ is a complete database and Drill is just a query
> >> engine.
> >>
> >>
> >>
> >> And HAWQ also has a lot of other benefits over Drill, for example:
> >>
> >>
> >>
> >> 1. SQL completeness: HAWQ is the most complete of the SQL-on-Hadoop
> >> engines; it can run all TPC-DS queries without any changes and supports
> >> almost all third-party tools, such as Tableau et al.
> >>
> >> 2. Performance: proven the best in the Hadoop world
> >>
> >> 3. Scalability: highly scalable via a high-speed UDP-based interconnect.
> >>
> >> 4. Transactions: as far as I know, Drill does not support transactions,
> >> and it is a nightmare for end users to maintain consistency without them.
> >>
> >> 5. Advanced resource management: HAWQ has the most advanced resource
> >> management. It natively supports YARN and easy-to-use hierarchical resource
> >> queues. Resources can be managed and enforced at the query and operator
> >> level.
> >>
> >>
> >>
> >> Cheers
> >>
> >> Lei
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA <
> >> [email protected]> wrote:
> >>
> >> There are a lot of tools that do a lot of things. Believe me, it's a
> >> full-time job keeping track of what is going on in the Apache world. As I
> >> understand it, Drill is just a query engine while HAWQ is an actual
> >> database... somewhat, anyway.
> >>
> >>
> >>
> >> Adaryl "Bob" Wakefield, MBA
> >> Principal
> >> Mass Street Analytics, LLC
> >> 913.938.6685
> >> www.linkedin.com/in/bobwakefieldmba
> >> Twitter: @BobLovesData
> >>
> >>
> >>
> >> *From:* Will Wagner <[email protected]>
> >>
> >> *Sent:* Thursday, November 12, 2015 7:42 AM
> >>
> >> *To:* [email protected]
> >>
> >> *Subject:* Re: what is Hawq?
> >>
> >>
> >>
> >> Hi Lei,
> >>
> >> Great answer.
> >>
> >> I have a follow up question.
> >> Everything HAWQ is capable of doing is already covered by Apache Drill.
> >> Why do we need another tool?
> >>
> >> Thank you,
> >> Will W
> >>
> >> On Nov 12, 2015 12:25 AM, "Lei Chang" <[email protected]> wrote:
> >>
> >>
> >>
> >> Hi Bob,
> >>
> >>
> >>
> >> Apache HAWQ is a Hadoop-native SQL query engine that combines the key
> >> technological advantages of an MPP database with the scalability and
> >> convenience of Hadoop. HAWQ reads data from and writes data to HDFS
> >> natively. HAWQ delivers industry-leading performance and linear
> >> scalability. It provides users the tools to confidently and successfully
> >> interact with petabyte-range data sets. HAWQ provides users with a
> >> complete, standards-compliant SQL interface. More specifically, HAWQ has
> >> the following features:
> >>
> >> ·         On-premise or cloud deployment
> >>
> >> ·         Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP
> >> extension
> >>
> >> ·         Extremely high performance: many times faster than other
> >> Hadoop SQL engines.
> >>
> >> ·         World-class parallel optimizer
> >>
> >> ·         Full transaction capability and consistency guarantee: ACID
> >>
> >> ·         Dynamic data flow engine through a high-speed UDP-based
> >> interconnect
> >>
> >> ·         Elastic execution engine based on virtual segment & data
> >> locality
> >>
> >> ·         Support for multi-level partitioning and List/Range-based
> >> partitioned tables.
> >>
> >> ·         Multiple compression method support: snappy, gzip, quicklz,
> >> RLE
> >>
> >> ·         Multi-language user-defined function support: Python, Perl,
> >> Java, C/C++, R
> >>
> >> ·         Advanced machine learning and data mining functionalities
> >> through MADLib
> >>
> >> ·         Dynamic node expansion: in seconds
> >>
> >> ·         Most advanced three-level resource management: integrates with
> >> YARN and hierarchical resource queues.
> >>
> >> ·         Easy access to all HDFS data and external system data (for
> >> example, HBase)
> >>
> >> ·         Hadoop Native: from storage (HDFS), resource management (YARN)
> >> to deployment (Ambari).
> >>
> >> ·         Authentication & granular authorization: Kerberos, SSL and
> >> role-based access
> >>
> >> ·         Advanced C/C++ access library to HDFS and YARN: libhdfs3 &
> >> libYARN
> >>
> >> ·         Support for most third-party tools: Tableau, SAS, et al.
> >>
> >> ·         Standard connectivity: JDBC/ODBC
> >>
> >>
> >>
> >> And the link here can give you more information about HAWQ:
> >> https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ
> >>
> >>
> >>
> >>
> >>
> >> And please also see the inline answers to your specific questions:
> >>
> >>
> >>
> >> On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA <
> >> [email protected]> wrote:
> >>
> >> Silly question right? Thing is I’ve read a bit and watched some YouTube
> >> videos and I’m still not quite sure what I can and can’t do with Hawq. Is
> >> it a true database or is it like Hive where I need to use HCatalog?
> >>
> >>
> >>
> >> It is a true database; you can think of it as a parallel Postgres, but
> >> with much more functionality, and it works natively in the Hadoop world.
> >> HCatalog is not necessary, but you can read data registered in HCatalog
> >> with the new "HCatalog integration" feature.
> >>
> >>
> >>
> >> Can I write data intensive applications against it using ODBC? Does it
> >> enforce referential integrity? Does it have stored procedures?
> >>
> >>
> >>
> >> ODBC: yes, both JDBC and ODBC are supported.
> >>
> >> Referential integrity: currently not supported.
> >>
> >> Stored procedures: yes.
> >>
> >>
> >>
> >> B.
> >>
> >>
> >>
> >>
> >>
> >> Please let us know if you have any other questions.
> >>
> >>
> >>
> >> Cheers
> >>
> >> Lei
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
