This patch is standard in HDFS 2.7. Pivotal HD and HDP are both based on HDFS 2.6 with the truncate patch from 2.7 backported.
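For reference, the HDFS-3107 patch referenced below surfaces in the Hadoop 2.7+ client API as FileSystem.truncate(Path, long). A minimal sketch of calling it from Java; the file path and target length here are hypothetical:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TruncateExample {
        public static void main(String[] args) throws IOException {
            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            try (FileSystem fs = FileSystem.get(conf)) {
                Path file = new Path("/data/example.dat"); // hypothetical path
                // Truncate to 1 KiB. Returns true if the file is immediately at
                // the new length; false means block recovery is still running
                // and the file cannot be reopened for append until it finishes.
                boolean done = fs.truncate(file, 1024L);
                System.out.println("truncate finished synchronously: " + done);
            }
        }
    }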
> On Nov 13, 2015, at 4:45 PM, Dan Baskette <[email protected]> wrote:
>
> No, truncate was added to Apache Hadoop:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/hdfs-3107
>
> Sent from my iPhone
>
>> On Nov 13, 2015, at 7:39 PM, Bob Marshall <[email protected]> wrote:
>>
>> I stand corrected. But I had a question:
>>
>> In Pivotal Hadoop HDFS, we added truncate to support transactions. The
>> signature of truncate is as follows: void truncate(Path src, long length)
>> throws IOException; The truncate() function truncates the file to a size
>> less than or equal to the current file length. If the file is smaller
>> than the target length, an IOException is thrown. This differs from
>> POSIX truncate semantics; the rationale is that HDFS does not support
>> overwriting at arbitrary positions.
>>
>> Does this mean I need to run a modified HDFS to run HAWQ?
>>
>> Robert L Marshall
>> Senior Consultant | Avalon Consulting, LLC
>> c: (210) 853-7041
>>
>>> On Fri, Nov 13, 2015 at 7:16 PM, Dan Baskette <[email protected]> wrote:
>>> But HAWQ does manage its own storage on HDFS. You can use the native
>>> HAWQ format or Parquet, and its PXF functions allow querying files in
>>> other formats. So, by your (and my) definition, it is indeed a database.
>>>
>>> Sent from my iPhone
>>>
>>>> On Nov 13, 2015, at 7:08 PM, Bob Marshall <[email protected]> wrote:
>>>>
>>>> Chhavi Joshi is right on the money. A database is both a query
>>>> execution tool and a data storage backend. HAWQ executes against
>>>> native Hadoop storage, e.g. HBase, HDFS, etc.
>>>>
>>>> Robert L Marshall
>>>> Senior Consultant | Avalon Consulting, LLC
>>>> c: (210) 853-7041
>>>>
>>>>> On Fri, Nov 13, 2015 at 10:41 AM, Chhavi Joshi
>>>>> <[email protected]> wrote:
>>>>> If you have the HAWQ-Greenplum integration, you can create external
>>>>> tables in Greenplum just as you would in Hive. To load data into a
>>>>> table, you only need to put the files into HDFS (the same as with
>>>>> external tables in Hive).
>>>>>
>>>>> I still believe HAWQ is only a SQL query engine, not a database.
>>>>>
>>>>> Chhavi
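To make the external-table workflow described above concrete, the sketch below registers CSV files already sitting in HDFS as a queryable HAWQ table through PXF, over JDBC (HAWQ speaks the PostgreSQL wire protocol, so the stock PostgreSQL driver works). The host, port, HDFS path, credentials, and PXF profile are assumptions and vary across HAWQ/PXF versions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExternalTableExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical HAWQ master; requires the PostgreSQL JDBC driver.
            String url = "jdbc:postgresql://hawq-master:5432/postgres";
            try (Connection conn = DriverManager.getConnection(url, "gpadmin", "");
                 Statement st = conn.createStatement()) {
                // Register CSV files under /data/sales in HDFS as a readable table.
                st.execute(
                    "CREATE EXTERNAL TABLE ext_sales (id int, amount float8) " +
                    "LOCATION ('pxf://hawq-master:51200/data/sales?PROFILE=HdfsTextSimple') " +
                    "FORMAT 'TEXT' (DELIMITER ',')");
                // New files dropped into /data/sales become visible on the next scan.
                try (ResultSet rs = st.executeQuery("SELECT count(*) FROM ext_sales")) {
                    rs.next();
                    System.out.println("rows: " + rs.getLong(1));
                }
            }
        }
    }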
>>>>> From: Atri Sharma [mailto:[email protected]]
>>>>> Sent: Friday, November 13, 2015 3:53 AM
>>>>> To: [email protected]
>>>>> Subject: Re: what is Hawq?
>>>>>
>>>>> Greenplum is open sourced.
>>>>>
>>>>> The main difference between the two engines is that HAWQ is aimed at
>>>>> Hadoop-based systems, whereas Greenplum is aimed at a regular
>>>>> filesystem. That is a very high-level distinction, and the detailed
>>>>> differences run deeper, but as a one-line summary it holds.
>>>>>
>>>>> On 13 Nov 2015 14:20, "Adaryl "Bob" Wakefield, MBA"
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Is Greenplum free? I heard they open sourced it, but I haven't found
>>>>> anything but a community edition.
>>>>>
>>>>> Adaryl "Bob" Wakefield, MBA
>>>>> Principal
>>>>> Mass Street Analytics, LLC
>>>>> 913.938.6685
>>>>> www.linkedin.com/in/bobwakefieldmba
>>>>> Twitter: @BobLovesData
>>>>>
>>>>> From: dortmont
>>>>> Sent: Friday, November 13, 2015 2:42 AM
>>>>> To: [email protected]
>>>>> Subject: Re: what is Hawq?
>>>>>
>>>>> I see the advantage of HAWQ compared to other Hadoop SQL engines: it
>>>>> looks like the most mature solution on Hadoop, thanks to its
>>>>> PostgreSQL-based engine.
>>>>>
>>>>> But why wouldn't I use Greenplum instead of HAWQ? It has even better
>>>>> performance, and it supports updates.
>>>>>
>>>>> Cheers
>>>>>
>>>>> 2015-11-13 7:45 GMT+01:00 Atri Sharma <[email protected]>:
>>>>>
>>>>> +1 for transactions.
>>>>>
>>>>> I think a major plus point is that HAWQ supports transactions, and
>>>>> this enables a lot of critical workloads to be run on HAWQ.
>>>>>
>>>>> On 13 Nov 2015 12:13, "Lei Chang" <[email protected]> wrote:
>>>>>
>>>>> Like what Bob said, HAWQ is a complete database and Drill is just a
>>>>> query engine.
>>>>>
>>>>> And HAWQ also has a lot of other benefits over Drill, for example:
>>>>>
>>>>> 1. SQL completeness: HAWQ has the most complete SQL of the
>>>>> SQL-on-Hadoop engines; it can run all TPC-DS queries without any
>>>>> changes, and it supports almost all third-party tools, such as Tableau.
>>>>> 2. Performance: proven the best in the Hadoop world.
>>>>> 3. Scalability: highly scalable via a high-speed UDP-based interconnect.
>>>>> 4. Transactions: as far as I know, Drill does not support
>>>>> transactions, and keeping data consistent without them is a nightmare
>>>>> for end users.
>>>>> 5. Advanced resource management: HAWQ has the most advanced resource
>>>>> management. It natively supports YARN and easy-to-use hierarchical
>>>>> resource queues, and resources can be managed and enforced at the
>>>>> query and operator level.
>>>>>
>>>>> Cheers
>>>>> Lei
>>>>>
>>>>> On Fri, Nov 13, 2015 at 9:34 AM, Adaryl "Bob" Wakefield, MBA
>>>>> <[email protected]> wrote:
>>>>>
>>>>> There are a lot of tools that do a lot of things. Believe me, it's a
>>>>> full-time job keeping track of what is going on in the Apache world.
>>>>> As I understand it, Drill is just a query engine while HAWQ is an
>>>>> actual database... somewhat, anyway.
>>>>>
>>>>> Adaryl "Bob" Wakefield, MBA
>>>>> Principal
>>>>> Mass Street Analytics, LLC
>>>>> 913.938.6685
>>>>> www.linkedin.com/in/bobwakefieldmba
>>>>> Twitter: @BobLovesData
>>>>>
>>>>> From: Will Wagner
>>>>> Sent: Thursday, November 12, 2015 7:42 AM
>>>>> To: [email protected]
>>>>> Subject: Re: what is Hawq?
>>>>>
>>>>> Hi Lei,
>>>>>
>>>>> Great answer.
>>>>>
>>>>> I have a follow-up question. Everything HAWQ is capable of doing is
>>>>> already covered by Apache Drill. Why do we need another tool?
>>>>>
>>>>> Thank you,
>>>>> Will W
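Since transactions are the recurring differentiator in this thread, here is a minimal JDBC sketch of what they buy in practice: a batch of inserts that becomes visible atomically or not at all. The connection details and table are assumptions, and because HAWQ storage is append-oriented, the example sticks to INSERTs:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TransactionExample {
        public static void main(String[] args) throws SQLException {
            String url = "jdbc:postgresql://hawq-master:5432/postgres"; // hypothetical
            try (Connection conn = DriverManager.getConnection(url, "gpadmin", "")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE TABLE events (id int, payload text)");
                }
                conn.setAutoCommit(false); // start an explicit transaction
                try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO events VALUES (?, ?)")) {
                    for (int i = 0; i < 1000; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "event-" + i);
                        ps.addBatch();
                    }
                    ps.executeBatch();
                    conn.commit();   // all 1000 rows become visible atomically
                } catch (SQLException e) {
                    conn.rollback(); // on failure, none of the rows are visible
                    throw e;
                }
            }
        }
    }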
>>>>> On Nov 12, 2015 12:25 AM, "Lei Chang" <[email protected]> wrote:
>>>>>
>>>>> Hi Bob,
>>>>>
>>>>> Apache HAWQ is a Hadoop-native SQL query engine that combines the key
>>>>> technological advantages of an MPP database with the scalability and
>>>>> convenience of Hadoop. HAWQ reads data from and writes data to HDFS
>>>>> natively. HAWQ delivers industry-leading performance and linear
>>>>> scalability, and it provides users the tools to confidently and
>>>>> successfully interact with petabyte-range data sets. HAWQ provides
>>>>> users with a complete, standards-compliant SQL interface. More
>>>>> specifically, HAWQ has the following features:
>>>>>
>>>>> · On-premise or cloud deployment
>>>>> · Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extension
>>>>> · Extremely high performance: many times faster than other Hadoop SQL engines
>>>>> · World-class parallel optimizer
>>>>> · Full transaction capability and consistency guarantee: ACID
>>>>> · Dynamic data-flow engine through a high-speed UDP-based interconnect
>>>>> · Elastic execution engine based on virtual segments and data locality
>>>>> · Multi-level partitioning and list/range-partitioned tables
>>>>> · Multiple compression methods: snappy, gzip, quicklz, RLE
>>>>> · Multi-language user-defined function support: Python, Perl, Java, C/C++, R
>>>>> · Advanced machine learning and data mining functionality through MADlib
>>>>> · Dynamic node expansion: in seconds
>>>>> · Most advanced three-level resource management: integrates with YARN
>>>>> and provides hierarchical resource queues
>>>>> · Easy access to all HDFS data and external system data (for example, HBase)
>>>>> · Hadoop-native: from storage (HDFS) and resource management (YARN)
>>>>> to deployment (Ambari)
>>>>> · Authentication and granular authorization: Kerberos, SSL, and
>>>>> role-based access
>>>>> · Advanced C/C++ access libraries for HDFS and YARN: libhdfs3 and libYARN
>>>>> · Support for most third-party tools: Tableau, SAS, et al.
>>>>> · Standard connectivity: JDBC/ODBC
>>>>>
>>>>> And the link here can give you more information about HAWQ:
>>>>> https://cwiki.apache.org/confluence/display/HAWQ/About+HAWQ
>>>>>
>>>>> Please also see the answers inline to your specific questions:
>>>>>
>>>>> On Thu, Nov 12, 2015 at 4:09 PM, Adaryl "Bob" Wakefield, MBA
>>>>> <[email protected]> wrote:
>>>>>
>>>>> Silly question, right? Thing is, I've read a bit and watched some
>>>>> YouTube videos, and I'm still not quite sure what I can and can't do
>>>>> with HAWQ. Is it a true database, or is it like Hive, where I need to
>>>>> use HCatalog?
>>>>>
>>>>> It is a true database; you can think of it as a parallel PostgreSQL,
>>>>> but with much more functionality, and it works natively in the Hadoop
>>>>> world. HCatalog is not necessary, but you can read data registered in
>>>>> HCatalog with the new "HCatalog integration" feature.
>>>>>
>>>>> Can I write data-intensive applications against it using ODBC? Does
>>>>> it enforce referential integrity? Does it have stored procedures?
>>>>>
>>>>> ODBC: yes, both JDBC and ODBC are supported.
>>>>> Referential integrity: currently not supported.
>>>>> Stored procedures: yes.
>>>>>
>>>>> B.
>>>>> Please let us know if you have any other questions.
>>>>>
>>>>> Cheers
>>>>> Lei
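To ground the JDBC and stored-procedure answers above: HAWQ inherits PostgreSQL-style user-defined functions, so a "stored procedure" is typically a UDF (PL/pgSQL below; Lei's feature list also mentions Python, Perl, Java, C/C++, and R). The function body, connection URL, and credentials are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class StoredFunctionExample {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://hawq-master:5432/postgres"; // hypothetical
            try (Connection conn = DriverManager.getConnection(url, "gpadmin", "");
                 Statement st = conn.createStatement()) {
                // Define a simple PL/pgSQL function on the server.
                st.execute(
                    "CREATE OR REPLACE FUNCTION add_tax(amount float8) " +
                    "RETURNS float8 AS $$ " +
                    "BEGIN RETURN amount * 1.0825; END; " +
                    "$$ LANGUAGE plpgsql");
                // Invoke it like any scalar function.
                try (ResultSet rs = st.executeQuery("SELECT add_tax(100.0)")) {
                    rs.next();
                    System.out.println("with tax: " + rs.getDouble(1));
                }
            }
        }
    }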
