[jira] [Created] (HAWQ-1313) Integration with Spark DataFrame API
Dmitry Buzolin created HAWQ-1313:

Summary: Integration with Spark DataFrame API
Key: HAWQ-1313
URL: https://issues.apache.org/jira/browse/HAWQ-1313
Project: Apache HAWQ
Issue Type: Improvement
Reporter: Dmitry Buzolin
Assignee: Ed Espino

HAWQ should expose data via the Spark DataFrame API. This would give Spark users the ability to work with data stored in HAWQ directly and to expose HAWQ data in Spark ML pipelines.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
RE: About Apache HAWQ home page
I can take care of this, although I am not a HAWQ developer :) Please point me to the corresponding Jira entry, if any.

-Original Message-
From: Yandong Yao [mailto:y...@pivotal.io]
Sent: Tuesday, January 17, 2017 2:51 AM
To: Ed Espino; dev@hawq.incubator.apache.org
Cc: Ruilong Huo; Paul Guo; Roman Shaposhnik; Lili Ma
Subject: Re: About Apache HAWQ home page

WARNING - External email; exercise caution

+dev@hawq for broader discussion

On Tue, Jan 17, 2017 at 3:41 PM, Ed Espino wrote:

> Probably helpful to push this to the dev list for discussion on a new
> thread. I don't have much if any context on the current layout but other
> dev members might.
>
> I need to remind myself all the time ... we are growing a "dev community"
> ... "dev community" ... "dev community" ... "dev community". Especially as
> you listed "who could help to make those changes"? I believe any committer
> in the community is capable of performing this work (
> https://git-wip-us.apache.org/repos/asf?p=incubator-hawq-site.git).
>
> -=e
>
> On Mon, Jan 16, 2017 at 11:23 PM, Yandong Yao wrote:
>
>> A few comments about the HAWQ home page after talking with Tim, who is
>> an experienced UX guy:
>>
>> 1) Remove 'More...' link and corresponding section
>> 2) Move 'Doc' link to the end of the tab list
>> 3) Change 'Source' in download section to a link, instead of using a big
>> blue button.
>>
>> Who could help to make those changes?

This message may contain confidential information and is intended for specific recipients unless explicitly noted otherwise. If you have reason to believe you are not an intended recipient of this message, please delete it and notify the sender. This message may not represent the opinion of Intercontinental Exchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a contract or guarantee. Unencrypted electronic mail is not secure and the recipient of this message is expected to provide safeguards from viruses and pursue alternate means of communication where privacy or a binding message is desired.
RE: Different storage backends in HAWQ?
Created HAWQ-1270 for this. I got some feedback on this from Greenplum folks - they also find this idea very interesting.

-Original Message-
From: Yandong Yao [mailto:y...@pivotal.io]
Sent: Thursday, January 12, 2017 8:54 PM
To: dev@hawq.incubator.apache.org
Subject: Re: Different storage backends in HAWQ?

Supporting Ceph will be very interesting. Could you please create a JIRA? It would be great if there is any patch.

On Thu, Jan 12, 2017 at 10:19 PM, Dmitry Buzolin wrote:

> Also, besides supporting three different storage interfaces, Ceph is a
> more sophisticated storage backend than Hadoop at this time.
> For example, in addition to replicated pools, Ceph supports erasure-coded
> pools (a kind of host-based RAID), which require a lot less storage than
> the former.
> Other great features of Ceph are snapshots and an algorithmic approach to
> mapping data to the nodes rather than having centrally managed namenodes.
> I don't think HDFS offers any of these features. In terms of performance,
> Ceph should be faster than HDFS since it is written in C++ and because it
> doesn't have scalability limitations when mapping data to storage pools,
> compared to Hadoop, where the name node is such a point of contention.
>
> -Original Message-
> From: Paul Guo [mailto:paul...@gmail.com]
> Sent: Wednesday, January 11, 2017 9:15 PM
> To: dev@hawq.incubator.apache.org
> Subject: Re: Different storage backends in HAWQ?
>
> HAWQ also supports a row-oriented format (AO table). For block storage and
> POSIX file systems, I suspect you could use gpdb, since gpdb uses local
> storage, though a bit of additional work is probably needed. For object
> storage (e.g. an S3 interface), this is an interesting topic. There has
> been a JIRA for this, though there is some debate about better solutions.
>
> HAWQ-823 Amazon S3 External Table Support
> https://issues.apache.org/jira/browse/HAWQ-823
>
> 2017-01-11 23:14 GMT+08:00 Dmitry Buzolin:
>
> > Since HAWQ only depends on Hadoop and on Parquet as a columnar format, I
> > would like to propose that HAWQ have pluggable storage backends.
> > Hadoop is already supported, but there is also the Ceph storage backend,
> > which offers a standard POSIX-compliant file system, object storage, and
> > block storage. Ceph is also location aware and written in C++. I believe
> > this is more important than porting HAWQ to Windows. Your thoughts?
> >
> > Thanks,
> > Dmitry.

--
Best Regards,
Yandong
RE: Different storage backends in HAWQ?
Also, besides supporting three different storage interfaces, Ceph is a more sophisticated storage backend than Hadoop at this time. For example, in addition to replicated pools, Ceph supports erasure-coded pools (a kind of host-based RAID), which require a lot less storage than the former. Other great features of Ceph are snapshots and an algorithmic approach to mapping data to the nodes rather than having centrally managed namenodes. I don't think HDFS offers any of these features. In terms of performance, Ceph should be faster than HDFS since it is written in C++ and because it doesn't have scalability limitations when mapping data to storage pools, compared to Hadoop, where the name node is such a point of contention.

-Original Message-
From: Paul Guo [mailto:paul...@gmail.com]
Sent: Wednesday, January 11, 2017 9:15 PM
To: dev@hawq.incubator.apache.org
Subject: Re: Different storage backends in HAWQ?

HAWQ also supports a row-oriented format (AO table). For block storage and POSIX file systems, I suspect you could use gpdb, since gpdb uses local storage, though a bit of additional work is probably needed. For object storage (e.g. an S3 interface), this is an interesting topic. There has been a JIRA for this, though there is some debate about better solutions.

HAWQ-823 Amazon S3 External Table Support
https://issues.apache.org/jira/browse/HAWQ-823

2017-01-11 23:14 GMT+08:00 Dmitry Buzolin:

> Since HAWQ only depends on Hadoop and on Parquet as a columnar format, I
> would like to propose that HAWQ have pluggable storage backends.
> Hadoop is already supported, but there is also the Ceph storage backend,
> which offers a standard POSIX-compliant file system, object storage, and
> block storage. Ceph is also location aware and written in C++. I believe
> this is more important than porting HAWQ to Windows. Your thoughts?
>
> Thanks,
> Dmitry.
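To put a rough number on the storage saving claimed above for erasure-coded pools, here is a minimal arithmetic sketch. The pool settings are illustrative assumptions and do not come from this thread: a replicated pool keeping 3 copies, and an erasure-coded pool with k=4 data chunks and m=2 coding chunks. The function names are hypothetical helpers, not part of any Ceph or HAWQ API.

```python
"""Raw-capacity overhead of a replicated pool vs an erasure-coded pool.

All numbers are illustrative assumptions, not values from the thread.
"""


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per usable byte when each object is kept `copies` times."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return float(copies)


def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes per usable byte when each object is split into k data chunks
    plus m coding chunks."""
    if k < 1 or m < 0:
        raise ValueError("k must be >= 1 and m must be >= 0")
    return (k + m) / k


def raw_capacity_tb(usable_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold `usable_tb` of data at the given overhead."""
    return usable_tb * overhead


if __name__ == "__main__":
    usable_tb = 100.0  # assumed amount of user data
    replicated = raw_capacity_tb(usable_tb, replication_overhead(3))
    erasure = raw_capacity_tb(usable_tb, erasure_overhead(4, 2))
    print(f"3x replication:          {replicated:.0f} TB raw")
    print(f"k=4, m=2 erasure coding: {erasure:.0f} TB raw")
```

Under these assumed settings both layouts survive the loss of any two copies or chunks, while the erasure-coded pool needs half the raw capacity. The saving is not free: erasure coding generally costs more CPU and more network traffic during writes and recovery.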
Different storage backends in HAWQ?
Since HAWQ only depends on Hadoop and on Parquet as a columnar format, I would like to propose that HAWQ have pluggable storage backends. Hadoop is already supported, but there is also the Ceph storage backend, which offers a standard POSIX-compliant file system, object storage, and block storage. Ceph is also location aware and written in C++. I believe this is more important than porting HAWQ to Windows. Your thoughts?

Thanks,
Dmitry.