[jira] [Created] (HAWQ-1313) Integration with Spark DataFrame API

2017-02-05 Thread Dmitry Buzolin (JIRA)
Dmitry Buzolin created HAWQ-1313:


 Summary: Integration with Spark DataFrame API
 Key: HAWQ-1313
 URL: https://issues.apache.org/jira/browse/HAWQ-1313
 Project: Apache HAWQ
  Issue Type: Improvement
Reporter: Dmitry Buzolin
Assignee: Ed Espino


HAWQ should expose data via the Spark DataFrame API. This would give Spark users 
the ability to work with data stored in HAWQ directly and make HAWQ data available 
to Spark ML pipelines.
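One way this could look today is sketched below under stated assumptions. Because HAWQ speaks the PostgreSQL wire protocol, Spark can already read a HAWQ table through its JDBC data source using the PostgreSQL driver. The hypothetical helper `jdbc_partition_predicates` splits a numeric column range into one WHERE clause per Spark task, so segments are read in parallel. It is similar in spirit to, but not a copy of, Spark's own `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions` options. The host, database, table, column, and user names in the commented usage are placeholders.

```python
from typing import List


def jdbc_partition_predicates(column: str, lower: int, upper: int,
                              num_partitions: int) -> List[str]:
    """Split `column` into one WHERE predicate per Spark partition.

    The first predicate also matches NULLs and values below `lower`, and the
    last one matches values at or above the final bound, so no row is
    skipped even if the bounds are only estimates.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if num_partitions == 1 or upper <= lower:
        return ["1 = 1"]  # a single task reads the whole table
    stride = max(1, (upper - lower) // num_partitions)
    bounds = [lower + i * stride for i in range(1, num_partitions)]
    predicates = [f"{column} < {bounds[0]} OR {column} IS NULL"]
    for lo, hi in zip(bounds, bounds[1:]):
        predicates.append(f"{column} >= {lo} AND {column} < {hi}")
    predicates.append(f"{column} >= {bounds[-1]}")
    return predicates


# Hypothetical PySpark usage (placeholder host, database, table, and user):
# df = spark.read.jdbc(
#     url="jdbc:postgresql://hawq-master:5432/analytics",
#     table="public.events",
#     predicates=jdbc_partition_predicates("event_id", 0, 10_000_000, 8),
#     properties={"user": "gpadmin", "driver": "org.postgresql.Driver"},
# )
```

Each predicate becomes one Spark partition, and the resulting DataFrame can then feed an ML pipeline like any other source. A native HAWQ connector, as this issue proposes, could additionally push work down to segments directly.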



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


RE: About Apache HAWQ home page

2017-01-17 Thread Dmitry Buzolin
I can take care of this, although I am not a HAWQ developer :)
Please point me to the corresponding JIRA entry, if there is one.

-Original Message-
From: Yandong Yao [mailto:y...@pivotal.io]
Sent: Tuesday, January 17, 2017 2:51 AM
To: Ed Espino; dev@hawq.incubator.apache.org
Cc: Ruilong Huo; Paul Guo; Roman Shaposhnik; Lili Ma
Subject: Re: About Apache HAWQ home page

WARNING - External email; exercise caution


+dev@hawq for broader discussion

On Tue, Jan 17, 2017 at 3:41 PM, Ed Espino  wrote:

> Probably helpful to push this to the dev list for discussion on a new
> thread.  I don't have much if any context on the current layout but other
> dev members might.
>
> I need to remind myself all the time ... we are growing a "dev community"
> ... "dev community" ... "dev community" ... "dev community", especially as
> you asked "who could help to make those changes?" I believe any committer
> in the community is capable of performing this work
> (https://git-wip-us.apache.org/repos/asf?p=incubator-hawq-site.git).
>
> -=e
>
> On Mon, Jan 16, 2017 at 11:23 PM, Yandong Yao  wrote:
>
>> A few comments about the HAWQ home page, after talking with Tim, who is an
>> experienced UX guy:
>>
>> 1) Remove 'More...' link and corresponding section
>> 2) Move 'Doc' link to the end of the tab list
>> 3) Change 'Source' in the download section to a link instead of a big
>> blue button.
>>
>> Who could help to make those changes?
>>
>>
>>



This message may contain confidential information and is intended for specific 
recipients unless explicitly noted otherwise. If you have reason to believe you 
are not an intended recipient of this message, please delete it and notify the 
sender. This message may not represent the opinion of Intercontinental 
Exchange, Inc. (ICE), its subsidiaries or affiliates, and does not constitute a 
contract or guarantee. Unencrypted electronic mail is not secure and the 
recipient of this message is expected to provide safeguards from viruses and 
pursue alternate means of communication where privacy or a binding message is 
desired.


RE: Different storage backends in HAWQ?

2017-01-13 Thread Dmitry Buzolin
Created HAWQ-1270 for this. I got some feedback on this from the Greenplum folks; 
they also find this idea very interesting.

-Original Message-
From: Yandong Yao [mailto:y...@pivotal.io]
Sent: Thursday, January 12, 2017 8:54 PM
To: dev@hawq.incubator.apache.org
Subject: Re: Different storage backends in HAWQ?



Supporting Ceph will be very interesting. Could you please create a JIRA?
It would be great to have a patch as well.




--
Best Regards,
Yandong





RE: Different storage backends in HAWQ?

2017-01-12 Thread Dmitry Buzolin
Also, besides supporting three different storage interfaces, Ceph is a more 
sophisticated storage backend than Hadoop at this time.
For example, in addition to replicated pools, Ceph supports erasure-coded pools 
(a kind of host-based RAID), which require a lot less storage than replicated 
pools.
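A rough worked example of the storage claim, assuming a 3-way replicated pool versus an erasure-coded pool with k=4 data chunks and m=2 coding chunks. Ceph's erasure-code profiles use these `k`/`m` parameter names, but the specific values here are only illustrative. Both layouts survive the loss of any two copies or chunks, yet the erasure-coded one needs half the raw capacity. Metadata and per-object padding are ignored.

```python
from fractions import Fraction


def replicated_raw(usable: int, replicas: int) -> Fraction:
    """Raw capacity needed when each object is stored `replicas` times.

    Survives the loss of (replicas - 1) copies.
    """
    return Fraction(usable * replicas)


def erasure_coded_raw(usable: int, k: int, m: int) -> Fraction:
    """Raw capacity needed for a k+m erasure-coded pool.

    Each object is split into k data chunks plus m coding chunks,
    so it survives the loss of any m chunks.
    """
    return Fraction(usable * (k + m), k)


if __name__ == "__main__":
    usable_tb = 100
    print(f"3x replication: {replicated_raw(usable_tb, 3)} TB raw")       # 300
    print(f"EC k=4, m=2:    {erasure_coded_raw(usable_tb, 4, 2)} TB raw")  # 150
```

The trade-off is that reads and recovery of erasure-coded data involve more CPU and network work than simply copying a replica.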
Other great features of Ceph are its algorithmic approach to mapping data to 
nodes, rather than relying on a centrally managed namenode, and snapshots. I 
don't think HDFS offers any of these features. In terms of performance, Ceph 
should be faster than HDFS, since it is written in C++ and does not have 
scalability limitations when mapping data to storage pools, unlike Hadoop, 
where the namenode is such a point of contention.
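To illustrate "an algorithmic approach to mapping data to nodes", here is a sketch that swaps in rendezvous (highest-random-weight) hashing for Ceph's actual CRUSH algorithm, which additionally handles device weights, hierarchies, and failure domains. The idea it shares with CRUSH is that any client can compute where an object lives from the object name and the node list alone, with no central lookup like the HDFS namenode. Node and object names are hypothetical.

```python
import hashlib
from typing import List


def _score(node: str, obj: str) -> int:
    # Deterministic pseudo-random weight for the (node, object) pair.
    digest = hashlib.sha256(f"{node}:{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, nodes: List[str], replicas: int) -> List[str]:
    """Pick `replicas` distinct nodes for `obj`: the ones with the highest scores."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    return sorted(nodes, key=lambda n: _score(n, obj), reverse=True)[:replicas]


if __name__ == "__main__":
    osds = [f"osd-{i}" for i in range(1, 6)]
    print(place("warehouse/table_a/part-0001", osds, 3))
```

Because each score depends only on the (node, object) pair, removing a node that does not hold an object leaves that object's placement unchanged, so relatively little data has to move when the cluster changes.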


-Original Message-
From: Paul Guo [mailto:paul...@gmail.com]
Sent: Wednesday, January 11, 2017 9:15 PM
To: dev@hawq.incubator.apache.org
Subject: Re: Different storage backends in HAWQ?



HAWQ also supports a row-oriented format (AO tables). For block storage and
POSIX file systems, I suspect you could use GPDB, since GPDB uses local storage,
though a bit of additional work is probably needed. For object storage (e.g. the
S3 interface), this is an interesting topic. There is a JIRA for this, though
there is some debate about better solutions.

HAWQ-823 Amazon S3 External Table Support
https://issues.apache.org/jira/browse/HAWQ-823







Different storage backends in HAWQ?

2017-01-11 Thread Dmitry Buzolin
Since HAWQ currently depends only on Hadoop, with Parquet as its columnar format, I 
would like to propose giving HAWQ pluggable storage backends.
Hadoop is already supported, but there is also the Ceph storage backend, which offers 
a standard POSIX-compliant file system as well as object and block storage. Ceph is 
also location-aware and written in C++. I believe this is more important than 
porting HAWQ to Windows. Your thoughts?

Thanks,
Dmitry.
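A minimal sketch of what "pluggable storage backends" might mean, assuming a backend is selected by the URL scheme of a table's location. `StorageBackend`, `register`, `backend_for`, and `InMemoryBackend` are hypothetical names, not part of HAWQ; real HDFS or Ceph backends would wrap client libraries such as libhdfs3 or librados. The `hosts` method stands in for the location awareness mentioned above, which a planner would need to schedule work near the data.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type
from urllib.parse import urlparse


class StorageBackend(ABC):
    """Hypothetical interface between the database storage layer and a store."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        """Store `data` at `path`, replacing any previous contents."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the bytes stored at `path`."""

    @abstractmethod
    def hosts(self, path: str) -> List[str]:
        """Hosts holding `path`, so work can be scheduled near the data."""


_BACKENDS: Dict[str, Type[StorageBackend]] = {}


def register(scheme: str) -> Callable[[Type[StorageBackend]], Type[StorageBackend]]:
    """Class decorator mapping a URL scheme (e.g. 'hdfs', 'ceph') to a backend."""
    def wrap(cls: Type[StorageBackend]) -> Type[StorageBackend]:
        _BACKENDS[scheme] = cls
        return cls
    return wrap


def backend_for(url: str) -> StorageBackend:
    """Instantiate the backend registered for the URL's scheme."""
    scheme = urlparse(url).scheme
    if scheme not in _BACKENDS:
        raise ValueError(f"no storage backend registered for scheme {scheme!r}")
    return _BACKENDS[scheme]()


@register("mem")
class InMemoryBackend(StorageBackend):
    """Toy backend for testing the interface; keeps files in a dict."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]

    def hosts(self, path: str) -> List[str]:
        return ["localhost"]
```

With this shape, supporting Ceph would mean registering one more class under a scheme such as `ceph` without touching the code that reads and writes table files.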


