Re: [jira] [Created] (HAWQ-1591) Common tuple batch structure for vectorized execution

2018-02-26 Thread Hongxu Ma
Thank you very much! Lirong

On 27/02/2018 11:31, Ivan Weng wrote:
> Thanks Lirong. It's a good suggestion. We will evaluate the benefits and
> costs before making a decision.
>
> Also, I think it's better to make the tuple batch interface more general,
> so that its internal format could be replaced if needed.
>
>
> Regards,
> Ivan
>
> On Mon, Feb 26, 2018 at 6:59 PM, Lirong Jian wrote:
>
>> Have you guys considered using the Apache Arrow format
>> (http://arrow.apache.org/) as the in-memory tuple batch structure for
>> vectorized execution? I think the goal of the Apache Arrow project matches
>> that of vectorized execution perfectly, and its community is quite active.
>> However, the Arrow format is far from the PostgreSQL/GPDB/HAWQ tuple
>> structure, which means the engineering effort needed would be huge.
>>
>> Just my two cents.
>>
>> Lirong
>>
>> Lirong Jian
>> HashData Inc.
>>
>> 2018-02-26 13:40 GMT+08:00 Hongxu Ma (JIRA):
>>
>>> Hongxu Ma created HAWQ-1591:
>>> ---
>>>
>>>      Summary: Common tuple batch structure for vectorized execution
>>>          Key: HAWQ-1591
>>>          URL: https://issues.apache.org/jira/browse/HAWQ-1591
>>>      Project: Apache HAWQ
>>>   Issue Type: Sub-task
>>>   Components: Query Execution
>>>     Reporter: Hongxu Ma
>>>     Assignee: Lei Chang
>>>      Fix For: backlog
>>>
>>>
>>> A common tuple batch structure for vectorized execution that holds the
>>> tuples transferred between vectorized operators.
>>>
>>>
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v7.6.3#76005)
>>>

-- 
Regards,
Hongxu.



[VOTE]: Apache HAWQ 2.3.0.0-incubating Release (RC2)

2018-02-26 Thread Yi JIN
Hi All,

This is the vote for the Apache HAWQ (incubating) 2.3.0.0-incubating Release
Candidate 2 (RC2). It is a source release and a binary release for HAWQ
core, PXF, and Ranger. The binary release includes RPM packages.

The vote will run for at least 72 hours and will close on Saturday, March
3rd, 2018. Thanks.

1. Wiki page of the release:
https://cwiki.apache.org/confluence/display/HAWQ/Apache+HAWQ+2.3.0.0-incubating+Release


2. Release Notes (Apache Jira generated):
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12340262&styleName=Html&projectId=12318826


3. Release verification steps can be found at:
For source tarball:
https://cwiki.apache.org/confluence/display/HAWQ/Release+Process%3A+Step+by+step+guide#ReleaseProcess:Stepbystepguide-ValidatetheReleaseCandidate
For rpm package:
https://cwiki.apache.org/confluence/display/HAWQ/Build+Package+and+Install+with+RPM


4. Git release branch:
https://git-wip-us.apache.org/repos/asf?p=incubator-hawq.git;a=shortlog;h=refs/heads/2.3.0.0-incubating

5. Source and binary release tarballs with signatures:
https://dist.apache.org/repos/dist/dev/incubator/hawq/2.3.0.0-incubating.RC2/


6. Keys to verify the signature of the release artifact are available at:
https://dist.apache.org/repos/dist/dev/incubator/hawq/KEYS


7. The artifacts have been signed with Key ID: CE60F90D1333092A

8. Issues fixed in RC2:
https://issues.apache.org/jira/browse/HAWQ-1589
https://issues.apache.org/jira/browse/HAWQ-1590

REMINDER: Please describe what you have tried and verified when you cast
your vote. Thanks!


Please vote accordingly:
[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)


Best regards,
Yi (yjin)


Re: [jira] [Created] (HAWQ-1591) Common tuple batch structure for vectorized execution

2018-02-26 Thread Ivan Weng
Thanks Lirong. It's a good suggestion. We will evaluate the benefits and
costs before making a decision.

Also, I think it's better to make the tuple batch interface more general,
so that its internal format could be replaced if needed.
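
A minimal Python sketch of what a general interface with a replaceable
internal format might look like. All names here (TupleBatch, RowBatch,
ColumnarBatch, sum_column) are hypothetical and are not taken from the HAWQ
code base; the only point is that an operator written against the interface
does not depend on the storage format behind it.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence


class TupleBatch(ABC):
    """Hypothetical format-independent view of a batch of tuples."""

    @abstractmethod
    def num_rows(self) -> int:
        """Number of tuples held by the batch."""

    @abstractmethod
    def column(self, index: int) -> Sequence[Any]:
        """Return all values of one column, in row order."""


class RowBatch(TupleBatch):
    """Stores tuples row by row."""

    def __init__(self, rows: List[tuple]):
        self._rows = rows

    def num_rows(self) -> int:
        return len(self._rows)

    def column(self, index: int) -> Sequence[Any]:
        return [row[index] for row in self._rows]


class ColumnarBatch(TupleBatch):
    """Stores one list per column; another columnar format could replace it."""

    def __init__(self, columns: List[list]):
        self._columns = columns

    def num_rows(self) -> int:
        return len(self._columns[0]) if self._columns else 0

    def column(self, index: int) -> Sequence[Any]:
        return self._columns[index]


def sum_column(batch: TupleBatch, index: int) -> int:
    """Toy 'vectorized operator': consumes a whole column per call."""
    return sum(batch.column(index))


rows = RowBatch([(1, 10), (2, 20), (3, 30)])
cols = ColumnarBatch([[1, 2, 3], [10, 20, 30]])
print(sum_column(rows, 1), sum_column(cols, 1))
```

Both calls go through the same operator code, so swapping the internal
format would only mean adding another TupleBatch implementation.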


Regards,
Ivan

On Mon, Feb 26, 2018 at 6:59 PM, Lirong Jian wrote:

> Have you guys considered using the Apache Arrow format
> (http://arrow.apache.org/) as the in-memory tuple batch structure for
> vectorized execution? I think the goal of the Apache Arrow project matches
> that of vectorized execution perfectly, and its community is quite active.
> However, the Arrow format is far from the PostgreSQL/GPDB/HAWQ tuple
> structure, which means the engineering effort needed would be huge.
>
> Just my two cents.
>
> Lirong
>
> Lirong Jian
> HashData Inc.
>
> 2018-02-26 13:40 GMT+08:00 Hongxu Ma (JIRA):
>
> > Hongxu Ma created HAWQ-1591:
> > ---
> >
> >      Summary: Common tuple batch structure for vectorized execution
> >          Key: HAWQ-1591
> >          URL: https://issues.apache.org/jira/browse/HAWQ-1591
> >      Project: Apache HAWQ
> >   Issue Type: Sub-task
> >   Components: Query Execution
> >     Reporter: Hongxu Ma
> >     Assignee: Lei Chang
> >      Fix For: backlog
> >
> >
> > A common tuple batch structure for vectorized execution that holds the
> > tuples transferred between vectorized operators.
> >
> >
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v7.6.3#76005)
> >
>


Re: a vectorized execution design document

2018-02-26 Thread Shujie Zhang
Hi,

We check whether each plan node can be vectorized after the Plan has been
generated. At that point only the cheapest Plan has been selected, so we
have no chance to change it.


If we wanted to generate the vectorized Plan in the optimizer, we would have
to generate the vectorized Path, compute its cost, compare the costs of the
two Paths, and choose the cheaper one. The trouble is that both the built-in
optimizer and ORCA would have to be refactored, which is complex work :).
Another problem is that the optimizer's solution space would grow because of
the new Path type, so the planning time would have to be kept under control.
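
To make the optimizer-side alternative concrete, here is a small Python
sketch that costs a scalar Path and a vectorized Path with the formula quoted
below in this thread and keeps the cheaper one. The Path class and every
number in it are made up for illustration (in particular the lower
cpu_per_tuple and the extra startup cost of the vectorized Path are
assumptions, not measurements).

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    startup_cost: float
    cpu_per_tuple: float
    seq_page_cost: float

    def total_cost(self, tuples: int, pages: int) -> float:
        # total_cost = startup_cost + cpu_per_tuple * tuples
        #              + seq_page_cost * pages
        return (self.startup_cost
                + self.cpu_per_tuple * tuples
                + self.seq_page_cost * pages)


def cheaper(a: Path, b: Path, tuples: int, pages: int) -> Path:
    """Return the Path with the lower estimated total cost."""
    if a.total_cost(tuples, pages) <= b.total_cost(tuples, pages):
        return a
    return b


# Hypothetical cost parameters, chosen only to show both outcomes.
scalar = Path("scalar", startup_cost=0.0, cpu_per_tuple=0.01,
              seq_page_cost=1.0)
vectorized = Path("vectorized", startup_cost=5.0, cpu_per_tuple=0.0025,
                  seq_page_cost=1.0)

print(cheaper(scalar, vectorized, tuples=100, pages=1).name)
print(cheaper(scalar, vectorized, tuples=100_000, pages=1000).name)
```

With these invented parameters the scalar Path wins on the tiny input and the
vectorized Path wins on the large one, which is the kind of decision the
optimizer can only make if it knows about both Paths.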


In this design we change the Plan after it has been generated, which is
transparent to the upper modules, so the optimizer can still be changed in
the future to fit the vectorized Plan.
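
A rough Python sketch of such a post-planning pass. The node types, the
VECTORIZABLE set, and the rule that a node is vectorized only when all of its
children are vectorized are assumptions made for this example; they are not
taken from the design document.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: only these node types have a vectorized implementation.
VECTORIZABLE = {"SeqScan", "Agg"}


@dataclass
class PlanNode:
    node_type: str
    children: List["PlanNode"] = field(default_factory=list)
    vectorized: bool = False


def vectorize(node: PlanNode) -> PlanNode:
    """Walk the finished plan bottom-up and mark nodes that can be vectorized.

    Assumed rule: a node is vectorized only if its type is supported and all
    of its children are vectorized, so that it always receives batches.
    """
    for child in node.children:
        vectorize(child)
    node.vectorized = (node.node_type in VECTORIZABLE
                       and all(c.vectorized for c in node.children))
    return node


def describe(node: PlanNode, depth: int = 0) -> List[str]:
    """Render the plan tree, prefixing vectorized nodes with 'Vec'."""
    prefix = "Vec" if node.vectorized else ""
    lines = ["  " * depth + prefix + node.node_type]
    for child in node.children:
        lines.extend(describe(child, depth + 1))
    return lines


plan = PlanNode("Sort", [PlanNode("Agg", [PlanNode("SeqScan")])])
print("\n".join(describe(vectorize(plan))))
```

The optimizer produces the plan exactly as before; only this pass, which runs
afterwards, knows about vectorization.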

Thanks,
Zhang Shujie

On Mon, Feb 26, 2018 at 3:01 PM, 刘奎恩(局外) wrote:

> Nice doc, clear design. It is a good start! I saw that the doc uses
> aggregation as an example; we may implement more operators with this
> design, for example SORT and JOIN.
> One question: we implement vectorization under the plan tree, so the
> optimizer cannot see the change. It still estimates the overall cost like
> ' total_cost = startup_cost + cpu_per_tuple * tuples + seq_page_cost *
> pages '
> In my opinion, the second part (CPU cost) changes a lot, so this should be
> a staged design. Is there any further plan for it?
> --
> Kuien Liu/奎恩
> --
> From: Shujie Zhang
> Sent: Friday, February 9, 2018 16:35
> To: dev <dev@hawq.incubator.apache.org>
> Subject: a vectorized execution design document
> Hi,
>
> A vectorized execution design document has been uploaded
> to issue HAWQ-1450:
> https://issues.apache.org/jira/browse/HAWQ-1450
>
> The document contains many ideas about how to implement a vectorized
> executor. We welcome any comments on the content and suggestions for
> improvement. Thanks.
>
> Zhang Shujie
> 2018-02-09
>
>


Re: [jira] [Created] (HAWQ-1591) Common tuple batch structure for vectorized execution

2018-02-26 Thread Lirong Jian
Have you guys considered using the Apache Arrow format
(http://arrow.apache.org/) as the in-memory tuple batch structure for
vectorized execution? I think the goal of the Apache Arrow project matches
that of vectorized execution perfectly, and its community is quite active.
However, the Arrow format is far from the PostgreSQL/GPDB/HAWQ tuple
structure, which means the engineering effort needed would be huge.
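
To illustrate the gap between a row-oriented tuple and a columnar batch, here
is a plain-Python sketch that turns rows into per-column value lists plus a
validity bitmap (bit set means non-null, least-significant bit first). It is
loosely modeled on the layout the Arrow format documents, but it does not use
the Arrow library, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def to_column(values: Sequence[Optional[int]]) -> Tuple[bytes, List[int]]:
    """Split one nullable column into a validity bitmap and a value list.

    Bit i of the bitmap (least-significant bit first within each byte) is 1
    when row i is non-null. Null slots get a placeholder 0 in the value list.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    data: List[int] = []
    for i, value in enumerate(values):
        if value is not None:
            bitmap[i // 8] |= 1 << (i % 8)
            data.append(value)
        else:
            data.append(0)
    return bytes(bitmap), data


def rows_to_columns(rows: Sequence[tuple]) -> List[Tuple[bytes, List[int]]]:
    """Transpose row tuples into one (bitmap, values) pair per column."""
    return [to_column(column) for column in zip(*rows)]


rows = [(1, 10), (2, None), (3, 30)]
columns = rows_to_columns(rows)
for bitmap, values in columns:
    print(bitmap.hex(), values)
```

Every operator that currently reads fields out of a row tuple would need to
read from such per-column buffers instead, which is where most of the
engineering effort would go.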

Just my two cents.

Lirong

Lirong Jian
HashData Inc.

2018-02-26 13:40 GMT+08:00 Hongxu Ma (JIRA):

> Hongxu Ma created HAWQ-1591:
> ---
>
>      Summary: Common tuple batch structure for vectorized execution
>          Key: HAWQ-1591
>          URL: https://issues.apache.org/jira/browse/HAWQ-1591
>      Project: Apache HAWQ
>   Issue Type: Sub-task
>   Components: Query Execution
>     Reporter: Hongxu Ma
>     Assignee: Lei Chang
>      Fix For: backlog
>
>
> A common tuple batch structure for vectorized execution that holds the
> tuples transferred between vectorized operators.
>
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>