Re: Replace git submodule with git clone + file with commit number?

2016-07-13 Thread Paul Guo
Roman, do you still have any concerns about this? Thanks.

2016-07-08 2:04 GMT+08:00 Roman Shaposhnik :

> On Thu, Jul 7, 2016 at 3:20 AM, Paul Guo  wrote:
> > For gporca it is ok to pre-build them and pass the orca installation path
> > to hawq, but for pgcrypto and plr, having a script to run before building
> > hawq seems not to be a good idea, technically speaking.
> >
> > plr/pgcrypto depend on the configure options and configure checks
> > (e.g. with and without the openssl option in configure, the pgcrypto build
> > results will be different).
> >
> > That means the building of these features is not 100% independent of the
> > building of hawq.
>
> The above makes sense, but there are way too many ways to interpret the
> particulars of it. Before we move ahead, how about I take a look at the
> branch that is being cut (see the other thread) and provide you more
> technical feedback?
>
> Thanks,
> Roman.
>


[jira] [Created] (HAWQ-924) Refactor feature test for querycontext with new test framework

2016-07-13 Thread zhenglin tao (JIRA)
zhenglin tao created HAWQ-924:
-

 Summary: Refactor feature test for querycontext with new test 
framework
 Key: HAWQ-924
 URL: https://issues.apache.org/jira/browse/HAWQ-924
 Project: Apache HAWQ
  Issue Type: Sub-task
  Components: Tests
Reporter: zhenglin tao
Assignee: Jiali Yao






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [Propose] More data skipping technology for IO intensive performance enhancement

2016-07-13 Thread Lei Chang
Looks like we have consensus on this enhancement. I just started a JIRA to
track it: https://issues.apache.org/jira/browse/HAWQ-923

I also added this to the performance enhancement section on the roadmap page.

Cheers
Lei


On Mon, Jul 11, 2016 at 2:00 PM, Ming Li  wrote:

> It seems the dynamic partition pruning in Impala is different from the DPE
> (dynamic partition elimination) in HAWQ; below is the feature description
> from the Impala roadmap (http://impala.io/overview.html).
>
>
>- Dynamic partition pruning - to perform data elimination of queries
>where the partition filters are in dimension tables instead of the fact
>tables
>
>
> On Fri, Jul 8, 2016 at 9:56 PM, Ruilong Huo  wrote:
>
> > Strongly agree with Ming's proposal.
> >
> > We do have DPE (dynamic partition elimination) in HAWQ. But it is a kind
> > of high-level skipping which is conducted at the planning phase.
> > If fine-grained filtering can be done at runtime in the execution phase,
> > there might be more performance gain for I/O-intensive workloads.
> >
> > Looking forward to seeing a plan for it soon :)
> >
> > Best regards,
> > Ruilong Huo
> >
> > On Fri, Jul 8, 2016 at 7:02 AM, Ivan Weng  wrote:
> >
> > > Thanks Ming, data skipping technology is really what HAWQ needs.
> > > Hope to see this design and maybe prototype soon.
> > >
> > > On Thu, Jul 7, 2016 at 10:33 AM, Wen Lin  wrote:
> > >
> > > > Thanks for sharing with us!
> > > > It's really a good investigation and proposal.
> > > > Looking forward to a design draft.
> > > >
> > > > On Thu, Jul 7, 2016 at 10:16 AM, Lili Ma  wrote:
> > > >
> > > > > How about we work out a draft design describing how to implement
> > > > > data skipping technology for HAWQ?
> > > > >
> > > > >
> > > > > Thanks
> > > > > Lili
> > > > >
> > > > > On Wed, Jul 6, 2016 at 7:23 PM, Gmail  wrote:
> > > > >
> > > > > > BTW, could you create some related issues in JIRA?
> > > > > >
> > > > > > Thanks
> > > > > > xunzhang
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > > On 2016-07-02, at 23:19, Ming Li  wrote:
> > > > > > >
> > > > > > > Data skipping technology can avoid a great deal of unnecessary
> > > > > > > IO, so it can greatly enhance performance for IO-intensive
> > > > > > > queries. Besides eliminating queries on unnecessary table
> > > > > > > partitions according to the partition key range, I think more
> > > > > > > options are available now:
> > > > > > >
> > > > > > > (1) The Parquet / ORC formats introduce lightweight metadata,
> > > > > > > like Min/Max/Bloom filters for each block; such metadata can be
> > > > > > > exploited when predicate/filter info can be fetched before
> > > > > > > executing the scan.
> > > > > > >
> > > > > > > However, in HAWQ now, all data in Parquet needs to be scanned
> > > > > > > into memory before processing the predicate/filter. We don't
> > > > > > > generate the meta info when INSERTing into a Parquet table, and
> > > > > > > the scan executor doesn't utilize the meta info either. Maybe
> > > > > > > some scan APIs need to be refactored so that we can get
> > > > > > > predicate/filter info before executing the base relation scan.
> > > > > > >
> > > > > > > (2) Based on technology (1), especially with the Bloom filter,
> > > > > > > more optimizer technology can be explored further. E.g. Impala
> > > > > > > implemented Runtime Filtering (
> > > > > > > https://www.cloudera.com/documentation/enterprise/latest/topics/impala_runtime_filtering.html
> > > > > > > ), which can be used for:
> > > > > > > - dynamic partition pruning
> > > > > > > - converting join predicates to base relation predicates
> > > > > > >
> > > > > > > It tells the executor to wait for a moment (the interval can be
> > > > > > > set in a GUC) before executing the base relation scan. If the
> > > > > > > interesting values arrive in time (e.g. when the column in the
> > > > > > > join predicate has only a very small value set), the scan can
> > > > > > > use these values as a filter; if they don't arrive in time, it
> > > > > > > scans without this filter, which doesn't impact result
> > > > > > > correctness.
> > > > > > >
> > > > > > > Unlike technology (1), this technology cannot be used in every
> > > > > > > case; it only outperforms in some cases. So it just adds some
> > > > > > > more query plan choices/paths, and the optimizer needs to
> > > > > > > calculate the cost based on statistics info and apply it when
> > > > > > > the cost goes down.
> > > > > > >
> > > > > > > All in one, maybe more similar technology can be adoptable
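A minimal sketch of the two ideas above (an editor's illustration with
hypothetical block metadata, not actual HAWQ code; a plain set stands in for
a real Bloom filter):

    import time

    def block_may_match(meta, wanted):
        # (1) Min/Max + Bloom-filter skipping: rule a block out before
        # reading any of its rows. A real Bloom filter can report false
        # positives but never false negatives.
        if all(v < meta["min"] or v > meta["max"] for v in wanted):
            return False
        return any(v in meta["bloom"] for v in wanted)

    def wait_for_filter(get_filter, wait_secs):
        # (2) Runtime filtering: wait a bounded interval (the GUC-style
        # timeout) for a filter built elsewhere, e.g. the small set of
        # join-key values from the build side of a join. If it doesn't
        # arrive in time, scan without it; correctness is unaffected.
        deadline = time.time() + wait_secs
        while time.time() < deadline:
            f = get_filter()
            if f is not None:
                return f
            time.sleep(0.01)
        return None

    def scan(blocks, wanted, get_filter=lambda: None, wait_secs=0.0):
        rf = wait_for_filter(get_filter, wait_secs)
        if rf is not None:
            wanted = wanted & rf   # tighten the predicate with the filter
        for meta, rows in blocks:
            if not block_may_match(meta, wanted):
                continue           # whole block skipped: no IO for its rows
            yield from (r for r in rows if r in wanted)

    # Only the second block is actually read here.
    blocks = [({"min": 0, "max": 9, "bloom": {1, 5}}, [1, 5, 9]),
              ({"min": 10, "max": 19, "bloom": {12, 17}}, [12, 17])]
    print(list(scan(blocks, {12, 99})))  # -> [12]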

Re: A question of HAWQ

2016-07-13 Thread Lei Chang
please see items on the roadmap page here:
https://cwiki.apache.org/confluence/display/HAWQ/HAWQ+Roadmap

Cheers
Lei


On Thu, Jul 14, 2016 at 8:44 AM, Paul Guo  wrote:

> HAWQ does not support update and delete yet, but the feature is in the
> plan.
>
> 2016-07-13 23:13 GMT+08:00 Wales Wang :
>
> > HAWQ is append-only.
> > Actian VectorH has full features and index, update, and delete support.
> >
> > Wales Wang
> >
> > On 2016-07-13, at 10:10 AM, jinzhy  wrote:
> >
> > > hello everybody,
> > >
> > >   Can HAWQ or Pivotal HD support the 'delete' or 'update' operation in
> > > HDFS now? I can only create append-only tables on my computer.
> > >


Re: [Propose] Create a new HAWQ roadmap page

2016-07-13 Thread Lei Chang
Added to the page:
https://cwiki.apache.org/confluence/display/HAWQ/HAWQ+Roadmap

It can be incrementally updated.

Cheers
Lei




On Wed, Jun 29, 2016 at 5:23 PM, Lei Chang  wrote:

>
> I classified the items into the following categories. Appreciate your
> comments.
>
> Cloud related:
> [HAWQ-308] - S3 Integration
> [HAWQ-310] - Snapshot support
>
> Data Management Functionality Enhancement
> [HAWQ-786] - Framework to support pluggable formats and file systems
> [HAWQ-864] - Support ORC as a native file format
> [HAWQ-150] - External tables can be designated for both READ and WRITE
> [HAWQ-304] - Support update and delete on non-heap tables
> [HAWQ-401] - json type support
> [HAWQ-319] - REST API for HAWQ
> [HAWQ-312] - Multiple active master support
>
> Performance enhancement
> [HAWQ-303] - Index support for non-heap tables
>
> Languages & Analytics
> [HAWQ-321] - Support plpython3u
>
> Ecosystem:
> [HAWQ-256] - Integrate Security with Apache Ranger
> [HAWQ-29] - Refactor HAWQ InputFormat to support Spark/Scala
>
> Management & Build
> [HAWQ-8] - Installing the HAWQ Software thru the Apache Ambari
> [HAWQ-311] - Data Transfer tool
> [HAWQ-326] - Support RPM build for HAWQ
>
> Cheers
> Lei
>
>
>
>
> On Fri, Jun 24, 2016 at 5:10 PM, Lei Chang  wrote:
>
>>
>> Nice, I created a page and we can discuss the items and put them on the
>> page.
>>
>> For the items, I think it makes sense to add at least the items in the JIRA
>> roadmap panel; here are some major ones I extracted from the panel. It looks
>> better to classify them into categories.
>>
>> [HAWQ-786] - Framework to support pluggable formats and file systems
>> [HAWQ-864] - Support ORC as a native file format
>> [HAWQ-308] - S3 Integration
>> [HAWQ-256] - Integrate Security with Apache Ranger
>> [HAWQ-150] - External tables can be designated for both READ and WRITE
>> [HAWQ-303] - Index support for non-heap tables
>> [HAWQ-304] - Support update and delete on non-heap tables
>> [HAWQ-310] - Snapshot support
>> [HAWQ-312] - Multiple active master support
>> [HAWQ-319] - REST API for HAWQ
>> [HAWQ-321] - Support plpython3u
>> [HAWQ-29] - Refactor HAWQ InputFormat to support Spark/Scala
>> [HAWQ-311] - Data Transfer tool
>> [HAWQ-326] - Support RPM build for HAWQ
>> [HAWQ-8] - Installing the HAWQ Software thru the Apache Ambari
>> [HAWQ-752] - build pxf compatible with Apache Hadoop
>> [HAWQ-401] - json type support
>>
>> Cheers
>> Lei
>>
>>
>>
>> On Thu, Jun 23, 2016 at 11:23 PM, Vineet Goel  wrote:
>>
>>> +1 too
>>>
>>> I can help start a draft on the wiki based on historical user requests
>>> and
>>> trends in the ecosystem. And of course, the roadmap is a living and
>>> breathing document which will continue to evolve over time based on
>>> continuous feedback, and more.
>>>
>>> -Vineet
>>>
>>>
>>> On Thu, Jun 23, 2016 at 8:18 AM, Kavinder Dhaliwal  wrote:
>>>
>>> > +1 I'm in favor of this. The Zeppelin roadmap is very community driven
>>> and
>>> > having something similar for HAWQ will go a long way to getting more
>>> > feedback about the overall direction and goals of HAWQ.
>>> >
>>> > On Thu, Jun 23, 2016 at 2:02 AM, Lei Chang 
>>> wrote:
>>> >
>>> > > Hi Guys,
>>> > >
>>> > > I noticed there are a lot of requests about hawq roadmaps coming from
>>> > > the offline hawq activities (meetups et al).
>>> > >
>>> > > Although we have the list of backlog JIRAs on our JIRA page
>>> > > <https://issues.apache.org/jira/browse/HAWQ/?selectedTab=com.atlassian.jira.jira-projects-plugin:roadmap-panel>,
>>> > > it does not give a high-level description. A good example from other
>>> > > communities is here:
>>> > > https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
>>> > >
>>> > > So I am proposing we have a similar HAWQ Roadmap page maintained on
>>> our
>>> > > wiki page.
>>> > >
>>> > > Thoughts?
>>> > >
>>> > > Cheers
>>> > > Lei
>>> > >
>>> >
>>>
>>
>>
>


Re: A question of HAWQ

2016-07-13 Thread Paul Guo
HAWQ does not support update and delete yet, but the feature is in the
plan.

2016-07-13 23:13 GMT+08:00 Wales Wang :

> HAWQ is append-only.
> Actian VectorH has full features and index, update, and delete support.
>
> Wales Wang
>
> On 2016-07-13, at 10:10 AM, jinzhy  wrote:
>
> > hello everybody,
> >
> >   Can HAWQ or Pivotal HD support the 'delete' or 'update' operation in
> > HDFS now? I can only create append-only tables on my computer.
> >
>


Re: Rename "greenplum" to "hawq"

2016-07-13 Thread Ting(Goden) Yao
Thanks, I've updated the JIRA with more info and corrected versions. I think
we have consensus that this needs to be done.
Please add more details in the JIRA or add tasks as appropriate.

On Wed, Jul 13, 2016 at 10:33 AM Lisa Owen  wrote:

> there is an existing JIRA related to renaming greenplum references:
>
> https://issues.apache.org/jira/browse/HAWQ-902
>
>
> On Wed, Jul 13, 2016 at 10:20 AM, Ting(Goden) Yao  wrote:
>
> > Can anyone who knows the details start filing JIRAs and mark the version
> > as "2.0.1.0-incubating" or backlog?
> >
> > On Tue, Jul 12, 2016 at 11:56 PM Hubert Zhang  wrote:
> >
> > > +1 This is a legacy problem. Clearing GUCs that include "GP" may be the
> > > first step.
> > >
> > > On Wed, Jul 13, 2016 at 2:07 PM, Wen Lin  wrote:
> > >
> > > > It's definitely a good thing, but needs very careful thought on the
> > > > effects on customers.
> > > > We can try to list these effects as much as possible.
> > > >
> > > > On Wed, Jul 13, 2016 at 1:55 PM, Yi Jin  wrote:
> > > >
> > > > > I think it is a must-do, but there are some concerns about customer
> > > > > usage conventions and legacy applications, scripts, etc.
> > > > >
> > > > > On Wed, Jul 13, 2016 at 1:44 PM, 陶征霖  wrote:
> > > > >
> > > > > > Good idea, but it needs quite a lot of effort and may also affect
> > > > > > customer behavior. We should handle it carefully.
> > > > > >
> > > > > > 2016-07-13 9:54 GMT+08:00 Ivan Weng :
> > > > > >
> > > > > > > Agree with this good idea. But as Paul said, there may already be
> > > > > > > many users using greenplum_path.sh or something else in their
> > > > > > > environment. So we need to think about it.
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Ivan
> > > > > > >
> > > > > > > On Wed, Jul 13, 2016 at 9:31 AM, Paul Guo 
> > > wrote:
> > > > > > >
> > > > > > > > I've asked this before. Seems that it affects some old users.
> > > > > > > > I'm not sure about the details.
> > > > > > > > I agree that we should change it to a better name in a
> release.
> > > > > > > >
> > > > > > > > 2016-07-13 9:25 GMT+08:00 Roman Shaposhnik <
> > ro...@shaposhnik.org
> > > >:
> > > > > > > >
> > > > > > > > > On Tue, Jul 12, 2016 at 6:21 PM, Xiang Sheng <
> > > xsh...@pivotal.io>
> > > > > > > wrote:
> > > > > > > > > > Agree, @xunzhang.
> > > > > > > > > > However, some greenplum strings can be easily replaced, but
> > > > > > > > > > there are too many in the code or comments. Changing all of
> > > > > > > > > > them costs too much effort.
> > > > > > > > > >
> > > > > > > > > > So changing the strings that users can see is enough.
> > > > > > > > >
> > > > > > > > > Huge +1 to this! Btw, is this something we may be able to
> > > tackle
> > > > in
> > > > > > our
> > > > > > > > > next Apache release?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Roman.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Zhenglin
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks
> > >
> > > Hubert Zhang
> > >
> >
>


Re: Rename "greenplum" to "hawq"

2016-07-13 Thread Lisa Owen
there is an existing JIRA related to renaming greenplum references:

https://issues.apache.org/jira/browse/HAWQ-902


On Wed, Jul 13, 2016 at 10:20 AM, Ting(Goden) Yao  wrote:

> Can anyone who knows the details start filing JIRAs and mark the version as
> "2.0.1.0-incubating" or backlog?
>
> On Tue, Jul 12, 2016 at 11:56 PM Hubert Zhang  wrote:
>
> > +1 This is a legacy problem. Clearing GUCs that include "GP" may be the
> > first step.
> >
> > On Wed, Jul 13, 2016 at 2:07 PM, Wen Lin  wrote:
> >
> > > It's definitely a good thing, but needs very careful thought on the
> > > effects on customers.
> > > We can try to list these effects as much as possible.
> > >
> > > On Wed, Jul 13, 2016 at 1:55 PM, Yi Jin  wrote:
> > >
> > > > I think it is a must-do, but there are some concerns about customer
> > > > usage conventions and legacy applications, scripts, etc.
> > > >
> > > > On Wed, Jul 13, 2016 at 1:44 PM, 陶征霖  wrote:
> > > >
> > > > > Good idea, but it needs quite a lot of effort and may also affect
> > > > > customer behavior. We should handle it carefully.
> > > > >
> > > > > 2016-07-13 9:54 GMT+08:00 Ivan Weng :
> > > > >
> > > > > > Agree with this good idea. But as Paul said, there may already be
> > > > > > many users using greenplum_path.sh or something else in their
> > > > > > environment. So we need to think about it.
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Ivan
> > > > > >
> > > > > > On Wed, Jul 13, 2016 at 9:31 AM, Paul Guo 
> > wrote:
> > > > > >
> > > > > > > I've asked this before. Seems that it affects some old users. I'm
> > > > > > > not sure about the details.
> > > > > > > I agree that we should change it to a better name in a release.
> > > > > > >
> > > > > > > 2016-07-13 9:25 GMT+08:00 Roman Shaposhnik <
> ro...@shaposhnik.org
> > >:
> > > > > > >
> > > > > > > > On Tue, Jul 12, 2016 at 6:21 PM, Xiang Sheng <
> > xsh...@pivotal.io>
> > > > > > wrote:
> > > > > > > > > Agree, @xunzhang.
> > > > > > > > > However, some greenplum strings can be easily replaced, but
> > > > > > > > > there are too many in the code or comments. Changing all of
> > > > > > > > > them costs too much effort.
> > > > > > > > >
> > > > > > > > > So changing the strings that users can see is enough.
> > > > > > > >
> > > > > > > > Huge +1 to this! Btw, is this something we may be able to
> > tackle
> > > in
> > > > > our
> > > > > > > > next Apache release?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Roman.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Zhenglin
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks
> >
> > Hubert Zhang
> >
>


Re: [VOTE] HAWQ 2.0.0-incubating Release

2016-07-13 Thread Ting(Goden) Yao
Per the discussion, we still have 2 issues:

   - https://issues.apache.org/jira/browse/HAWQ-915 (RAT check issues),
   this is being worked on
   - make the pycrypto dependency optional: there are contradictory JIRAs
   https://issues.apache.org/jira/browse/HAWQ-863 (add pycrypto module
   back) - fixed
   https://issues.apache.org/jira/browse/HAWQ-271 (remove python modules) -
   fixed
   Whoever knows the details, please file a new JIRA so we start with a
   clean slate for this release.


Thanks
-Goden

On Tue, Jul 12, 2016 at 4:22 PM Ting(Goden) Yao  wrote:

> HAWQ-919 has been resolved and ported to 2.0.0.0-incubating branch.
>
> On Tue, Jul 12, 2016 at 3:45 AM Paul Guo  wrote:
>
>> I've sent a pull request to resolve the RAT issue,
>>
>>https://github.com/apache/incubator-hawq/pull/788
>>
>> There are still several pxf related files which have license issues.
>>
>> We filed another JIRA to let the pxf guys fix this.
>>
>> 
>>
>>   https://issues.apache.org/jira/browse/HAWQ-919
>>
>>
>> 2016-07-12 11:00 GMT+08:00 Paul Guo :
>>
>> > 0. Yes, RAT check failures need to be handled. I do not know much about
>> > RAT, but if we could just check the affected files of one patch, I'd 100%
>> > agree it should be in CI.
>> >
>> > 1. Those "git-cloned" repos are "git-cloned" only when necessary (i.e.
>> > when the related options are specified in the configure command).
>> >
>> > 2. The thrift info is already in the ImportLogs file. I'd suggest moving
>> > this (probably with more details) into the README file.
>> >
>> > 2016-07-12 0:33 GMT+08:00 Roman Shaposhnik :
>> >
>> >> On Mon, Jul 11, 2016 at 2:27 AM, Radar Da lei  wrote:
>> >> > Hi Goden,
>> >> >
>> >> > I have pushed commits of 'HAWQ-892
>> >> > ' and 'HAWQ-901
>> >> > ' into branch
>> >> > '2.0.0.0-incubating'.
>> >>
>> >> Ok, with these two additional commits I presumed the branch was ready
>> >> for review. I'm not done with the full review yet, but here are the
>> >> top concerns that would make me -1 this branch if it did go for a vote:
>> >>
>> >> 0. mvn verify produces tons of RAT check failures that need to be
>> >> carefully analyzed. As an aside -- I highly recommend having a CI job
>> >> that runs mvn verify on a regular basis.
>> >>
>> >> 1. Pulling source from external repositories in an unconditional way.
>> >> There's quite a bit of 'git clone' going on in the build system. The
>> >> easiest way to see it all is to run
>> >>     $ git grep -R 'git ' . | grep clone
>> >> My first concern is that all of these calls need to be made
>> >> conditional. IOW, I should be able to build a basic HAWQ binary
>> >> without it doing 'git clone', instead relying on pointers to the same
>> >> binary dependencies provided via build configuration. This could be a
>> >> documentation issue, and if so I'd appreciate having it published on
>> >> the wiki someplace.
>> >>
>> >> On top of that, we have two bigger issues with the following repos:
>> >>     https://github.com/jconway/plr.git  -- GPL
>> >>     https://github.com/postgres/postgres.git -- Cryptography
>> >>
>> >> We need to make sure that HAWQ can be built without those altogether.
>> >>
>> >> 2. As a minor nit, I see that you imported the thrift source under
>> >> depends/thirdparty/thrift, and it would be great if there were a way to:
>> >>     2.1. make sure that it is obvious what *release* version of
>> >>     thrift it was
>> >>     2.2. make sure that it is obvious if anything in there gets patched
>> >>
>> >>
>> >> Thanks,
>> >> Roman.
>> >>
>> >
>> >
>>
>


Re: Rename "greenplum" to "hawq"

2016-07-13 Thread Ting(Goden) Yao
Can anyone who knows the details start filing JIRAs and mark the version as
"2.0.1.0-incubating" or backlog?

On Tue, Jul 12, 2016 at 11:56 PM Hubert Zhang  wrote:

> +1 This is a legacy problem. Clearing GUCs that include "GP" may be the
> first step.
>
> On Wed, Jul 13, 2016 at 2:07 PM, Wen Lin  wrote:
>
> > It's definitely a good thing, but needs very careful thought on the
> > effects on customers.
> > We can try to list these effects as much as possible.
> >
> > On Wed, Jul 13, 2016 at 1:55 PM, Yi Jin  wrote:
> >
> > > I think it is a must-do, but there are some concerns about customer
> > > usage conventions and legacy applications, scripts, etc.
> > >
> > > On Wed, Jul 13, 2016 at 1:44 PM, 陶征霖  wrote:
> > >
> > > > Good idea, but it needs quite a lot of effort and may also affect
> > > > customer behavior. We should handle it carefully.
> > > >
> > > > 2016-07-13 9:54 GMT+08:00 Ivan Weng :
> > > >
> > > > > Agree with this good idea. But as Paul said, there may already be
> > > > > many users using greenplum_path.sh or something else in their
> > > > > environment. So we need to think about it.
> > > > >
> > > > >
> > > > > Regards,
> > > > > Ivan
> > > > >
> > > > > On Wed, Jul 13, 2016 at 9:31 AM, Paul Guo 
> wrote:
> > > > >
> > > > > > I've asked this before. Seems that it affects some old users. I'm
> > > > > > not sure about the details.
> > > > > > I agree that we should change it to a better name in a release.
> > > > > >
> > > > > > 2016-07-13 9:25 GMT+08:00 Roman Shaposhnik  >:
> > > > > >
> > > > > > > On Tue, Jul 12, 2016 at 6:21 PM, Xiang Sheng <
> xsh...@pivotal.io>
> > > > > wrote:
> > > > > > > > Agree, @xunzhang.
> > > > > > > > However, some greenplum strings can be easily replaced, but
> > > > > > > > there are too many in the code or comments. Changing all of
> > > > > > > > them costs too much effort.
> > > > > > > >
> > > > > > > > So changing the strings that users can see is enough.
> > > > > > >
> > > > > > > Huge +1 to this! Btw, is this something we may be able to
> tackle
> > in
> > > > our
> > > > > > > next Apache release?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Roman.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Zhenglin
> > > >
> > >
> >
>
>
>
> --
> Thanks
>
> Hubert Zhang
>


Re: HAWQ JIRA: Consolidate 2.0.0 to 2.0.0.0-incubating version

2016-07-13 Thread Goden Yao
Haven't seen any responses, so I've made the filters public (sorry, they were
private and thus not visible to others).
Vineet has left comments in the 5 open JIRAs, and I've changed their version
to "backlog" for the moment.
Assignees, please take a look and let us know which version these JIRAs are
supposed to be in.

I'll start the following work tomorrow if no objections.

   - Bulk edit all *resolved / closed * 2.0.0 version JIRA to
   2.0.0.0-incubating: query is here:
   https://issues.apache.org/jira/issues/?filter=12337921 , *306* issues in
   total

-Goden

On Mon, Jul 11, 2016 at 9:36 PM Goden Yao  wrote:

> As I mentioned before, we had a few versions in the current HAWQ JIRA
> version page. Particularly, there's confusion between 2.0.0.0-incubating
> (upcoming release that the community is driving to ship right now) vs.
> 2.0.0 (used originally for features post 2.0.0.0-incubating).
>
> When we created these 2 versions, we thought 2.0.0.0-incubating was
> supposed to be released by the end of 2015.
> Now that it has been delayed so long, there is no sense in keeping these 2
> versions separate.
>
> So I think we should consolidate these 2 versions in JIRA, steps would be:
>
>- Bulk edit all *resolved / closed * 2.0.0 version JIRA to
>2.0.0.0-incubating: query is here:
>https://issues.apache.org/jira/issues/?filter=12337921 , *306* issues
>in total
>- There're still *5* open items with 2.0.0 version, assignees of these
>5 JIRAs should re-evaluate them and see which version we want to put them
>in. you can find these 5 JIRAs here:
>https://issues.apache.org/jira/issues/?filter=12337922
>
> I hope this can clean up the version issues we see in JIRA.
> I've also updated the versions to add one "*2.0.1.0*" version for the
> next maintenance release.
>
> Please let me know if you vote against consolidating the 2 versions
> otherwise I'll start the work soon this week.
>
> -Goden
>


Re: Make Fix Version/s Column mandatory for HAWQ JIRAs

2016-07-13 Thread Ting(Goden) Yao
I've put a request to INFRA:
https://issues.apache.org/jira/browse/INFRA-12259

If you think the version you need is not on the current list, please send an
email to dev@ and one of the admins can make a call on whether we should
create the version.

-Goden

On Tue, Jul 12, 2016 at 10:40 AM Oleksandr Diachenko 
wrote:

> +1 for making it mandatory.
>
> On Mon, Jul 11, 2016 at 9:26 PM, Goden Yao  wrote:
>
> > Hi,
> >
> > As release manager for HAWQ right now, I feel it's hard to track each
> > JIRA for specific releases when we file JIRAs without the "Fix Version/s"
> > column filled.
> >
> > So I want to propose to make this column mandatory.
> >
> > If you don't know if the JIRA you filed will make a specific release, we
> > can default it to "Backlog".
> >
> > Any objections / concerns, please chime in and let me know.
> >
> > Thanks
> > -Goden
> >
>


Re: A question of HAWQ (update/delete)

2016-07-13 Thread James Campbell
One way a “delete” can be implemented in HAWQ is to use partition exchange.
It is ugly, but it provides a way of implementing more of a batch update/delete.

In order to do this, you need partitioned tables. For a delete, you select
everything in the target partition except the row(s) that you don’t want into
a new table. The new table has the correct data in it. You then use the ALTER
TABLE command to execute a partition exchange, as sketched below.

You can use this general approach to perform an update.
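A minimal sketch of the batch delete via partition exchange (an editor's
illustration: the table, partition, and predicate names are hypothetical, and
the DDL follows Greenplum-style partition syntax):

    import psycopg2  # assumes a reachable endpoint and the psycopg2 driver

    conn = psycopg2.connect("dbname=demo")  # hypothetical connection string
    cur = conn.cursor()

    # 1. Build a scratch table holding the target partition's rows,
    #    minus the row(s) we want "deleted".
    cur.execute("""
        CREATE TABLE sales_scratch AS
        SELECT * FROM sales_1_prt_jan2016   -- hypothetical partition name
        WHERE customer_id <> 42;            -- keep everything except rows to drop
    """)

    # 2. Swap the scratch table in for the partition; the old rows end up
    #    in sales_scratch, which can then be dropped.
    cur.execute("""
        ALTER TABLE sales
        EXCHANGE PARTITION FOR ('2016-01-01'::date)
        WITH TABLE sales_scratch;
    """)

    cur.execute("DROP TABLE sales_scratch;")
    conn.commit()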

Generally, it is not a good design practice to use deletes with big data or in 
data warehouses.  We don’t want to be trying to implement an OLTP system.  
However, if there is a business case to do this, it can be done in more of a 
batch fashion.

— 
Jim Campbell
jacampb...@pivotal.io 
O: 703-753-5970
M: 571-247-6511

On July 13, 2016 at 12:18:19 PM, Wales Wang (wormw...@yahoo.com.invalid) wrote:

Actian VectorH can support update and delete on HDFS.

Wales Wang  

On 2016-07-13, at 11:15 PM, Vineet Goel  wrote:

> Update and delete are not supported yet, but they're in the future plan.
> There is a JIRA on it, I believe:
>  
> https://issues.apache.org/jira/browse/HAWQ-304  
>  
> Out of curiosity, what specific scenario are you trying to solve that  
> requires update/delete ?  
>  
>  
> -Vineet  
>  
>  
>  
> On Tue, Jul 12, 2016 at 7:10 PM, jinzhy  wrote:  
>  
>> hello everybody,  
>>  
>> Can HAWQ or Pivotal HD support the 'delete' or 'update' operation in
>> HDFS now? I can only create append-only tables on my computer.


Re: A question of HAWQ (update/delete)

2016-07-13 Thread Wales Wang
Actian VectorH can support update and delete on HDFS.

Wales Wang

On 2016-07-13, at 11:15 PM, Vineet Goel  wrote:

> Update and delete are not supported yet, but they're in the future plan.
> There is a JIRA on it, I believe:
> 
> https://issues.apache.org/jira/browse/HAWQ-304
> 
> Out of curiosity, what specific scenario are you trying to solve that
> requires update/delete ?
> 
> 
> -Vineet
> 
> 
> 
> On Tue, Jul 12, 2016 at 7:10 PM, jinzhy  wrote:
> 
>> hello everybody,
>> 
>> Can HAWQ or Pivotal HD support the 'delete' or 'update' operation in
>> HDFS now? I can only create append-only tables on my computer.


Re: A question of HAWQ (update/delete)

2016-07-13 Thread Vineet Goel
Update and delete are not supported yet, but they're in the future plan.
There is a JIRA on it, I believe:

https://issues.apache.org/jira/browse/HAWQ-304

Out of curiosity, what specific scenario are you trying to solve that
requires update/delete ?


-Vineet



On Tue, Jul 12, 2016 at 7:10 PM, jinzhy  wrote:

> hello everybody,
>
>   Can HAWQ or Pivotal HD support the 'delete' or 'update' operation in
> HDFS now? I can only create append-only tables on my computer.


Re: A question of HAWQ

2016-07-13 Thread Wales Wang
HAWQ is append-only.
Actian VectorH has full features and index, update, and delete support.

Wales Wang

On 2016-07-13, at 10:10 AM, jinzhy  wrote:

> hello everybody,
> 
>   Can HAWQ or Pivotal HD support the 'delete' or 'update' operation in HDFS
> now? I can only create append-only tables on my computer.


A question of HAWQ

2016-07-13 Thread jinzhy
hello everybody,

  Can HAWQ or Pivotal HD support the 'delete' or 'update' operation in HDFS
now? I can only create append-only tables on my computer.




   






Re: Question on hawq_rm_nvseg_perquery_limit

2016-07-13 Thread Lei Chang
On Wed, Jul 13, 2016 at 3:16 PM, Vineet Goel  wrote:

> This leads me to another question on Apache Ambari UI integration.
>
> It seems the need to tune hawq_rm_nvseg_perquery_limit is minimal, as we
> seem to prescribe a limit of 512 regardless of cluster size. If that's the
> case, two options come to mind:
>
> 1) Either the "default" hawq_rm_nvseg_perquery_limit should be the lower
> value between (6 * segment host count) and 512. This way, it's less
> confusing to users and there is a logic behind the value.
>

If Ambari uses the lower value, it becomes difficult to change
hawq_rm_nvseg_perquery_perseg_limit afterwards.

For example, if we want to change hawq_rm_nvseg_perquery_perseg_limit to 8
for better performance on a lower-concurrency workload, it is not doable
anymore.


>
> 2) Or, the parameter should not be exposed on the UI, leaving the default
> to 512. When/why would a user want to change this value?
>

I think this is an advanced configuration and only used in some cases, so not
exposing it is fine, but I think we need a way to change it.

If users want to increase the maximum degree of parallelism, they should
change this. For example, if the end-user workload has just some
simple-to-scale queries on a large cluster, it is fine to tune the value.


>
> Thoughts?
>
> Vineet
>
>
> On Tue, Jul 12, 2016 at 11:51 PM, Hubert Zhang  wrote:
>
> > +1 with Yi's answer.
> > Vseg numbers are controlled by Resource Negotiator(a module before
> > planner); all the vseg-related GUCs will affect the behaviour of RN, and
> > some of them will also affect Resource Manager.
> > To be specific, hawq_rm_nvseg_perquery_limit and
> > hawq_rm_nvseg_perquery_perseg_limit
> > are both considered by Resource Negotiator(RN) and Resource Manager(RM),
> > while default_hash_table_bucket_number is only considered by RN.
> > As a result, suppose default_hash_table_bucket_number  = 60, query like
> > "select * from hash_table" will request #60 vsegs in RN and if
> > hawq_rm_nvseg_perquery_limit
> > is less than 60, RM will not be able to allocate 60 vsegs.
> >
> > So we need to ensure default_hash_table_bucket_number is less than the
> > capacity of RM.
> >
> > On Wed, Jul 13, 2016 at 1:40 PM, Yi Jin  wrote:
> >
> > > Hi Vineet,
> > >
> > > Some comments from me.
> > >
> > > For question 1.
> > > Yes,
> > > perquery_limit is introduced mainly to restrict resource usage in
> > > large-scale clusters; perquery_perseg_limit is to avoid allocating too
> > > many processes on one segment, which may cause serious performance
> > > issues. So, the two GUCs are for different performance aspects. Along
> > > with the variation of cluster scale, one of the two limits actually
> > > takes effect. We don't have to let both be active for resource
> > > allocation.
> > >
> > > For question 2.
> > >
> > > In fact, perquery_perseg_limit is a general resource restriction for
> > > all queries, not only hash table queries and external table queries;
> > > this is why this GUC is not merged with the other one. For example,
> > > when we run some queries on randomly distributed tables, it does not
> > > make sense to let the resource manager refer to a GUC for hash tables.
> > >
> > > For the last topic item.
> > >
> > > In my opinion, it is not necessary to adjust
> > hawq_rm_nvseg_perquery_limit,
> > > say, we just need to leave it unchanged and actually not active until
> we
> > > really want to run a large-scale HAWQ cluster, for example, 100+ nodes.
> > >
> > > Best,
> > > Yi
> > >
> > > On Wed, Jul 13, 2016 at 1:18 PM, Vineet Goel 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I’m trying to document some GUC usage in detail and have questions on
> > > > hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit
> > > > tuning.
> > > >
> > > > *hawq_rm_nvseg_perquery_limit* (default value = 512). Let’s call it
> > > > *perquery_limit* in short.
> > > > *hawq_rm_nvseg_perquery_perseg_limit* (default value = 6). Let’s call
> > > > it *perquery_perseg_limit* in short.
> > > >
> > > >
> > > > 1) Is there ever any benefit in having perquery_limit *greater than*
> > > > (perquery_perseg_limit * segment host count) ?
> > > > For example, in a 10-node cluster, HAWQ will never allocate more than
> > > > (GUC default 6 * 10 =) 60 v-segs, so the perquery_limit default of 512
> > > > doesn’t have any effect. It seems perquery_limit overrides (takes
> > > > effect over) perquery_perseg_limit only when its value is less than
> > > > (perquery_perseg_limit * segment host count).
> > > >
> > > > Is that the correct assumption? That would make sense, as users may
> > > > want to keep a check on how much processing a single query can take up
> > > > (that implies that the limit must be lower than the total possible
> > > > v-segs). Or, it may make sense in large clusters (100 nodes or more)
> > > > where we need to limit the pressure on HDFS.
> > > >
> > > >
> > > > 

Re: Question on hawq_rm_nvseg_perquery_limit

2016-07-13 Thread Vineet Goel
This leads me to another question on Apache Ambari UI integration.

It seems the need to tune hawq_rm_nvseg_perquery_limit is minimal, as we
seem to prescribe a limit of 512 regardless of cluster size. If that's the
case, two options come to mind:

1) Either the "default" hawq_rm_nvseg_perquery_limit should be the lower
value between (6 * segment host count) and 512. This way, it's less
confusing to users and there is logic behind the value.

2) Or, the parameter should not be exposed on the UI, leaving the default
to 512. When/why would a user want to change this value?

Thoughts?

Vineet


On Tue, Jul 12, 2016 at 11:51 PM, Hubert Zhang  wrote:

> +1 with Yi's answer.
> Vseg numbers are controlled by Resource Negotiator(a module before
> planner),  all the vseg related GUCs will affect the behaviour of RN, some
> of them will also affect Resource Manager.
> To be specific, hawq_rm_nvseg_perquery_limit and
> hawq_rm_nvseg_perquery_perseg_limit
> are both considered by Resource Negotiator(RN) and Resource Manager(RM),
> while default_hash_table_bucket_number is only considered by RN.
> As a result, suppose default_hash_table_bucket_number  = 60, query like
> "select * from hash_table" will request #60 vsegs in RN and if
> hawq_rm_nvseg_perquery_limit
> is less than 60, RM will not be able to allocate 60 vsegs.
>
> So we need to ensure default_hash_table_bucket_number is less than the
> capacity of RM.
>
> On Wed, Jul 13, 2016 at 1:40 PM, Yi Jin  wrote:
>
> > Hi Vineet,
> >
> > Some comments from me.
> >
> > For question 1.
> > Yes,
> > perquery_limit is introduced mainly to restrict resource usage in
> > large-scale clusters; perquery_perseg_limit is to avoid allocating too many
> > processes on one segment, which may cause serious performance issues. So,
> > the two GUCs are for different performance aspects. Along with the
> > variation of cluster scale, one of the two limits actually takes effect.
> > We don't have to let both be active for resource allocation.
> >
> > For question 2.
> >
> > In fact, perquery_perseg_limit is a general resource restriction for all
> > queries, not only hash table queries and external table queries; this is
> > why this GUC is not merged with the other one. For example, when we run
> > some queries on randomly distributed tables, it does not make sense to let
> > the resource manager refer to a GUC for hash tables.
> >
> > For the last topic item.
> >
> > In my opinion, it is not necessary to adjust
> hawq_rm_nvseg_perquery_limit,
> > say, we just need to leave it unchanged and actually not active until we
> > really want to run a large-scale HAWQ cluster, for example, 100+ nodes.
> >
> > Best,
> > Yi
> >
> > On Wed, Jul 13, 2016 at 1:18 PM, Vineet Goel  wrote:
> >
> > > Hi all,
> > >
> > > I’m trying to document some GUC usage in detail and have questions on
> > > hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit
> > > tuning.
> > >
> > > *hawq_rm_nvseg_perquery_limit* (default value = 512). Let’s call it
> > > *perquery_limit* in short.
> > > *hawq_rm_nvseg_perquery_perseg_limit* (default value = 6). Let’s call it
> > > *perquery_perseg_limit* in short.
> > >
> > >
> > > 1) Is there ever any benefit in having perquery_limit *greater than*
> > > (perquery_perseg_limit * segment host count) ?
> > > For example, in a 10-node cluster, HAWQ will never allocate more than
> > > (GUC default 6 * 10 =) 60 v-segs, so the perquery_limit default of 512
> > > doesn’t have any effect. It seems perquery_limit overrides (takes effect
> > > over) perquery_perseg_limit only when its value is less than
> > > (perquery_perseg_limit * segment host count).
> > >
> > > Is that the correct assumption? That would make sense, as users may want
> > > to keep a check on how much processing a single query can take up (that
> > > implies that the limit must be lower than the total possible v-segs).
> > > Or, it may make sense in large clusters (100 nodes or more) where we
> > > need to limit the pressure on HDFS.
> > >
> > >
> > > 2) Now, if the purpose of hawq_rm_nvseg_perquery_limit is to keep a
> > > check on single query resource usage (by limiting the # of v-segs),
> > > doesn’t it affect default_hash_table_bucket_number because queries will
> > > fail when
> > > *default_hash_table_bucket_number* is greater than
> > > hawq_rm_nvseg_perquery_limit ? In that case, the purpose of
> > > hawq_rm_nvseg_perquery_limit conflicts with the ability to run queries
> on
> > > HASH dist tables. This then means that tuning
> > hawq_rm_nvseg_perquery_limit
> > > down is not a good idea, which seems conflicting to the purpose of the
> > GUC
> > > (in relation to other GUC).
> > >
> > >
> > > Perhaps someone can provide some examples on *how and when would you
> > > tune hawq_rm_nvseg_perquery_limit* in this 10-node example:
> > >
> > > *Defaults on a 10-node cluster are:*
> > > a) *hawq_rm_nvseg_perquery_perseg_limit* = 6 (hence ability to 

Re: Rename "greenplum" to "hawq"

2016-07-13 Thread Hubert Zhang
+1 This is a legacy problem. Clearing GUCs that include "GP" may be the first
step.

On Wed, Jul 13, 2016 at 2:07 PM, Wen Lin  wrote:

> It's definitely a good thing, but needs very careful thought on the effects
> on customers.
> We can try to list these effects as much as possible.
>
> On Wed, Jul 13, 2016 at 1:55 PM, Yi Jin  wrote:
>
> > I think it is a must-do, but there are some concerns about customer usage
> > conventions and legacy applications, scripts, etc.
> >
> > On Wed, Jul 13, 2016 at 1:44 PM, 陶征霖  wrote:
> >
> > > Good idea, but it needs quite a lot of effort and may also affect
> > > customer behavior. We should handle it carefully.
> > >
> > > 2016-07-13 9:54 GMT+08:00 Ivan Weng :
> > >
> > > > Agree with this good idea. But as Paul said, there may already be many
> > > > users using greenplum_path.sh or something else in their environment.
> > > > So we need to think about it.
> > > >
> > > >
> > > > Regards,
> > > > Ivan
> > > >
> > > > On Wed, Jul 13, 2016 at 9:31 AM, Paul Guo  wrote:
> > > >
> > > > > I've asked this before. Seems that it affects some old users. I'm not
> > > > > sure about the details.
> > > > > I agree that we should change it to a better name in a release.
> > > > >
> > > > > 2016-07-13 9:25 GMT+08:00 Roman Shaposhnik :
> > > > >
> > > > > > On Tue, Jul 12, 2016 at 6:21 PM, Xiang Sheng 
> > > > wrote:
> > > > > > > Agree, @xunzhang.
> > > > > > > However, some greenplum strings can be easily replaced, but there
> > > > > > > are too many in the code or comments. Changing all of them costs
> > > > > > > too much effort.
> > > > > > >
> > > > > > > So changing the strings that users can see is enough.
> > > > > >
> > > > > > Huge +1 to this! Btw, is this something we may be able to tackle
> in
> > > our
> > > > > > next Apache release?
> > > > > >
> > > > > > Thanks,
> > > > > > Roman.
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Zhenglin
> > >
> >
>



-- 
Thanks

Hubert Zhang


Re: Question on hawq_rm_nvseg_perquery_limit

2016-07-13 Thread Hubert Zhang
+1 with Yi's answer.
Vseg numbers are controlled by the Resource Negotiator (a module before the
planner); all the vseg-related GUCs affect the behaviour of RN, and some of
them also affect the Resource Manager.
To be specific, hawq_rm_nvseg_perquery_limit and
hawq_rm_nvseg_perquery_perseg_limit
are both considered by the Resource Negotiator (RN) and the Resource Manager
(RM), while default_hash_table_bucket_number is only considered by RN.
As a result, suppose default_hash_table_bucket_number = 60; a query like
"select * from hash_table" will request 60 vsegs in RN, and if
hawq_rm_nvseg_perquery_limit
is less than 60, RM will not be able to allocate 60 vsegs.

So we need to ensure default_hash_table_bucket_number is less than the
capacity of RM.
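A small arithmetic sketch of the interplay described above (an editor's
illustration using the defaults quoted in this thread, not HAWQ source code):

    # Defaults discussed in this thread, on a hypothetical 10-node cluster.
    hawq_rm_nvseg_perquery_limit = 512
    hawq_rm_nvseg_perquery_perseg_limit = 6
    segment_hosts = 10

    # Effective per-query cap: whichever of the two limits bites first.
    effective_cap = min(hawq_rm_nvseg_perquery_limit,
                        hawq_rm_nvseg_perquery_perseg_limit * segment_hosts)
    print(effective_cap)  # -> 60; the 512 limit only binds at ~86+ hosts

    # A hash-distributed table requests exactly its bucket number of vsegs,
    # so the bucket number must stay within what RM can grant.
    default_hash_table_bucket_number = 60
    assert default_hash_table_bucket_number <= effective_cap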

On Wed, Jul 13, 2016 at 1:40 PM, Yi Jin  wrote:

> Hi Vineet,
>
> Some comments from me.
>
> For question 1.
> Yes,
> perquery_limit is introduced mainly to restrict resource usage in large-scale
> clusters; perquery_perseg_limit is to avoid allocating too many processes on
> one segment, which may cause serious performance issues. So, the two GUCs are
> for different performance aspects. Along with the variation of cluster scale,
> one of the two limits actually takes effect. We don't have to let both be
> active for resource allocation.
>
> For question 2.
>
> In fact, perquery_perseg_limit is a general resource restriction for all
> queries, not only hash table queries and external table queries; this is why
> this GUC is not merged with the other one. For example, when we run some
> queries on randomly distributed tables, it does not make sense to let the
> resource manager refer to a GUC for hash tables.
>
> For the last topic item.
>
> In my opinion, it is not necessary to adjust hawq_rm_nvseg_perquery_limit,
> say, we just need to leave it unchanged and actually not active until we
> really want to run a large-scale HAWQ cluster, for example, 100+ nodes.
>
> Best,
> Yi
>
> On Wed, Jul 13, 2016 at 1:18 PM, Vineet Goel  wrote:
>
> > Hi all,
> >
> > I’m trying to document some GUC usage in detail and have questions on
> > hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit
> > tuning.
> >
> > *hawq_rm_nvseg_perquery_limit* (default value = 512). Let’s call it
> > *perquery_limit* in short.
> > *hawq_rm_nvseg_perquery_perseg_limit* (default value = 6). Let’s call it
> > *perquery_perseg_limit* in short.
> >
> >
> > 1) Is there ever any benefit in having perquery_limit *greater than*
> > (perquery_perseg_limit * segment host count) ?
> > For example in a 10-node cluster, HAWQ will never allocate more than (GUC
> > default 6 * 10 =) 60 v-segs, so the perquery_limit default of 512 doesn’t
> > have any effect. It seems perquery_limit overrides (takes effect)
> > perquery_perseg_limit only when its value is less than
> > (perquery_perseg_limit * segment host count).
> >
> > Is that the correct assumption? That would make sense, as users may want
> > to keep a check on how much processing a single query can take up (that
> > implies that the limit must be lower than the total possible v-segs). Or,
> > it may make sense in large clusters (100-nodes or more) where we need to
> > limit the pressure on HDFS.
> >
> >
> > 2) Now, if the purpose of hawq_rm_nvseg_perquery_limit is to keep a check
> > on single query resource usage (by limiting the # of v-segs), doesn’t it
> > affect default_hash_table_bucket_number because queries will fail when
> > *default_hash_table_bucket_number* is greater than
> > hawq_rm_nvseg_perquery_limit ? In that case, the purpose of
> > hawq_rm_nvseg_perquery_limit conflicts with the ability to run queries on
> > HASH dist tables. This then means that tuning
> hawq_rm_nvseg_perquery_limit
> > down is not a good idea, which seems conflicting to the purpose of the
> GUC
> > (in relation to other GUC).
> >
> >
> > Perhaps someone can provide some examples on *how and when would you
> > tune hawq_rm_nvseg_perquery_limit* in this 10-node example:
> >
> > *Defaults on a 10-node cluster are:*
> > a) *hawq_rm_nvseg_perquery_perseg_limit* = 6 (hence ability to spin up 6
> *
> > 10 = 60 total v-segs for random tables)
> > b) *hawq_rm_nvseg_perquery_limit* = 512 (but HAWQ will never dispatch
> more
> > than 60 v-segs on random table, so value of 512 does not seem practical)
> > c) *default_hash_table_bucket_number* = 60 (6 * 10)
> >
> >
> >
> > Thanks
> > Vineet
> >
>



-- 
Thanks

Hubert Zhang


Re: Question on hawq_rm_nvseg_perquery_limit

2016-07-13 Thread Jiali Yao
+1 for the detailed explanation.
One more thing: normally we do not suggest that
default_hash_table_bucket_number be greater than
hawq_rm_nvseg_perquery_limit (512).
When initializing a large cluster, default_hash_table_bucket_number will be
adjusted accordingly. If default_hash_table_bucket_number >
hawq_rm_nvseg_perquery_limit, it will be adjusted to
(hawq_rm_nvseg_perquery_limit / hostnumber) * hostnumber.
If the cluster is expanded, it also needs to be set properly; a small sketch
of the adjustment follows.
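A minimal sketch of that init-time adjustment (an editor's illustration of
the arithmetic described above, not the actual hawq init code):

    # Round the bucket number down to a whole multiple of the host count
    # whenever it would exceed the per-query vseg limit.
    def adjust_bucket_number(bucket_number, perquery_limit, hostnumber):
        if bucket_number > perquery_limit:
            return (perquery_limit // hostnumber) * hostnumber
        return bucket_number

    # E.g. a 100-host cluster initialized with 6 buckets per host (600)
    # would be adjusted to (512 // 100) * 100 = 500.
    print(adjust_bucket_number(600, 512, 100))  # -> 500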

Jiali


On Wed, Jul 13, 2016 at 1:40 PM, Yi Jin  wrote:

> Hi Vineet,
>
> Some comments from me.
>
> For question 1.
> Yes,
> perquery_limit is introduced mainly to restrict resource usage in large-scale
> clusters; perquery_perseg_limit is to avoid allocating too many processes on
> one segment, which may cause serious performance issues. So, the two GUCs are
> for different performance aspects. Along with the variation of cluster scale,
> one of the two limits actually takes effect. We don't have to let both be
> active for resource allocation.
>
> For question 2.
>
> In fact, perquery_perseg_limit is a general resource restriction for all
> queries, not only hash table queries and external table queries; this is why
> this GUC is not merged with the other one. For example, when we run some
> queries on randomly distributed tables, it does not make sense to let the
> resource manager refer to a GUC for hash tables.
>
> For the last topic item.
>
> In my opinion, it is not necessary to adjust hawq_rm_nvseg_perquery_limit,
> say, we just need to leave it unchanged and actually not active until we
> really want to run a large-scale HAWQ cluster, for example, 100+ nodes.
>
> Best,
> Yi
>
> On Wed, Jul 13, 2016 at 1:18 PM, Vineet Goel  wrote:
>
> > Hi all,
> >
> > I’m trying to document some GUC usage in detail and have questions on
> > hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit
> > tuning.
> >
> > *hawq_rm_nvseg_perquery_limit* (default value = 512). Let’s call it
> > *perquery_limit* in short.
> > *hawq_rm_nvseg_perquery_perseg_limit* (default value = 6). Let’s call it
> > *perquery_perseg_limit* in short.
> >
> >
> > 1) Is there ever any benefit in having perquery_limit *greater than*
> > (perquery_perseg_limit * segment host count) ?
> > For example in a 10-node cluster, HAWQ will never allocate more than (GUC
> > default 6 * 10 =) 60 v-segs, so the perquery_limit default of 512 doesn’t
> > have any effect. It seems perquery_limit overrides (takes effect)
> > perquery_perseg_limit only when its value is less than
> > (perquery_perseg_limit * segment host count).
> >
> > Is that the correct assumption? That would make sense, as users may want
> > to keep a check on how much processing a single query can take up (that
> > implies that the limit must be lower than the total possible v-segs). Or,
> > it may make sense in large clusters (100-nodes or more) where we need to
> > limit the pressure on HDFS.
> >
> >
> > 2) Now, if the purpose of hawq_rm_nvseg_perquery_limit is to keep a check
> > on single query resource usage (by limiting the # of v-segs), doesn’t it
> > affect default_hash_table_bucket_number because queries will fail when
> > *default_hash_table_bucket_number* is greater than
> > hawq_rm_nvseg_perquery_limit ? In that case, the purpose of
> > hawq_rm_nvseg_perquery_limit conflicts with the ability to run queries on
> > HASH dist tables. This then means that tuning
> hawq_rm_nvseg_perquery_limit
> > down is not a good idea, which seems conflicting to the purpose of the
> GUC
> > (in relation to other GUC).
> >
> >
> > Perhaps someone can provide some examples on *how and when would you
> > tune hawq_rm_nvseg_perquery_limit* in this 10-node example:
> >
> > *Defaults on a 10-node cluster are:*
> > a) *hawq_rm_nvseg_perquery_perseg_limit* = 6 (hence ability to spin up 6
> *
> > 10 = 60 total v-segs for random tables)
> > b) *hawq_rm_nvseg_perquery_limit* = 512 (but HAWQ will never dispatch
> more
> > than 60 v-segs on random table, so value of 512 does not seem practical)
> > c) *default_hash_table_bucket_number* = 60 (6 * 10)
> >
> >
> >
> > Thanks
> > Vineet
> >
>


Re: Rename "greenplum" to "hawq"

2016-07-13 Thread Hong Wu
Thank you for the comments.

I think Lili's suggestion is like a workaround or a temporary solution, which
is not that neat.

From the user's perspective, we should list all the usage changes for them if
we rename, then have a further discussion/decision. Also, I think users should
be evolving too; taking a little effort for a great upgrade is worth it. Of
course, we should give them a buffer to transition.

Best
xunzhang

2016-07-13 14:01 GMT+08:00 Lili Ma :

> For the command-line tool names, what about keeping greenplum_path.sh and
> adding a hawq_env.sh that calls greenplum_path.sh inside? That way existing
> users won't need to change their behavior, and new users can directly use the
> new hawq-named script.
>
> For the environment variables, for example GPHOME, can we create a new
> variable HAWQHOME and set it to the same value as GPHOME?
>
>
> Thanks
> Lili
>
> 2016-07-13 13:55 GMT+08:00 Yi Jin :
>
> > I think it is a must-do, but there are some concerns about customer usage
> > conventions and legacy applications, scripts, etc.
> >
> > On Wed, Jul 13, 2016 at 1:44 PM, 陶征霖  wrote:
> >
> > > Good idea, but it needs quite a lot of effort and may also affect
> > > customer behavior. We should handle it carefully.
> > >
> > > 2016-07-13 9:54 GMT+08:00 Ivan Weng :
> > >
> > > > Agree with this good idea. But as Paul said, there may already be many
> > > > users using greenplum_path.sh or something else in their environment.
> > > > So we need to think about it.
> > > >
> > > >
> > > > Regards,
> > > > Ivan
> > > >
> > > > On Wed, Jul 13, 2016 at 9:31 AM, Paul Guo  wrote:
> > > >
> > > > > I've asked this before. Seems that it affects some old users. I'm not
> > > > > sure about the details.
> > > > > I agree that we should change it to a better name in a release.
> > > > >
> > > > > 2016-07-13 9:25 GMT+08:00 Roman Shaposhnik :
> > > > >
> > > > > > On Tue, Jul 12, 2016 at 6:21 PM, Xiang Sheng 
> > > > wrote:
> > > > > > > Agree, @xunzhang.
> > > > > > > However, some greenplum strings can be easily replaced, but there
> > > > > > > are too many in the code or comments. Changing all of them costs
> > > > > > > too much effort.
> > > > > > >
> > > > > > > So changing the strings that users can see is enough.
> > > > > >
> > > > > > Huge +1 to this! Btw, is this something we may be able to tackle
> in
> > > our
> > > > > > next Apache release?
> > > > > >
> > > > > > Thanks,
> > > > > > Roman.
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Zhenglin
> > >
> >
>


Re: Rename "greenplum" to "hawq"

2016-07-13 Thread Jiali Yao
I think that would be more confusing, since there would be two files and two
environment variables with the same purpose.
I think it is worth investigating the impact on the code and on users, and
replacing greenplum.

Jiali


On Wed, Jul 13, 2016 at 2:01 PM, Lili Ma  wrote:

> For the command-line tool names, what about keeping greenplum_path.sh and
> adding a hawq_env.sh that calls greenplum_path.sh inside? That way existing
> users won't need to change their behavior, and new users can directly use the
> new hawq-named script.
>
> For the environment variables, for example GPHOME, can we create a new
> variable HAWQHOME and set it to the same value as GPHOME?
>
>
> Thanks
> Lili
>
> 2016-07-13 13:55 GMT+08:00 Yi Jin :
>
> > I think it is a must-do, but there are some concerns about customer usage
> > conventions and legacy applications, scripts, etc.
> >
> > On Wed, Jul 13, 2016 at 1:44 PM, 陶征霖  wrote:
> >
> > > Good idea, but it needs quite a lot of effort and may also affect
> > > customer behavior. We should handle it carefully.
> > >
> > > 2016-07-13 9:54 GMT+08:00 Ivan Weng :
> > >
> > > > Agree with this good idea. But as Paul said, there may already be many
> > > > users using greenplum_path.sh or something else in their environment.
> > > > So we need to think about it.
> > > >
> > > >
> > > > Regards,
> > > > Ivan
> > > >
> > > > On Wed, Jul 13, 2016 at 9:31 AM, Paul Guo  wrote:
> > > >
> > > > > I've asked this before. Seems that it affects some old users. I'm not
> > > > > sure about the details.
> > > > > I agree that we should change it to a better name in a release.
> > > > >
> > > > > 2016-07-13 9:25 GMT+08:00 Roman Shaposhnik :
> > > > >
> > > > > > On Tue, Jul 12, 2016 at 6:21 PM, Xiang Sheng 
> > > > wrote:
> > > > > > > Agree, @xunzhang.
> > > > > > > However, some greenplum strings can be easily replaced, but there
> > > > > > > are too many in the code or comments. Changing all of them costs
> > > > > > > too much effort.
> > > > > > >
> > > > > > > So changing the strings that users can see is enough.
> > > > > >
> > > > > > Huge +1 to this! Btw, is this something we may be able to tackle
> in
> > > our
> > > > > > next Apache release?
> > > > > >
> > > > > > Thanks,
> > > > > > Roman.
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Zhenglin
> > >
> >
>


Re: Rename "greenplum" to "hawq"

2016-07-13 Thread Lili Ma
For the command-line tool names, what about keeping greenplum_path.sh and
adding a hawq_env.sh that calls greenplum_path.sh inside? That way existing
users won't need to change their behavior, and new users can directly use the
new hawq-named script.

For the environment variables, for example GPHOME, can we create a new
variable HAWQHOME and set it to the same value as GPHOME?


Thanks
Lili

2016-07-13 13:55 GMT+08:00 Yi Jin :

> I think it is a must-do, but there are some concerns about customer usage
> conventions and legacy applications, scripts, etc.
>
> On Wed, Jul 13, 2016 at 1:44 PM, 陶征霖  wrote:
>
> > Good idea, but it needs quite a lot of effort and may also affect customer
> > behavior. We should handle it carefully.
> >
> > 2016-07-13 9:54 GMT+08:00 Ivan Weng :
> >
> > > Agree with this good idea. But as Paul said, there may already be many
> > > users using greenplum_path.sh or something else in their environment. So
> > > we need to think about it.
> > >
> > >
> > > Regards,
> > > Ivan
> > >
> > > On Wed, Jul 13, 2016 at 9:31 AM, Paul Guo  wrote:
> > >
> > > > I've asked this before. Seems that it affects some old users. I'm not
> > > > sure about the details.
> > > > I agree that we should change it to a better name in a release.
> > > >
> > > > 2016-07-13 9:25 GMT+08:00 Roman Shaposhnik :
> > > >
> > > > > On Tue, Jul 12, 2016 at 6:21 PM, Xiang Sheng 
> > > wrote:
> > > > > > Agree, @xunzhang.
> > > > > > However, some greenplum strings can be easily replaced, but there
> > > > > > are too many in the code or comments. Changing all of them costs too
> > > > > > much effort.
> > > > > >
> > > > > > So changing the strings that users can see is enough.
> > > > >
> > > > > Huge +1 to this! Btw, is this something we may be able to tackle in
> > our
> > > > > next Apache release?
> > > > >
> > > > > Thanks,
> > > > > Roman.
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Zhenglin
> >
>