[GitHub] incubator-hawq pull request: HAWQ-199. Add license header for PXF ...

2015-11-29 Thread yaoj2
Github user yaoj2 commented on the pull request:

https://github.com/apache/incubator-hawq/pull/141#issuecomment-160497639
  
Looks good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: December 2015 Report

2015-11-29 Thread Lei Chang
Hi Folks,

The following is the draft December report. Comments are appreciated!

--
Apache HAWQ is a Hadoop-native SQL query engine that combines the key
technological advantages of an MPP database with the scalability and
convenience of Hadoop. HAWQ reads data from and writes data to HDFS natively.
HAWQ delivers industry-leading performance and linear scalability. It
provides users the tools to confidently and successfully interact with
petabyte-range data sets. HAWQ provides users with a complete,
standards-compliant SQL interface.

HAWQ has been incubating since 2015-09-04.

The three most important issues to address in the move towards graduation:

  1. Produce our first Apache release
  2. Expand the community: increase dev list activity and add new
     contributors
  3. Infrastructure migration: create Jenkins projects that build HAWQ
     binaries, source tarballs, and Docker images, and run feature tests
     (at least the installcheck-good tests) for each commit (HAWQ-127).

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

  We have only just started incubation; everything is going smoothly and
  there is nothing urgent at this time.

How has the community developed since the last report?

  1. All core contributors/committers have started working on the Apache
     repository.
  2. One HAWQ Nest community meeting was held (topic: HAWQ architecture
     introduction).
  3. Both the user and dev communities show increasing interest. In November
     there were 438 messages on dev@ and 75 on user@, compared with 357
     messages on dev@ and 21 on user@ in October.


How has the project developed since the last report?

  1. Main features/improvements added include:
     a) Support HA for libyarn [HAWQ-38]
     b) Dynamic statement level resource usage [HAWQ-47]
     c) Support Kerberos for libyarn [HAWQ-51]
     d) Dependent component version upgrades, bug fixes, and documentation
        improvements
  2. The first release has been proposed and most of its JIRAs are finished
     (115 issues resolved for the release:
     https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318826=12334000
     ). The remaining step is the PPMC and IPMC voting.
  3. 85 new JIRAs filed, 74 resolved (November 2015)
  4. 76 code commits (November 2015)

Date of last release:

  We have not had a release yet.

When were the last committers or PMC members elected?
  No new committers or PPMC members have been elected since the initial ones.


Cheers
Lei


On Fri, Nov 27, 2015 at 4:40 AM, Marvin Humphrey  wrote:

>
> Greetings, {podling} developers!
>
> This is a reminder that your report is due next Wednesday, December
> 2nd.  Details below.
>
> Best,
>
> Marvin Humphrey, Report Manager for December, on behalf of the
> Incubator PMC
>
> ---
>
> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 16 December 2015, 10:30 am PST.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, December 2nd).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
>
> This should be appended to the Incubator Wiki page at:
>
> http://wiki.apache.org/incubator/December2015
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>
>


license headers for hawq

2015-11-29 Thread Wen Lin
Hi, HAWQ Mentors,

I have added license headers to apache-hawq. Here are the rules I followed:
1. Do not add license headers to source files taken from, or modified from,
Postgres.
2. Do not add license headers to third-party tools/libraries such as
contrib/orafce, pgcrypto, pljava, etc.
3. Add license headers to files from Greenplum or created by HAWQ (using
apache-rat to add the license header to source files).
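The three rules above amount to deciding, per file, whether the ASF header belongs there and then checking for it. Below is a minimal Python sketch of that check, offered only as an illustration and not as a replacement for apache-rat. It assumes each file's origin is already known (the `Origin` enum and all helper names are hypothetical) and looks for the opening sentence of the standard ASF source header near the top of the file. Flagging an ASF header on a Postgres-derived or third-party file as a problem is a design choice of this sketch, not something the rules state.

```python
from enum import Enum
from typing import Optional

# Opening sentence of the standard ASF source-file license header.
ASF_HEADER_MARKER = "Licensed to the Apache Software Foundation (ASF)"


class Origin(Enum):
    """Hypothetical classification of where a source file came from."""
    POSTGRES = "postgres"        # rule 1: taken from, or modified from, Postgres
    THIRD_PARTY = "third_party"  # rule 2: e.g. contrib/orafce, pgcrypto, pljava
    GREENPLUM = "greenplum"      # rule 3: inherited from Greenplum
    HAWQ = "hawq"                # rule 3: newly created in HAWQ


def should_have_asf_header(origin: Origin) -> bool:
    """Only Greenplum-derived and HAWQ-created files get the ASF header."""
    return origin in (Origin.GREENPLUM, Origin.HAWQ)


def has_asf_header(text: str, max_lines: int = 30) -> bool:
    """Look for the ASF header marker within the first `max_lines` lines."""
    head = "\n".join(text.splitlines()[:max_lines])
    return ASF_HEADER_MARKER in head


def header_violation(origin: Origin, text: str) -> Optional[str]:
    """Return a short problem description, or None if the file follows the rules."""
    expected = should_have_asf_header(origin)
    present = has_asf_header(text)
    if expected and not present:
        return "missing ASF header"
    if present and not expected:
        return "unexpected ASF header"
    return None


if __name__ == "__main__":
    print(header_violation(Origin.HAWQ, "int main(void) { return 0; }"))
    # missing ASF header
```

In practice the origin would have to come from the file's path or history; the sketch leaves that mapping out on purpose.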

*
Summary
---
Generated at: 2015-11-30T10:40:35+08:00
Notes: 79
Binaries: 796
Archives: 20
Standards: 5870

Apache Licensed: 1366
Generated Documents: 0

JavaDocs are generated and so license header is optional
Generated files do not required license headers

4488 Unknown Licenses

***

Please help me verify it.
Thanks,


[GitHub] incubator-hawq pull request: HAWQ-196: Refine GUCs in hawq-site.xm...

2015-11-29 Thread huor
Github user huor commented on a diff in the pull request:

https://github.com/apache/incubator-hawq/pull/138#discussion_r46112116
  
--- Diff: src/backend/utils/misc/etc/template-hawq-site.xml ---
@@ -20,131 +20,133 @@ under the License.
 -->
 
 
-
-hawq_master_address_host
-%master.host%
-
-
-
-hawq_master_address_port
-%master.port%
-
-
-
-hawq_standby_address_host
-%standby.host%
-
-
-
-hawq_segment_address_port
-%segment.port%
-
-
-
-hawq_dfs_url
-%namenode.host%:%namenode.port%/%hawq.file.space%
-
-
-
-hawq_master_directory
-%master.directory%
-
-
-
-hawq_segment_directory
-%segment.directory%
- 
-
-
-hawq_master_temp_directory
-%master.temp.directory%
-
-
-
-hawq_segment_temp_directory
-%segment.temp.directory%
-
+   
+   hawq_master_address_host
+   %master.host%
+   The host name of hawq master.
+   
+
+   
+   hawq_master_address_port
+   %master.port%
+   The port of hawq master.
+   
+
+   
+   hawq_standby_address_host
+   %standby.host%
+   The host name of hawq standby master.
+   
+
+   
+   hawq_segment_address_port
+   %segment.port%
+   The port of hawq segment.
+   
+
+   
+   hawq_dfs_url
+   %namenode.host%:%namenode.port%/%hawq.file.space%
+   URL for accessing HDFS.
+   
+
+   
+   hawq_master_directory
+   %master.directory%
+   The directory of hawq master.
+   
+
+   
+   hawq_segment_directory
+   %segment.directory%
+   The directory of hawq segment.
+
+
+   
+   hawq_master_temp_directory
+   %master.temp.directory%
+   The temporary directory reserved for hawq master.
+   
+
+   
+   hawq_segment_temp_directory
+   %segment.temp.directory%
+   The temporary directory reserved for hawq segment.
+   
  
-
-
-hawq_rm_yarn_address
-%master.host%:9980
--- End diff --

To keep the ports consistent across hawq-site.xml, template-hawq-site.xml, and
yarn-client.xml, I will use the following settings:

hawq_rm_yarn_address: %master.host%:8032
hawq_rm_yarn_scheduler_address: %master.host%:8030
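A quick way to confirm that a generated hawq-site.xml follows these settings is to read its properties and compare the ports. The Python sketch below assumes the file uses the usual Hadoop-style `<configuration>/<property>/<name>/<value>` layout; the helper names are hypothetical. For reference, 8032 and 8030 are YARN's default ResourceManager and scheduler ports (`yarn.resourcemanager.address` and `yarn.resourcemanager.scheduler.address`), which is presumably why they line up with yarn-client.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Ports proposed in the review comment for hawq-site.xml.
EXPECTED_PORTS = {
    "hawq_rm_yarn_address": "8032",
    "hawq_rm_yarn_scheduler_address": "8030",
}


def read_properties(xml_text: str) -> Dict[str, str]:
    """Collect name -> value pairs from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def port_mismatches(props: Dict[str, str]) -> Dict[str, str]:
    """Return {property: actual value} for every expected property with a different port."""
    bad: Dict[str, str] = {}
    for name, port in EXPECTED_PORTS.items():
        value = props.get(name)
        if value is None:
            bad[name] = "<missing>"
        elif value.rsplit(":", 1)[-1] != port:  # compare the text after the last ':'
            bad[name] = value
    return bad


if __name__ == "__main__":
    sample = """<configuration>
  <property><name>hawq_rm_yarn_address</name><value>%master.host%:9980</value></property>
  <property><name>hawq_rm_yarn_scheduler_address</name><value>%master.host%:8030</value></property>
</configuration>"""
    print(port_mismatches(read_properties(sample)))
    # {'hawq_rm_yarn_address': '%master.host%:9980'}
```

The sample reuses the old `%master.host%:9980` value from the removed lines of the diff, so it is the one flagged.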




[GitHub] incubator-hawq pull request: HAWQ-199. Add license header for PXF ...

2015-11-29 Thread linwen
Github user linwen closed the pull request at:

https://github.com/apache/incubator-hawq/pull/141




Re: Performance issue about HAWQ 2.0 beta

2015-11-29 Thread Leon Zhang
Hi, Martin Visser,

   Thanks for your quick reply. I attached the "explain analyze" output to my
last email in this thread.

  Because HAWQ 2.0 introduces "virtual segments" and we configure 8 virtual
segments for each node, the number of segments differs between the two plans.
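Taking the figure of 8 virtual segments per node at face value, the 40 segments in the 2.0 plan follow from the 5 primary segment hosts listed in the gp_segment_configuration output in this thread: 5 × 8 = 40, which also matches the `Gather Motion 40:1` line in the attached plan. A small Python sketch of that arithmetic follows; the function name is hypothetical, and it does not model how HAWQ 2.0 actually allocates virtual segments to a query.

```python
from typing import Iterable, Tuple

# (role, hostname) pairs from the gp_segment_configuration output in this
# thread: 'm' is the master, 'p' a primary segment.
SEGMENT_ROWS = [
    ("m", "dserver1"),
    ("p", "dserver5"),
    ("p", "dserver3"),
    ("p", "dserver1"),
    ("p", "dserver4"),
    ("p", "dserver2"),
]


def total_virtual_segments(rows: Iterable[Tuple[str, str]], vsegs_per_node: int) -> int:
    """Count distinct primary-segment hosts and multiply by the per-node setting."""
    primary_hosts = {host for role, host in rows if role == "p"}
    return len(primary_hosts) * vsegs_per_node


if __name__ == "__main__":
    print(total_virtual_segments(SEGMENT_ROWS, 8))  # 40, as in the 2.0 plan
    print(total_virtual_segments(SEGMENT_ROWS, 1))  # 5, the count seen in the 1.3 plan
```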

On Fri, Nov 27, 2015 at 4:58 PM, Martin Visser  wrote:

> Hi Leon,
>
> Looking at the 2.0 plan, you may be missing statistics on some of the
> tables, for example:
> -> Parquet table Scan on catalog_sales  (cost=0.00..23885.35 rows=1
> width=197)
> -> Parquet table Scan on web_sales  (cost=0.00..11982.30 rows=1 width=197)
>
> Can you check, or run explain analyze? Also, the number of segments
> differs: 5 segments in 1.3 and 40 in 2.0.
>
> On Fri, Nov 27, 2015 at 7:43 AM, Leon Zhang  wrote:
>
> > Hi, HAWQ Developers:
> >
> >    As hinted in my previous email, I ran the TPC-DS tests on our
> > development environment. Compared with the previous version 1.3.x, we see
> > performance improvements on most queries.
> >
> >    However, performance has regressed for *some* queries. For example,
> > the running time of query64 increased from 10754.688 ms to 68884.731 ms.
> > Were any changes made that could increase the running time?
> >
> >    To discuss this issue in detail, I would like to use query10, whose
> > running time increased from 1795.746 ms to 744919.251 ms. I have also
> > attached the SQL for this query and its query plan.
> >
> >Thanks
> >
> >
>
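To put the two regressions in proportion, the slowdown factor is simply the 2.0 running time divided by the 1.3.x running time. The Python sketch below uses only the timings quoted above: query64 is about 6.4 times slower, while query10 is about 415 times slower, which makes query10 the far more striking case.

```python
# Running times (ms) quoted in the thread: (HAWQ 1.3.x, HAWQ 2.0 beta).
TIMINGS_MS = {
    "query10": (1795.746, 744919.251),
    "query64": (10754.688, 68884.731),
}


def slowdown(before_ms: float, after_ms: float) -> float:
    """Return how many times longer the 2.0 run took (values > 1 mean slower)."""
    return after_ms / before_ms


if __name__ == "__main__":
    for query, (v13, v20) in sorted(TIMINGS_MS.items()):
        print(f"{query}: {slowdown(v13, v20):.1f}x slower in 2.0")
    # query10: 414.8x slower in 2.0
    # query64: 6.4x slower in 2.0
```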


Re: Performance issue about HAWQ 2.0 beta

2015-11-29 Thread Leon Zhang
Hi, Jiali Yao,

   Thanks for your reply.

   Here is the detailed information:
   1. The segment configuration:
# select * from gp_segment_configuration ;
 registration_order | role | status | port  | hostname |  address
--------------------+------+--------+-------+----------+------------
                  0 | m    | u      | 25432 | dserver1 | dserver1
                  1 | p    | u      | 40404 | dserver5 | 10.10.0.15
                  2 | p    | u      | 40404 | dserver3 | 10.10.0.13
                  3 | p    | u      | 40404 | dserver1 | 10.10.0.11
                  4 | p    | u      | 40404 | dserver4 | 10.10.0.14
                  5 | p    | u      | 40404 | dserver2 | 10.10.0.12
(6 rows)

2. For the "explain analyze" output of the query, see the attachment.

3. No, this query was tested *without YARN*.

Thanks


On Fri, Nov 27, 2015 at 4:59 PM, Jiali Yao  wrote:

> Hi Leon
>
> Thanks for providing it. The result is not what we expected: in our
> performance tests, we found the performance comparable with 1.3.
> Could you please provide some more information:
> 1. Get the segment configuration information from 1.3 and 2.0:
> select * from gp_segment_configuration ;
> 2. Could you please run "explain analyze" to get more statistical
> information?
> 3. I want to confirm: were these results obtained in YARN mode? Also, your
> previous email indicated some errors in YARN; are these queries also from
> that test round?
>
> Thanks
>
> Jiali
>
> On Fri, Nov 27, 2015 at 3:43 PM, Leon Zhang  wrote:
>
> > Hi, HAWQ Developers:
> >
> >    As hinted in my previous email, I ran the TPC-DS tests on our
> > development environment. Compared with the previous version 1.3.x, we see
> > performance improvements on most queries.
> >
> >    However, performance has regressed for *some* queries. For example,
> > the running time of query64 increased from 10754.688 ms to 68884.731 ms.
> > Were any changes made that could increase the running time?
> >
> >    To discuss this issue in detail, I would like to use query10, whose
> > running time increased from 1795.746 ms to 744919.251 ms. I have also
> > attached the SQL for this query and its query plan.
> >
> >Thanks
> >
> >
>
Pager usage is off.
Timing is on.
 QUERY PLAN 

 Limit  (cost=750627231.96..750627235.38 rows=37 width=208)
   Rows out:  5 rows with 1038177 ms to end, start offset by 251/251 ms.
   ->  Gather Motion 40:1  (slice10; segments: 40)  (cost=750627231.96..750627235.38 rows=37 width=208)
 Merge Key: partial_aggregation.cd_gender, partial_aggregation.cd_marital_status, partial_aggregation.cd_education_status, partial_aggregation.cd_purchase_estimate, partial_aggregation.cd_credit_rating, partial_aggregation.cd_dep_count, partial_aggregation.cd_dep_employed_count, partial_aggregation.cd_dep_college_count
 Rows out:  5 rows at destination with 1038177 ms to end, start offset by 251/251 ms.
 ->  Limit  (cost=750627231.96..750627234.64 rows=1 width=208)
   Rows out:  Avg 1.0 rows x 5 workers.  Max/Last(seg13:dserver2/seg6:dserver1) 1/0 rows with 1038097/1038170 ms to end, start offset by 331/258 ms.
   ->  GroupAggregate  (cost=750627231.96..750627234.64 rows=1 width=208)
 Group By: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, customer_demographics.cd_dep_employed_count, customer_demographics.cd_dep_college_count
 Rows out:  Avg 1.0 rows x 5 workers.  Max/Last(seg13:dserver2/seg6:dserver1) 1/0 rows with 1038097/1038170 ms to end, start offset by 331/258 ms.
 ->  Sort  (cost=750627231.96..750627232.05 rows=1 width=168)
   Sort Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, customer_demographics.cd_dep_employed_count,