[GitHub] incubator-hawq pull request: HAWQ-199. Add license header for PXF ...
Github user yaoj2 commented on the pull request: https://github.com/apache/incubator-hawq/pull/141#issuecomment-160497639 Looks good --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: December 2015 Report
Hi Folks,

The following is the draft report for December. Comments are appreciated!

--

Apache HAWQ is a Hadoop native SQL query engine that combines the key
technological advantages of MPP database with the scalability and
convenience of Hadoop. HAWQ reads data from and writes data to HDFS
natively. HAWQ delivers industry-leading performance and linear
scalability. It provides users the tools to confidently and successfully
interact with petabyte range data sets. HAWQ provides users with a
complete, standards compliant SQL interface.

HAWQ has been incubating since 2015-09-04.

Three most important issues to address in the move towards graduation:

1. Produce our first Apache release
2. Expand the community, increase dev list activity and add new contributors
3. Infrastructure migration: create Jenkins projects that build HAWQ binary,
   source tarballs and docker images, and run feature tests including at
   least installcheck-good tests for each commit (HAWQ-127).

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?

We have only just started incubation. Everything seems to be going smoothly
and there is nothing urgent at this time.

How has the community developed since the last report?

1. All core contributors/committers have started working on the Apache repository
2. One HAWQ Nest community meeting was held (topic: HAWQ architecture introduction)
3. Both the user and dev communities show increasing interest. In Nov, there
   were 438 messages on dev@ and 75 on user@, compared with 357 messages on
   dev@ and 21 messages on user@ in Oct.

How has the project developed since the last report?

1. Main features/improvements added include:
   a) Support HA for libyarn [HAWQ-38]
   b) Dynamic statement level resource usage [HAWQ-47]
   c) Support Kerberos for libyarn [HAWQ-51]
   d) Dependent component version upgrades, bug fixes and documentation improvements
2. The first release has been proposed and most JIRAs have been finished
   (115 issues resolved in the release).
   ( https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318826=12334000 )
   What remains is the PPMC and IPMC voting.
3. 85 new JIRAs filed, 74 resolved (in Nov 2015)
4. 76 code commits (in Nov 2015)

Date of last release:

We have not had a release yet.

When were the last committers or PMC members elected?

No new committers or PMC members have been added since the initial list.

Cheers
Lei

On Fri, Nov 27, 2015 at 4:40 AM, Marvin Humphrey wrote:
>
> Greetings, {podling} developers!
>
> This is a reminder that your report is due next Wednesday, December
> 2nd. Details below.
>
> Best,
>
> Marvin Humphrey, Report Manager for December, on behalf of the
> Incubator PMC
>
> ---
>
> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 16 December 2015, 10:30 am PST.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, December 2nd).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> * Your project name
> * A brief description of your project, which assumes no knowledge of
>   the project or necessarily of its field
> * A list of the three most important issues to address in the move
>   towards graduation.
> * Any issues that the Incubator PMC or ASF Board might wish/need to be
>   aware of
> * How has the community developed since the last report
> * How has the project developed since the last report.
>
> This should be appended to the Incubator Wiki page at:
>
> http://wiki.apache.org/incubator/December2015
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>
license headers for hawq
Hi, HAWQ Mentors,

I have added license headers to apache-hawq. Here are the rules I followed:

1. Do not add license headers to source files taken from, or modified from, Postgres.
2. Do not add license headers to 3rd-party tools/libraries, like contrib/orafce, pgcrypto, pljava, etc.
3. Add license headers to files from greenplum, or created by hawq (using apache-rat to add the license header to source files).

*
Summary
---
Generated at: 2015-11-30T10:40:35+08:00
Notes: 79
Binaries: 796
Archives: 20
Standards: 5870

Apache Licensed: 1366
Generated Documents: 0

JavaDocs are generated and so license header is optional
Generated files do not required license headers

4488 Unknown Licenses
***

Please help me verify it.

Thanks,
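The three rules above can be sketched as a small decision function. This is only an illustration, not the script that was actually used: the `Origin` enum and `needs_apache_header` are hypothetical names, and how a file's origin is determined (for example from a hand-maintained list) is left to the caller.

```python
from enum import Enum


class Origin(Enum):
    """Hypothetical classification of where a source file came from."""
    POSTGRES = "postgres"        # taken from, or modified from, Postgres
    THIRD_PARTY = "third_party"  # e.g. contrib/orafce, pgcrypto, pljava
    GREENPLUM = "greenplum"      # inherited from greenplum
    HAWQ = "hawq"                # created by hawq


def needs_apache_header(origin: Origin) -> bool:
    """Apply rules 1-3: only greenplum and hawq files get an Apache header."""
    return origin in (Origin.GREENPLUM, Origin.HAWQ)


if __name__ == "__main__":
    for origin in Origin:
        print(origin.value, needs_apache_header(origin))
```

Keeping the origin decision separate from the header decision means the rules stay in one place even if the list of Postgres-derived or 3rd-party files changes.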
[GitHub] incubator-hawq pull request: HAWQ-196: Refine GUCs in hawq-site.xm...
Github user huor commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/138#discussion_r46112116 --- Diff: src/backend/utils/misc/etc/template-hawq-site.xml --- @@ -20,131 +20,133 @@ under the License. --> - -hawq_master_address_host -%master.host% - - - -hawq_master_address_port -%master.port% - - - -hawq_standby_address_host -%standby.host% - - - -hawq_segment_address_port -%segment.port% - - - -hawq_dfs_url -%namenode.host%:%namenode.port%/%hawq.file.space% - - - -hawq_master_directory -%master.directory% - - - -hawq_segment_directory -%segment.directory% - - - -hawq_master_temp_directory -%master.temp.directory% - - - -hawq_segment_temp_directory -%segment.temp.directory% - + + hawq_master_address_host + %master.host% + The host name of hawq master. + + + + hawq_master_address_port + %master.port% + The port of hawq master. + + + + hawq_standby_address_host + %standby.host% + The host name of hawq standby master. + + + + hawq_segment_address_port + %segment.port% + The port of hawq segment. + + + + hawq_dfs_url + %namenode.host%:%namenode.port%/%hawq.file.space% + URL for accessing HDFS. + + + + hawq_master_directory + %master.directory% + The directory of hawq master. + + + + hawq_segment_directory + %segment.directory% + The directory of hawq segment. + + + + hawq_master_temp_directory + %master.temp.directory% + The temporary directory reserved for hawq master. + + + + hawq_segment_temp_directory + %segment.temp.directory% + The temporary directory reserved for hawq segment. + + - - -hawq_rm_yarn_address -%master.host%:9980 --- End diff --

To keep the ports consistent across hawq-site.xml, template-hawq-site.xml and yarn-client.xml, I will use the settings below:

hawq_rm_yarn_address: %master.host%:8032
hawq_rm_yarn_scheduler_address: %master.host%:8030
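A consistency check along the lines of the comment above could look like the following sketch. It assumes both files use the Hadoop-style `<property><name>/<value>` layout and that the YARN side exposes `yarn.resourcemanager.address` and `yarn.resourcemanager.scheduler.address`; the sample XML strings, the host name `master1`, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed pairing of HAWQ settings with their YARN counterparts.
PAIRS = [
    ("hawq_rm_yarn_address", "yarn.resourcemanager.address"),
    ("hawq_rm_yarn_scheduler_address", "yarn.resourcemanager.scheduler.address"),
]

# Illustrative file contents, not copied from the real templates.
HAWQ_SITE = """<configuration>
  <property><name>hawq_rm_yarn_address</name><value>master1:8032</value></property>
  <property><name>hawq_rm_yarn_scheduler_address</name><value>master1:8030</value></property>
</configuration>"""

YARN_CLIENT = """<configuration>
  <property><name>yarn.resourcemanager.address</name><value>master1:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>master1:8030</value></property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def port_of(address):
    """Extract the numeric port from a 'host:port' string."""
    return int(address.rsplit(":", 1)[1])


def mismatched_ports(hawq_xml, yarn_xml):
    """List the HAWQ settings whose port differs from the paired YARN setting."""
    hawq = read_properties(hawq_xml)
    yarn = read_properties(yarn_xml)
    return [h for h, y in PAIRS if port_of(hawq[h]) != port_of(yarn[y])]


if __name__ == "__main__":
    print(mismatched_ports(HAWQ_SITE, YARN_CLIENT))
```

With the old template value of 9980 for `hawq_rm_yarn_address`, the same check would report that setting as mismatched.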
[GitHub] incubator-hawq pull request: HAWQ-199. Add license header for PXF ...
Github user linwen closed the pull request at: https://github.com/apache/incubator-hawq/pull/141
Re: Performance issue about HAWQ 2.0 beta
Hi Martin Visser,

Thanks for your quick reply. I attached the "explain analyze" output to my last email in this thread.

HAWQ 2.0 introduces the "virtual segment", and we configured 8 virtual segments for each node. That is why the two versions show different segment numbers.

On Fri, Nov 27, 2015 at 4:58 PM, Martin Visser wrote:
> Hi Leon,
>
> looking at the 2.0 plan, you're perhaps missing stats on some of the tables
> for example:
> -> Parquet table Scan on catalog_sales (cost=0.00..23885.35 rows=1 width=197)
> -> Parquet table Scan on web_sales (cost=0.00..11982.30 rows=1 width=197)
>
> Can you check or run explain analyze? Also number of segments is showing
> different numbers 1.3 5 segs and 2.0 40 sets
>
> On Fri, Nov 27, 2015 at 7:43 AM, Leon Zhang wrote:
>
> > Hi, HAWQ Developers:
> >
> >    As my previous email hint, I run TPC-DS test on our development.
> > Comparing with previous version 1.3.x, we can see the performance
> > improvement on most of queries.
> >
> >    But the problem is performance reduction for *some* queries. For
> > example, the query64, the running time increase from 10754.688 ms
> > to 68884.731 ms . I am not sure if any changes were made that increase the
> > running time?
> >
> >    In order to discuss the detail about this issue, I would like use the
> > query10. The running time increase from 1795.746 ms to 744919.251 ms. I
> > also attache the sql about this query, and the query plan for this query.
> >
> >    Thanks
> >
>
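The observation in the quoted reply, that scans estimated at `rows=1` hint at missing statistics, can be turned into a quick check over a saved plan. This is a minimal sketch under the assumption that each scan node appears on one line in the format quoted above; the function name and the threshold are illustrative, and a low estimate is only a heuristic, not proof that statistics are missing.

```python
import re

# Matches scan nodes such as:
#   -> Parquet table Scan on web_sales (cost=0.00..11982.30 rows=1 width=197)
SCAN_RE = re.compile(
    r"Scan on (\w+)\s+\(cost=\d+\.\d+\.\.\d+\.\d+ rows=(\d+) width=\d+\)"
)


def tables_with_suspect_estimates(plan_text, threshold=1):
    """Return tables whose scan row estimate is at or below the threshold."""
    return [
        table
        for table, rows in SCAN_RE.findall(plan_text)
        if int(rows) <= threshold
    ]


if __name__ == "__main__":
    sample = (
        "-> Parquet table Scan on catalog_sales (cost=0.00..23885.35 rows=1 width=197)\n"
        "-> Parquet table Scan on web_sales (cost=0.00..11982.30 rows=1 width=197)\n"
    )
    for table in tables_with_suspect_estimates(sample):
        # Running ANALYZE on the table is the usual way to refresh statistics.
        print(f"ANALYZE {table};")
```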
Re: Performance issue about HAWQ 2.0 beta
Hi Jiali Yao,

Thanks for your reply. Here is the detailed information:

1. The segment configuration:

# select * from gp_segment_configuration ;
 registration_order | role | status | port  | hostname | address
--------------------+------+--------+-------+----------+------------
                  0 | m    | u      | 25432 | dserver1 | dserver1
                  1 | p    | u      | 40404 | dserver5 | 10.10.0.15
                  2 | p    | u      | 40404 | dserver3 | 10.10.0.13
                  3 | p    | u      | 40404 | dserver1 | 10.10.0.11
                  4 | p    | u      | 40404 | dserver4 | 10.10.0.14
                  5 | p    | u      | 40404 | dserver2 | 10.10.0.12
(6 rows)

2. The "explain analyze" output for the query: see the attachment.

3. No, this query was tested *without YARN*.

Thanks

On Fri, Nov 27, 2015 at 4:59 PM, Jiali Yao wrote:
> Hi Leon
>
> Thanks for providing it. The result is not as we expected. In our
> performance test, we found the performance is comparable with 1.3.
> Could you please some more information:
> 1. Get segment configuration information from 1.3 and 2.0
> select * from gp_segment_configuration ;
> 2. Could you please run "explain analyze" to get more statistic
> information?
> 3. Want to confirm with you: The result run in yarn mode ,right? Also I see
> your previous email to indicate there is some error in yarn, these query is
> also from that test round, right?
>
> Thanks
>
> Jiali
>
> On Fri, Nov 27, 2015 at 3:43 PM, Leon Zhang wrote:
>
> > Hi, HAWQ Developers:
> >
> >    As my previous email hint, I run TPC-DS test on our development.
> > Comparing with previous version 1.3.x, we can see the performance
> > improvement on most of queries.
> >
> >    But the problem is performance reduction for *some* queries. For
> > example, the query64, the running time increase from 10754.688 ms
> > to 68884.731 ms . I am not sure if any changes were made that increase the
> > running time?
> >
> >    In order to discuss the detail about this issue, I would like use the
> > query10. The running time increase from 1795.746 ms to 744919.251 ms. I
> > also attache the sql about this query, and the query plan for this query.
> >
> >    Thanks
> >
>

Pager usage is off.
Timing is on.
QUERY PLAN
Limit (cost=750627231.96..750627235.38 rows=37 width=208)
  Rows out: 5 rows with 1038177 ms to end, start offset by 251/251 ms.
  -> Gather Motion 40:1 (slice10; segments: 40) (cost=750627231.96..750627235.38 rows=37 width=208)
       Merge Key: partial_aggregation.cd_gender, partial_aggregation.cd_marital_status, partial_aggregation.cd_education_status, partial_aggregation.cd_purchase_estimate, partial_aggregation.cd_credit_rating, partial_aggregation.cd_dep_count, partial_aggregation.cd_dep_employed_count, partial_aggregation.cd_dep_college_count
       Rows out: 5 rows at destination with 1038177 ms to end, start offset by 251/251 ms.
       -> Limit (cost=750627231.96..750627234.64 rows=1 width=208)
            Rows out: Avg 1.0 rows x 5 workers. Max/Last(seg13:dserver2/seg6:dserver1) 1/0 rows with 1038097/1038170 ms to end, start offset by 331/258 ms.
            -> GroupAggregate (cost=750627231.96..750627234.64 rows=1 width=208)
                 Group By: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, customer_demographics.cd_dep_employed_count, customer_demographics.cd_dep_college_count
                 Rows out: Avg 1.0 rows x 5 workers. Max/Last(seg13:dserver2/seg6:dserver1) 1/0 rows with 1038097/1038170 ms to end, start offset by 331/258 ms.
                 -> Sort (cost=750627231.96..750627232.05 rows=1 width=168)
                      Sort Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, customer_demographics.cd_dep_employed_count,