[jira] [Commented] (HAWQ-1530) Illegally killing a JDBC select query causes locking problems
[ https://issues.apache.org/jira/browse/HAWQ-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394885#comment-16394885 ] Shubham Sharma commented on HAWQ-1530:
--------------------------------------

This issue is reproducible with 2.3.0.0 when multiple connections are opened using pgAdmin3 (a JDBC client). Another important part of reproducing this issue is that the communication between pgAdmin3 and hawq must be over the network. If hawq and pgAdmin3 are on the same host, the issue will not reproduce.

> Illegally killing a JDBC select query causes locking problems
> -------------------------------------------------------------
>
>                 Key: HAWQ-1530
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1530
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Transaction
>            Reporter: Grant Krieger
>            Assignee: Yi Jin
>            Priority: Major
>             Fix For: 2.3.0.0-incubating
>
> Hi,
> When you perform a long-running select statement on 2 hawq tables (a join) from JDBC and illegally kill the JDBC client (CTRL ALT DEL) before completion of the query, the 2 tables remain locked even when the query completes on the server.
> The lock is visible via pg_locks. One cannot kill the query via SELECT pg_terminate_backend(393937). The only way to get rid of it is a kill -9 from linux or restarting hawq, but this can kill other things as well.
> The JDBC client I am using is Aqua Data Studio.
> I can provide exact steps to reproduce if required.
> Thank you
> Grant

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HAWQ-1572) Travis CI build failure on master. Thrift/boost incompatibility
Shubham Sharma created HAWQ-1572:
------------------------------------

             Summary: Travis CI build failure on master. Thrift/boost incompatibility
                 Key: HAWQ-1572
                 URL: https://issues.apache.org/jira/browse/HAWQ-1572
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Build
            Reporter: Shubham Sharma
            Assignee: Radar Lei

Hi,

The Travis CI build is failing for master and new commits. The CI is erroring out with

{code}
configure: error: thrift is required
The command "./configure" failed and exited with 1 during .
{code}

I was able to reproduce this issue, and looking at config.log it appears to fail at the line below while compiling a conftest.cpp:

{code}
/usr/local/include/thrift/stdcxx.h:32:10: fatal error: 'boost/tr1/functional.hpp' file not found
{code}

The root cause of the problem is an incompatibility between thrift 0.11 and boost 1.65.1. Travis recently upgraded their xcode image to 9.2, and the list of default packages now contains boost 1.65.1 and thrift 0.11. Thrift uses [stdcxx.h|https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/stdcxx.h], which includes the boost/tr1/functional.hpp header. Support for tr1 was removed in boost 1.65; see [here|http://www.boost.org/users/history/version_1_65_1.html] under the topic "Removed Libraries". Since the tr1 library is no longer present in boost 1.65, thrift fails and eventually ./configure fails.

Solution: I recommend that we uninstall boost 1.65 and install boost 1.60 (the last version compatible with thrift). I am not sure whether this is a problem with thrift (not yet compatible with boost 1.65) or with Travis CI (shipping two incompatible versions). Would love to hear the community's thoughts on it.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
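The proposed workaround could be sketched as a .travis.yml tweak along these lines (the versioned formula name boost@1.60 and the step placement are assumptions, not tested against the project's actual config):

```yaml
# Sketch only: pin boost to the last thrift-compatible version on the
# osx image; the boost@1.60 Homebrew formula name is assumed.
before_install:
  - brew uninstall --ignore-dependencies boost
  - brew install boost@1.60
  - brew link boost@1.60 --force
```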
[jira] [Created] (HAWQ-1562) Incorrect path to default log directory in documentation
Shubham Sharma created HAWQ-1562:
------------------------------------

             Summary: Incorrect path to default log directory in documentation
                 Key: HAWQ-1562
                 URL: https://issues.apache.org/jira/browse/HAWQ-1562
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Documentation
            Reporter: Shubham Sharma
            Assignee: David Yozie

In the current documentation, six files point to the wrong location of the default log directory. The default log directory of the management utilities is ~/hawqAdminLogs, but the documentation specifies ~/hawq/Adminlogs/. The list can be seen [here|https://github.com/apache/incubator-hawq-docs/search?utf8=%E2%9C%93=Adminlogs=].

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HAWQ-1559) Travis CI failing for hawq after travis ci default image upgraded xcode to 8.3
Shubham Sharma created HAWQ-1559:
------------------------------------

             Summary: Travis CI failing for hawq after travis ci default image upgraded xcode to 8.3
                 Key: HAWQ-1559
                 URL: https://issues.apache.org/jira/browse/HAWQ-1559
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Build
            Reporter: Shubham Sharma
            Assignee: Radar Lei

It looks like our Travis build is broken. I first noticed this for my own fork's build and saw the same behavior in the apache github repo as well. It is failing with the error below

{code}
configure: error: Please install apr from http://apr.apache.org/ and add dir of 'apr-1-config' to env variable '/Users/travis/.rvm/gems/ruby-2.4.2/bin:/Users/travis/.rvm/gems/ruby-2.4.2@global/bin:/Users/travis/.rvm/rubies/ruby-2.4.2/bin:/Users/travis/.rvm/bin:/Users/travis/bin:/Users/travis/.local/bin:/Users/travis/.nvm/versions/node/v6.11.4/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin'.
The command "./configure" failed and exited with 1 during .
Your build has been stopped.
/Users/travis/.travis/job_stages: line 166: shell_session_update: command not found
{code}

Looking into it, the builds started failing on November 28th. This is around the same time Travis CI upgraded their default xcode version to 8.3. Here is the notification. I have identified a potential fix and tested it on my fork; the build completes successfully. Currently we don't install apr using brew install, which is one of the pre-requisites mentioned in the hawq incubator wiki. The fix is to "brew install apr" and then force-link it onto the path using "brew link apr --force". This resolves the problem. But I have a couple of additional questions -
1. How did apr get installed before? Was it installed with some other package? Asking this as a few packages have been removed from the default image in xcode 8.3.
2. Though the build for branches is failing continuously, why is the build status for master still green?
Anyhow, since apr is a dependency of our project, my proposal is to add a brew install step to travis.yml to avoid failures from such upgrades in the future. Let me know your thoughts; I have a PR ready.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
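The fix described above could look roughly like this in .travis.yml (the step placement and list structure are assumptions; only the two brew commands come from the text above):

```yaml
# Sketch: install apr explicitly so ./configure can find apr-1-config,
# and force-link it onto the PATH.
install:
  - brew install apr
  - brew link apr --force
```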
[jira] [Created] (HAWQ-1553) User who doesn't have home directory can not run hawq extract command
Shubham Sharma created HAWQ-1553:
------------------------------------

             Summary: User who doesn't have home directory can not run hawq extract command
                 Key: HAWQ-1553
                 URL: https://issues.apache.org/jira/browse/HAWQ-1553
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Command Line Tools
            Reporter: Shubham Sharma
            Assignee: Radar Lei

HAWQ extract stores information in hawqextract_MMDD.log under the directory ~/hawqAdminLogs, and a user who doesn't have a home directory encounters a failure when running hawq extract. We can add a -l option in order to set the target log directory for hawq extract.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
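A minimal sketch of how a -l option could route logs away from the home directory (the helper name and signature below are invented for illustration; only the -l option and the ~/hawqAdminLogs default come from the text above):

```python
import os

def resolve_log_dir(logdir_opt=None):
    """Pick the hawq extract log directory: the -l value if the user
    supplied one, otherwise the default ~/hawqAdminLogs."""
    if logdir_opt:
        return logdir_opt
    return os.path.join(os.path.expanduser("~"), "hawqAdminLogs")
```

A user without a home directory would then run, e.g., hawq extract with -l pointing at a writable location such as /tmp.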
[jira] [Commented] (HAWQ-1548) Ambiguous message while logging hawq utilization
[ https://issues.apache.org/jira/browse/HAWQ-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253870#comment-16253870 ] Shubham Sharma commented on HAWQ-1548:
--------------------------------------

[~yjin] - thank you for the explanation. I will do some more research to better understand it.

> Ambiguous message while logging hawq utilization
> ------------------------------------------------
>
>                 Key: HAWQ-1548
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1548
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: libyarn
>            Reporter: Shubham Sharma
>            Assignee: Lin Wen
>
> While YARN mode is enabled, resource broker logs two things -
> - YARN cluster total resource
> - HAWQ's total resource per node.
> Following messages are logged
> {code}
> 2017-11-11 23:21:40.944904 UTC,,,p549330,th9000778560,con4,,seg-1,"LOG","0","Resource manager YARN resource broker counted YARN cluster having total resource (1376256 MB, 168.00 CORE).",,,0,,"resourcebroker_LIBYARN.c",776,
> 2017-11-11 23:21:40.944921 UTC,,,p549330,th9000778560,con4,,seg-1,"LOG","0","Resource manager YARN resource broker counted HAWQ cluster now having (98304 MB, 12.00 CORE) in a YARN cluster of total resource (1376256 MB, 168.00 CORE).",,,0,,"resourcebroker_LIBYARN.c",785,
> {code}
> The second message shown above is ambiguous. After reading it, it looks like the complete Hawq cluster as a whole has only 98304 MB and 12 cores. However, according to the configuration it should be 98304 MB and 12 cores per segment server.
> {code}
> Resource manager YARN resource broker counted HAWQ cluster now having (98304 MB, 12.00 CORE) in a YARN cluster of total resource (1376256 MB, 168.00 CORE).
> {code}
> Either the wrong variables are printed or we can correct the message to represent that the resources logged are per node. As this can confuse the user into thinking that hawq cluster does not have enough resources.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HAWQ-1549) Re-syncing standby fails even when stop mode is fast
Shubham Sharma created HAWQ-1549:
------------------------------------

             Summary: Re-syncing standby fails even when stop mode is fast
                 Key: HAWQ-1549
                 URL: https://issues.apache.org/jira/browse/HAWQ-1549
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Command Line Tools, Standby master
            Reporter: Shubham Sharma
            Assignee: Radar Lei

I recently observed a behaviour while re-syncing the standby from the hawq command line. Here are the reproduction steps -
1 - Open a client connection to hawq using psql
2 - From a different terminal run the command - hawq init standby -n -v -M fast
3 - Standby resync fails with the error
{code}
20171113:03:49:21:158354 hawq_stop:hdp3:gpadmin-[WARNING]:-There are other connections to this instance, shutdown mode smart aborted
20171113:03:49:21:158354 hawq_stop:hdp3:gpadmin-[WARNING]:-Either remove connections, or use 'hawq stop master -M fast' or 'hawq stop master -M immediate'
20171113:03:49:21:158354 hawq_stop:hdp3:gpadmin-[WARNING]:-See hawq stop --help for all options
20171113:03:49:21:158354 hawq_stop:hdp3:gpadmin-[ERROR]:-Active connections. Aborting shutdown...
20171113:03:49:21:158143 hawq_init:hdp3:gpadmin-[ERROR]:-Stop hawq cluster failed, exit
{code}
4 - When -M (stop mode) is passed, it should terminate existing client connections.

The source of this issue appears to be the _resync_standby method in tools/bin/hawq_ctl. When it is called, the command it builds does not include the stop_mode option passed in the arguments.
{code}
def _resync_standby(self):
    logger.info("Re-sync standby")
    cmd = "%s; hawq stop master -a;" % source_hawq_env
    check_return_code(local_ssh(cmd, logger), logger, "Stop hawq cluster failed, exit")
    ..
    ..
{code}
I can start on this and submit a PR when the changes are done.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
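A minimal sketch of the missing piece: threading the user's stop mode into the command string that _resync_standby builds (the helper name and the "smart" default below are assumptions for illustration; only the command shape comes from the snippet above):

```python
def build_stop_master_cmd(source_hawq_env, stop_mode="smart"):
    """Build the 'stop master' shell command, propagating -M <stop_mode>
    instead of always falling back to the default smart shutdown."""
    return "%s; hawq stop master -a -M %s;" % (source_hawq_env, stop_mode)
```

With stop_mode="fast", the generated command would terminate existing client connections instead of aborting the shutdown as in the log above.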
[jira] [Created] (HAWQ-1548) Ambiguous message while logging hawq utilization
Shubham Sharma created HAWQ-1548:
------------------------------------

             Summary: Ambiguous message while logging hawq utilization
                 Key: HAWQ-1548
                 URL: https://issues.apache.org/jira/browse/HAWQ-1548
             Project: Apache HAWQ
          Issue Type: Improvement
          Components: libyarn
            Reporter: Shubham Sharma
            Assignee: Lin Wen

When YARN mode is enabled, the resource broker logs two things -
- YARN cluster total resource
- HAWQ's total resource per node.

The following messages are logged

{code}
2017-11-11 23:21:40.944904 UTC,,,p549330,th9000778560,con4,,seg-1,"LOG","0","Resource manager YARN resource broker counted YARN cluster having total resource (1376256 MB, 168.00 CORE).",,,0,,"resourcebroker_LIBYARN.c",776,
2017-11-11 23:21:40.944921 UTC,,,p549330,th9000778560,con4,,seg-1,"LOG","0","Resource manager YARN resource broker counted HAWQ cluster now having (98304 MB, 12.00 CORE) in a YARN cluster of total resource (1376256 MB, 168.00 CORE).",,,0,,"resourcebroker_LIBYARN.c",785,
{code}

The second message shown above is ambiguous. Reading it, it sounds as if the complete HAWQ cluster as a whole has only 98304 MB and 12 cores. However, according to the configuration, it should be 98304 MB and 12 cores per segment server.

{code}
Resource manager YARN resource broker counted HAWQ cluster now having (98304 MB, 12.00 CORE) in a YARN cluster of total resource (1376256 MB, 168.00 CORE).
{code}

Either the wrong variables are printed, or we can correct the message to indicate that the resources logged are per node, as this can confuse the user into thinking that the hawq cluster does not have enough resources.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
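The per-node reading is consistent with the YARN totals in the log messages above; a quick sanity check (the 14-node count is inferred from the figures, not stated in the log):

```python
# Figures taken from the log messages above.
yarn_total_mb, yarn_total_cores = 1376256, 168.0
hawq_per_node_mb, hawq_per_node_cores = 98304, 12.0

# Both ratios agree, suggesting "(98304 MB, 12.00 CORE)" is per segment
# server across an (inferred) 14-node cluster, not a cluster-wide total.
nodes_by_memory = yarn_total_mb / hawq_per_node_mb
nodes_by_cores = yarn_total_cores / hawq_per_node_cores
print(nodes_by_memory, nodes_by_cores)  # 14.0 14.0
```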
[jira] [Created] (HAWQ-1527) Implement partition pushdown for integral data types when accessing hive through pxf
Shubham Sharma created HAWQ-1527:
------------------------------------

             Summary: Implement partition pushdown for integral data types when accessing hive through pxf
                 Key: HAWQ-1527
                 URL: https://issues.apache.org/jira/browse/HAWQ-1527
             Project: Apache HAWQ
          Issue Type: Improvement
          Components: PXF
            Reporter: Shubham Sharma
            Assignee: Ed Espino

Hive, when accessed through hcatalog, currently supports partition filtering on columns whose datatype is either string or one of the integral data types (TINYINT, SMALLINT, INT, BIGINT). However, pxf currently ignores non-string partition columns ([code reference|https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java#L427-#L431]). Hive supports only two operators when filtering partitions on integral columns: "=" and "!=". See this [code reference|https://github.com/apache/hive/blob/branch-1.2/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java#L433-#L457] for hive's ExpressionTree. Hive introduced a parameter, hive.metastore.integral.jdo.pushdown, which must be set to true in hive-site.xml to enable filtering on integral datatypes. Logging this JIRA to implement this feature and enable partition filtering using integral data types from pxf.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
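For reference, the metastore setting mentioned above would be enabled in hive-site.xml with a property block along these lines (only the property name and value come from the text above; the surrounding layout is the standard Hadoop configuration format):

```xml
<!-- Allow the metastore to push down "=" / "!=" filters on integral
     partition columns (TINYINT, SMALLINT, INT, BIGINT). -->
<property>
  <name>hive.metastore.integral.jdo.pushdown</name>
  <value>true</value>
</property>
```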
[jira] [Commented] (HAWQ-421) hawq script expects GPHOME to be used for hawq_home
[ https://issues.apache.org/jira/browse/HAWQ-421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152084#comment-16152084 ] Shubham Sharma commented on HAWQ-421:
-------------------------------------

[~lei_chang] I want to take this up; I initiated a discussion on the dev list today.

> hawq script expects GPHOME to be used for hawq_home
> ---------------------------------------------------
>
>                 Key: HAWQ-421
>                 URL: https://issues.apache.org/jira/browse/HAWQ-421
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Command Line Tools
>    Affects Versions: 2.0.0.0-incubating
>            Reporter: Konstantin Boudnik
>            Assignee: Lei Chang
>             Fix For: backlog
>
> There are two bits in the code that look out of place:
> - {{greenplum_path.sh}}
> - {{GPHOME}} env.var
> The former sets all sorts of environment variables, including the latter, which is used in the {{bin/hawq}} script. This seems to be out of place with the project name and is confusing. Let's fix it.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HAWQ-1524) Travis CI build failure after upgrading protobuf to 3.4
Shubham Sharma created HAWQ-1524:
------------------------------------

             Summary: Travis CI build failure after upgrading protobuf to 3.4
                 Key: HAWQ-1524
                 URL: https://issues.apache.org/jira/browse/HAWQ-1524
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Build
            Reporter: Shubham Sharma
            Assignee: Radar Lei

After upgrading the protobuf version to 3.4, the CI pipeline fails with the errors below. From the error message, it looks like a problem with namespace resolution when declaring stringstream and ostringstream.

{code}
Error message -
/Users/travis/build/apache/incubator-hawq/depends/libyarn/src/libyarnclient/LibYarnClient.cpp:248:9: error: unknown type name 'stringstream'; did you mean 'std::stringstream'?
    stringstream ss;
    ^~~~
    std::stringstream
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/iosfwd:153:38: note: 'std::stringstream' declared here
typedef basic_stringstream stringstream;
/Users/travis/build/apache/incubator-hawq/depends/libyarn/src/libyarnclient/LibYarnClient.cpp:299:13: error: unknown type name 'ostringstream'; did you mean 'std::ostringstream'?
    ostringstream key;
    ^
    std::ostringstream
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/iosfwd:152:38: note: 'std::ostringstream' declared here
typedef basic_ostringstream ostringstream;
{code}

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
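The compiler's fix-it hints above point at unqualified stream names; a minimal sketch of the kind of change needed (the function and its contents are invented for illustration, not taken from LibYarnClient.cpp):

```cpp
#include <sstream>
#include <string>

// Qualify the stream types with std:: (or add an explicit
// using-declaration) so the names resolve regardless of which
// headers happen to inject them into the global namespace.
std::string buildKey(int jobId) {
    std::ostringstream key;  // was: ostringstream key;
    key << "job-" << jobId;
    return key.str();
}
```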
[jira] [Commented] (HAWQ-1198) Fragmenter should return only relevant fragments for partitioned tables when X-GP-FILTER passed
[ https://issues.apache.org/jira/browse/HAWQ-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105573#comment-16105573 ] Shubham Sharma commented on HAWQ-1198: -- [~odiachenko] [~shivram] [~vVineet] - For your reference. > Fragmenter should return only relevant fragments for partitioned tables when > X-GP-FILTER passed > --- > > Key: HAWQ-1198 > URL: https://issues.apache.org/jira/browse/HAWQ-1198 > Project: Apache HAWQ > Issue Type: Improvement > Components: PXF >Reporter: Oleksandr Diachenko >Assignee: Oleksandr Diachenko > > Currently, PXF Fragmenter api returns all fragments even if X-GP-FILTER is > provided. In a case of a partitioned table it's possible to evaluate > X-GP-FILTER and exclude irrelevant partitions if possible. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HAWQ-1198) Fragmenter should return only relevant fragments for partitioned tables when X-GP-FILTER passed
[ https://issues.apache.org/jira/browse/HAWQ-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103334#comment-16103334 ] Shubham Sharma commented on HAWQ-1198:
--------------------------------------

In HiveDataFragmenter, while building a BasicFilter, all operators apart from "=" are ignored ([code reference|https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-hive/src/main/java/org/apache/hawq/pxf/plugins/hive/HiveDataFragmenter.java#L405-#L411]). This causes hive to fetch irrelevant fragments for every query that does not have an "equal to" operator. [Pull request 1272|https://github.com/apache/incubator-hawq/pull/1272] addresses this issue. After the changes I tested a few queries; filtering out the irrelevant fragments gives a 10X performance increase when querying hive through pxf.

Test queries
{code}
between_operator_test.sql
set client_min_messages to debug;
\timing
select * from hawq_hive where datelocal between '2016-05-27' and '2016-05-29';

in_operator_test.sql
set client_min_messages to debug;
\timing
select * from hawq_hive a where a.datelocal in ('2016-05-29','2016-05-28','2016-05-27');

logical_operator_test.sql
set client_min_messages to debug;
\timing
select * from hawq_hive where datelocal <= '2016-05-29' and datelocal >= '2016-05-27';

union_operator_test.sql
set client_min_messages to debug;
\timing
select count(*) from (select * from hawq_hive a where a.datelocal = '2016-05-29' union all select * from hawq_hive b where b.datelocal='2016-05-28' union all select * from hawq_hive b where b.datelocal='2016-05-27' ) fs;
{code}

Before code changes
{code}
psql -f between_operator_test.sql &> before/between_operator_test.out
psql -f union_operator_test.sql &> before/union_operator_test.out
psql -f logical_operator_test.sql &> before/logical_operator_test.out
psql -f in_operator_test.sql &> before/in_operator_test.out
[gpadmin@localhost hive_fragmenter]$ grep -i "Fragment list" before/*
Number of fragments filtered: 136, should be three
before/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) before/union_operator_test.out:psql:union_operator_test.sql:3: DEBUG2: Fragment list: (1 elements, pxf_isilon = false) [gpadmin@localhost hive_fragmenter]$ grep Time before/* before/between_operator_test.out:Time: 4485.182 ms before/in_operator_test.out:Time: 2285.578 ms before/logical_operator_test.out:Time: 2508.315 ms before/union_operator_test.out:Time: 609.298 ms {code} After code changes {code} [gpadmin@localhost hive_fragmenter]$ psql -f between_operator_test.sql &> after/between_operator_test.out [gpadmin@localhost hive_fragmenter]$ psql -f union_operator_test.sql &> after/union_operator_test.out [gpadmin@localhost hive_fragmenter]$ psql -f 
logical_operator_test.sql &> after/logical_operator_test.out [gpadmin@localhost hive_fragmenter]$ psql -f in_operator_test.sql &> after/in_operator_test.out Number of fragments filtered reduced to three except for IN operator for which filtering is not implemented yet. [gpadmin@localhost hive_fragmenter]$ grep -i "Fragment list" after/* after/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2: Fragment list: (3 elements, pxf_isilon = false) after/between_operator_test.out:psql:between_operator_test.sql:3: DEBUG2: Fragment list: (3 elements, pxf_isilon = false) after/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) after/in_operator_test.out:psql:in_operator_test.sql:3: DEBUG2: Fragment list: (136 elements, pxf_isilon = false) after/logical_operator_test.out:psql:logical_operator_test.sql:3: DEBUG2: Fragment list: (3 elements,
[jira] [Commented] (HAWQ-1504) Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/
[ https://issues.apache.org/jira/browse/HAWQ-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091098#comment-16091098 ] Shubham Sharma commented on HAWQ-1504: -- Submitted [PR 1267 |https://github.com/apache/incubator-hawq/pull/1267] > Namenode hangs during restart of docker environment configured using > incubator-hawq/contrib/hawq-docker/ > > > Key: HAWQ-1504 > URL: https://issues.apache.org/jira/browse/HAWQ-1504 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Reporter: Shubham Sharma >Assignee: Radar Lei >Priority: Minor > > After setting up an environment using instructions provided under > incubator-hawq/contrib/hawq-docker/, while trying to restart docker > containers namenode hangs and tries a namenode -format during every start. > Steps to reproduce this issue - > - Navigate to incubator-hawq/contrib/hawq-docker > - make stop > - make start > - docker exec -it centos7-namenode bash > - ps -ef | grep java > You can see namenode -format running. > {code} > [gpadmin@centos7-namenode data]$ ps -ef | grep java > hdfs1110 1 00:56 ?00:00:06 > /etc/alternatives/java_sdk/bin/java -Dproc_namenode -Xmx1000m > -Dhdfs.namenode=centos7-namenode -Dhadoop.log.dir=/var/log/hadoop/hdfs > -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop > -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console > -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native > -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true > -Dhadoop.security.logger=INFO,NullAppender > org.apache.hadoop.hdfs.server.namenode.NameNode -format > {code} > Since namenode -format runs in interactive mode and at this stage it is > waiting for a (Yes/No) response, the namenode will remain stuck forever. This > makes hdfs unavailable. 
> Root cause of the problem - > In the dockerfiles present under > incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-test and > incubator-hawq/contrib/hawq-docker/centos7-docker/hawq-test, the docker > directive ENTRYPOINT executes entrypoin.sh during startup. > The entrypoint.sh in turn executes start-hdfs.sh. start-dfs.sh checks for the > following - > {code} > if [ ! -d /tmp/hdfs/name/current ]; then > su -l hdfs -c "hdfs namenode -format" > fi > {code} > My assumption is it looks for fsimage and edit logs. If they are not present > the script assumes that this a first time initialization and namenode format > should be done. However, path /tmp/hdfs/name/current does not exist on > namenode. > From namenode logs it is clear that fsimage and edit logs are written under > /tmp/hadoop-hdfs/dfs/name/current. > {code} > 2017-07-18 00:55:20,892 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > No edit log streams selected. > 2017-07-18 00:55:20,893 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Planning to load image: > FSImageFile(file=/tmp/hadoop-hdfs/dfs/name/current/fsimage_000, > cpktTxId=000) > 2017-07-18 00:55:20,995 INFO > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 1 INodes. > 2017-07-18 00:55:21,064 INFO > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Loaded FSImage > in 0 seconds. > 2017-07-18 00:55:21,065 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Loaded image for txid 0 from > /tmp/hadoop-hdfs/dfs/name/current/fsimage_000 > 2017-07-18 00:55:21,084 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? 
> false (staleImage=false, haEnabled=false, isRollingUpgrade=false) > 2017-07-18 00:55:21,084 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 1 > {code} > Thus wrong path in > incubator-hawq/contrib/hawq-docker/centos*-docker/hawq-test/start-hdfs.sh > causes namenode to hang during each restart of the containers making hdfs > unavailable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HAWQ-1504) Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/
[ https://issues.apache.org/jira/browse/HAWQ-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Sharma updated HAWQ-1504: - Priority: Minor (was: Major) > Namenode hangs during restart of docker environment configured using > incubator-hawq/contrib/hawq-docker/ > > > Key: HAWQ-1504 > URL: https://issues.apache.org/jira/browse/HAWQ-1504 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Reporter: Shubham Sharma >Assignee: Radar Lei >Priority: Minor > > After setting up an environment using instructions provided under > incubator-hawq/contrib/hawq-docker/, while trying to restart docker > containers namenode hangs and tries a namenode -format during every start. > Steps to reproduce this issue - > - Navigate to incubator-hawq/contrib/hawq-docker > - make stop > - make start > - docker exec -it centos7-namenode bash > - ps -ef | grep java > You can see namenode -format running. > {code} > [gpadmin@centos7-namenode data]$ ps -ef | grep java > hdfs1110 1 00:56 ?00:00:06 > /etc/alternatives/java_sdk/bin/java -Dproc_namenode -Xmx1000m > -Dhdfs.namenode=centos7-namenode -Dhadoop.log.dir=/var/log/hadoop/hdfs > -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop > -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console > -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native > -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true > -Dhadoop.security.logger=INFO,NullAppender > org.apache.hadoop.hdfs.server.namenode.NameNode -format > {code} > Since namenode -format runs in interactive mode and at this stage it is > waiting for a (Yes/No) response, the namenode will remain stuck forever. This > makes hdfs unavailable. 
> Root cause of the problem - > In the dockerfiles present under > incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-test and > incubator-hawq/contrib/hawq-docker/centos7-docker/hawq-test, the docker > directive ENTRYPOINT executes entrypoin.sh during startup. > The entrypoint.sh in turn executes start-hdfs.sh. start-dfs.sh checks for the > following - > {code} > if [ ! -d /tmp/hdfs/name/current ]; then > su -l hdfs -c "hdfs namenode -format" > fi > {code} > My assumption is it looks for fsimage and edit logs. If they are not present > the script assumes that this a first time initialization and namenode format > should be done. However, path /tmp/hdfs/name/current does not exist on > namenode. > From namenode logs it is clear that fsimage and edit logs are written under > /tmp/hadoop-hdfs/dfs/name/current. > {code} > 2017-07-18 00:55:20,892 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > No edit log streams selected. > 2017-07-18 00:55:20,893 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Planning to load image: > FSImageFile(file=/tmp/hadoop-hdfs/dfs/name/current/fsimage_000, > cpktTxId=000) > 2017-07-18 00:55:20,995 INFO > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 1 INodes. > 2017-07-18 00:55:21,064 INFO > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Loaded FSImage > in 0 seconds. > 2017-07-18 00:55:21,065 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Loaded image for txid 0 from > /tmp/hadoop-hdfs/dfs/name/current/fsimage_000 > 2017-07-18 00:55:21,084 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? 
> false (staleImage=false, haEnabled=false, isRollingUpgrade=false) > 2017-07-18 00:55:21,084 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 1 > {code} > Thus wrong path in > incubator-hawq/contrib/hawq-docker/centos*-docker/hawq-test/start-hdfs.sh > causes namenode to hang during each restart of the containers making hdfs > unavailable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HAWQ-1504) Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/
Shubham Sharma created HAWQ-1504: Summary: Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/ Key: HAWQ-1504 URL: https://issues.apache.org/jira/browse/HAWQ-1504 Project: Apache HAWQ Issue Type: Bug Components: Command Line Tools Reporter: Shubham Sharma Assignee: Radar Lei

After setting up an environment using the instructions provided under incubator-hawq/contrib/hawq-docker/, the namenode hangs while restarting the docker containers and attempts a namenode -format during every start. Steps to reproduce this issue -
- Navigate to incubator-hawq/contrib/hawq-docker
- make stop
- make start
- docker exec -it centos7-namenode bash
- ps -ef | grep java

You can see namenode -format running.
{code}
[gpadmin@centos7-namenode data]$ ps -ef | grep java
hdfs1110 1 00:56 ?00:00:06 /etc/alternatives/java_sdk/bin/java -Dproc_namenode -Xmx1000m -Dhdfs.namenode=centos7-namenode -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.hdfs.server.namenode.NameNode -format
{code}
Since namenode -format runs in interactive mode and at this stage is waiting for a (Yes/No) response, the namenode will remain stuck forever. This makes hdfs unavailable.

Root cause of the problem -
In the dockerfiles present under incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-test and incubator-hawq/contrib/hawq-docker/centos7-docker/hawq-test, the docker directive ENTRYPOINT executes entrypoint.sh during startup. The entrypoint.sh in turn executes start-hdfs.sh, which checks for the following -
{code}
if [ ! -d /tmp/hdfs/name/current ]; then
  su -l hdfs -c "hdfs namenode -format"
fi
{code}
My assumption is it looks for fsimage and edit logs. If they are not present, the script assumes that this is a first-time initialization and a namenode format should be done. However, the path /tmp/hdfs/name/current does not exist on the namenode. From the namenode logs it is clear that fsimage and edit logs are written under /tmp/hadoop-hdfs/dfs/name/current.
{code}
2017-07-18 00:55:20,892 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: No edit log streams selected.
2017-07-18 00:55:20,893 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Planning to load image: FSImageFile(file=/tmp/hadoop-hdfs/dfs/name/current/fsimage_000, cpktTxId=000)
2017-07-18 00:55:20,995 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 1 INodes.
2017-07-18 00:55:21,064 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Loaded FSImage in 0 seconds.
2017-07-18 00:55:21,065 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 0 from /tmp/hadoop-hdfs/dfs/name/current/fsimage_000
2017-07-18 00:55:21,084 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
2017-07-18 00:55:21,084 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 1
{code}
Thus the wrong path in incubator-hawq/contrib/hawq-docker/centos*-docker/hawq-test/start-hdfs.sh causes the namenode to hang during each restart of the containers, making hdfs unavailable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
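Based on the log evidence above, the guard in start-hdfs.sh can be pointed at the directory HDFS actually writes to. The following is only an illustrative sketch, assuming the default dfs.namenode.name.dir seen in the logs (/tmp/hadoop-hdfs/dfs/name); the actual fix in the repository may differ:

```shell
#!/bin/sh
# Sketch of a corrected first-start check for start-hdfs.sh.
# needs_format succeeds only when the name directory has no current/
# subdirectory, i.e. no fsimage or edit logs exist yet.
needs_format() {
  [ ! -d "$1/current" ]
}

# Path where the namenode logs show fsimage/edit logs being written
# (assumed default dfs.namenode.name.dir).
NAME_DIR=/tmp/hadoop-hdfs/dfs/name

if needs_format "$NAME_DIR"; then
  # First start: a format really is required. In the container this
  # would be: su -l hdfs -c "hdfs namenode -format"
  echo "formatting namenode"
else
  echo "name directory already initialized, skipping format"
fi
```

With the check aimed at the real name directory, container restarts no longer re-run the interactive format prompt that left the namenode hanging.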
[jira] [Commented] (HAWQ-1504) Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/
[ https://issues.apache.org/jira/browse/HAWQ-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090935#comment-16090935 ] Shubham Sharma commented on HAWQ-1504: -- Submitting a PR shortly > Namenode hangs during restart of docker environment configured using > incubator-hawq/contrib/hawq-docker/ > > > Key: HAWQ-1504 > URL: https://issues.apache.org/jira/browse/HAWQ-1504 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools >Reporter: Shubham Sharma >Assignee: Radar Lei > > After setting up an environment using instructions provided under > incubator-hawq/contrib/hawq-docker/, while trying to restart docker > containers namenode hangs and tries a namenode -format during every start. > Steps to reproduce this issue - > - Navigate to incubator-hawq/contrib/hawq-docker > - make stop > - make start > - docker exec -it centos7-namenode bash > - ps -ef | grep java > You can see namenode -format running. > {code} > [gpadmin@centos7-namenode data]$ ps -ef | grep java > hdfs1110 1 00:56 ?00:00:06 > /etc/alternatives/java_sdk/bin/java -Dproc_namenode -Xmx1000m > -Dhdfs.namenode=centos7-namenode -Dhadoop.log.dir=/var/log/hadoop/hdfs > -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop > -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console > -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native > -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true > -Dhadoop.security.logger=INFO,NullAppender > org.apache.hadoop.hdfs.server.namenode.NameNode -format > {code} > Since namenode -format runs in interactive mode and at this stage it is > waiting for a (Yes/No) response, the namenode will remain stuck forever. This > makes hdfs unavailable. 
> Root cause of the problem - > In the dockerfiles present under > incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-test and > incubator-hawq/contrib/hawq-docker/centos7-docker/hawq-test, the docker > directive ENTRYPOINT executes entrypoin.sh during startup. > The entrypoint.sh in turn executes start-hdfs.sh. start-dfs.sh checks for the > following - > {code} > if [ ! -d /tmp/hdfs/name/current ]; then > su -l hdfs -c "hdfs namenode -format" > fi > {code} > My assumption is it looks for fsimage and edit logs. If they are not present > the script assumes that this a first time initialization and namenode format > should be done. However, path /tmp/hdfs/name/current does not exist on > namenode. > From namenode logs it is clear that fsimage and edit logs are written under > /tmp/hadoop-hdfs/dfs/name/current. > {code} > 2017-07-18 00:55:20,892 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > No edit log streams selected. > 2017-07-18 00:55:20,893 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Planning to load image: > FSImageFile(file=/tmp/hadoop-hdfs/dfs/name/current/fsimage_000, > cpktTxId=000) > 2017-07-18 00:55:20,995 INFO > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 1 INodes. > 2017-07-18 00:55:21,064 INFO > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Loaded FSImage > in 0 seconds. > 2017-07-18 00:55:21,065 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Loaded image for txid 0 from > /tmp/hadoop-hdfs/dfs/name/current/fsimage_000 > 2017-07-18 00:55:21,084 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? 
> false (staleImage=false, haEnabled=false, isRollingUpgrade=false) > 2017-07-18 00:55:21,084 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 1 > {code} > Thus wrong path in > incubator-hawq/contrib/hawq-docker/centos*-docker/hawq-test/start-hdfs.sh > causes namenode to hang during each restart of the containers making hdfs > unavailable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HAWQ-1503) Failure building on centos-6 using dockerfile present under incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-dev
[ https://issues.apache.org/jira/browse/HAWQ-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087470#comment-16087470 ] Shubham Sharma edited comment on HAWQ-1503 at 7/14/17 4:21 PM: --- Submitted PR - [#1266|https://github.com/apache/incubator-hawq/pull/1266] was (Author: outofmemory): Submitting PR shortly. > Failure building on centos-6 using dockerfile present under > incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-dev > -- > > Key: HAWQ-1503 > URL: https://issues.apache.org/jira/browse/HAWQ-1503 > Project: Apache HAWQ > Issue Type: Bug > Components: Build >Reporter: Shubham Sharma >Assignee: Radar Lei >Priority: Minor > > Using build instructions from > [repo|https://github.com/apache/incubator-hawq/tree/master/contrib/hawq-docker] > make build fails while building images for centos6. > From the error it looks like the ftp link, which dockerfile uses to upgrade > gcc etc. no longer exists. > {code} > curl: (22) The requested URL returned error: 404 Not Found > error: > http://ftp.scientificlinux.org/linux/scientific/5x/x86_64/RPM-GPG-KEYs/RPM-GPG-KEY-cern: > import read failed(2). > The command '/bin/sh -c wget -O /etc/yum.repos.d/slc6-devtoolset.repo > http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo && rpm > --import > http://ftp.scientificlinux.org/linux/scientific/5x/x86_64/RPM-GPG-KEYs/RPM-GPG-KEY-cern > && yum install -y devtoolset-2-gcc devtoolset-2-binutils > devtoolset-2-gcc-c++ && echo "source /opt/rh/devtoolset-2/enable" >> > ~/.bashrc && source ~/.bashrc' returned a non-zero code: 1 > make[1]: *** [build-hawq-dev-centos6] Error 1 > make: *** [build] Error 2 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HAWQ-1503) Failure building on centos-6 using dockerfile present under incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-dev
[ https://issues.apache.org/jira/browse/HAWQ-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087470#comment-16087470 ] Shubham Sharma commented on HAWQ-1503: -- Submitting PR shortly. > Failure building on centos-6 using dockerfile present under > incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-dev > -- > > Key: HAWQ-1503 > URL: https://issues.apache.org/jira/browse/HAWQ-1503 > Project: Apache HAWQ > Issue Type: Bug > Components: Build >Reporter: Shubham Sharma >Assignee: Radar Lei >Priority: Minor > > Using build instructions from > [repo|https://github.com/apache/incubator-hawq/tree/master/contrib/hawq-docker] > make build fails while building images for centos6. > From the error it looks like the ftp link, which dockerfile uses to upgrade > gcc etc. no longer exists. > {code} > curl: (22) The requested URL returned error: 404 Not Found > error: > http://ftp.scientificlinux.org/linux/scientific/5x/x86_64/RPM-GPG-KEYs/RPM-GPG-KEY-cern: > import read failed(2). > The command '/bin/sh -c wget -O /etc/yum.repos.d/slc6-devtoolset.repo > http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo && rpm > --import > http://ftp.scientificlinux.org/linux/scientific/5x/x86_64/RPM-GPG-KEYs/RPM-GPG-KEY-cern > && yum install -y devtoolset-2-gcc devtoolset-2-binutils > devtoolset-2-gcc-c++ && echo "source /opt/rh/devtoolset-2/enable" >> > ~/.bashrc && source ~/.bashrc' returned a non-zero code: 1 > make[1]: *** [build-hawq-dev-centos6] Error 1 > make: *** [build] Error 2 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HAWQ-1503) Failure building on centos-6 using dockerfile present under incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-dev
[ https://issues.apache.org/jira/browse/HAWQ-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Sharma updated HAWQ-1503: - Priority: Minor (was: Major) > Failure building on centos-6 using dockerfile present under > incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-dev > -- > > Key: HAWQ-1503 > URL: https://issues.apache.org/jira/browse/HAWQ-1503 > Project: Apache HAWQ > Issue Type: Bug > Components: Build >Reporter: Shubham Sharma >Assignee: Radar Lei >Priority: Minor > > Using build instructions from > [repo|https://github.com/apache/incubator-hawq/tree/master/contrib/hawq-docker] > make build fails while building images for centos6. > From the error it looks like the ftp link, which dockerfile uses to upgrade > gcc etc. no longer exists. > {code} > curl: (22) The requested URL returned error: 404 Not Found > error: > http://ftp.scientificlinux.org/linux/scientific/5x/x86_64/RPM-GPG-KEYs/RPM-GPG-KEY-cern: > import read failed(2). > The command '/bin/sh -c wget -O /etc/yum.repos.d/slc6-devtoolset.repo > http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo && rpm > --import > http://ftp.scientificlinux.org/linux/scientific/5x/x86_64/RPM-GPG-KEYs/RPM-GPG-KEY-cern > && yum install -y devtoolset-2-gcc devtoolset-2-binutils > devtoolset-2-gcc-c++ && echo "source /opt/rh/devtoolset-2/enable" >> > ~/.bashrc && source ~/.bashrc' returned a non-zero code: 1 > make[1]: *** [build-hawq-dev-centos6] Error 1 > make: *** [build] Error 2 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HAWQ-1503) Failure building on centos-6 using dockerfile present under incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-dev
Shubham Sharma created HAWQ-1503: Summary: Failure building on centos-6 using dockerfile present under incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-dev Key: HAWQ-1503 URL: https://issues.apache.org/jira/browse/HAWQ-1503 Project: Apache HAWQ Issue Type: Bug Components: Build Reporter: Shubham Sharma Assignee: Radar Lei

Using the build instructions from [repo|https://github.com/apache/incubator-hawq/tree/master/contrib/hawq-docker], make build fails while building the images for centos6. From the error it looks like the FTP link that the dockerfile uses to upgrade gcc etc. no longer exists.
{code}
curl: (22) The requested URL returned error: 404 Not Found
error: http://ftp.scientificlinux.org/linux/scientific/5x/x86_64/RPM-GPG-KEYs/RPM-GPG-KEY-cern: import read failed(2).
The command '/bin/sh -c wget -O /etc/yum.repos.d/slc6-devtoolset.repo http://linuxsoft.cern.ch/cern/devtoolset/slc6-devtoolset.repo && rpm --import http://ftp.scientificlinux.org/linux/scientific/5x/x86_64/RPM-GPG-KEYs/RPM-GPG-KEY-cern && yum install -y devtoolset-2-gcc devtoolset-2-binutils devtoolset-2-gcc-c++ && echo "source /opt/rh/devtoolset-2/enable" >> ~/.bashrc && source ~/.bashrc' returned a non-zero code: 1
make[1]: *** [build-hawq-dev-centos6] Error 1
make: *** [build] Error 2
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HAWQ-1495) TestRowTypes.BasicTest fails due to wrong answer file
[ https://issues.apache.org/jira/browse/HAWQ-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072982#comment-16072982 ] Shubham Sharma commented on HAWQ-1495: -- In *src/test/feature/query/sql/rowtypes.sql* the value inserted into the table is `"insert into people values ('(Joe,Blow)', '1984-01-10');"`, where the date is in the format `1984-01-10`. But in *src/test/feature/query/ans/rowtypes.ans*, the expected answer for "select * from people" stores the date as `01-10-1984`. This causes the test to fail. Correcting the date format makes the test execute successfully. Submitted [pull request|https://github.com/apache/incubator-hawq/pull/1263] to address this issue. > TestRowTypes.BasicTest fails due to wrong answer file > - > > Key: HAWQ-1495 > URL: https://issues.apache.org/jira/browse/HAWQ-1495 > Project: Apache HAWQ > Issue Type: Bug > Components: Tests >Reporter: Shubham Sharma >Assignee: Jiali Yao > > While running feature test TestRowTypes, it fails with following error > {code} > [gpadmin@centos7-namenode feature]$ ./feature-test > --gtest_filter="TestRowTypes*" > Note: Google Test filter = TestRowTypes* > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from TestRowTypes > [ RUN ] TestRowTypes.BasicTest > COPY tenk1 FROM '/data/hawq/src/test/feature/query/data/tenk.data' > lib/sql_util.cpp:197: Failure > Value of: is_sql_ans_diff > Actual: true > Expected: false > lib/sql_util.cpp:203: Failure > Value of: true > Actual: true > Expected: false > [ FAILED ] TestRowTypes.BasicTest (9493 ms) > [--] 1 test from TestRowTypes (9493 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (9493 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] TestRowTypes.BasicTest > 1 FAILED TEST > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
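The failing diff above is purely a date-rendering mismatch. As a minimal illustration (plain shell with GNU date, no HAWQ involved), here are the two renderings of the same day:

```shell
#!/bin/sh
# The insert in rowtypes.sql writes the date in ISO format (YYYY-MM-DD),
# while the stale rowtypes.ans answer file recorded the same day as
# MM-DD-YYYY, so the literal diff of actual vs. expected output failed.
iso=$(date -d 1984-01-10 +%Y-%m-%d)   # format used in the .sql file
mdy=$(date -d 1984-01-10 +%m-%d-%Y)   # format found in the .ans file
echo "sql file:    $iso"
echo "answer file: $mdy"
```

Once both files agree on one rendering of the date, the answer-file comparison passes.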
[jira] [Created] (HAWQ-1495) TestRowTypes.BasicTest fails due to wrong answer file
Shubham Sharma created HAWQ-1495: Summary: TestRowTypes.BasicTest fails due to wrong answer file Key: HAWQ-1495 URL: https://issues.apache.org/jira/browse/HAWQ-1495 Project: Apache HAWQ Issue Type: Bug Components: Tests Reporter: Shubham Sharma Assignee: Jiali Yao

While running the feature test TestRowTypes, it fails with the following error
{code}
[gpadmin@centos7-namenode feature]$ ./feature-test --gtest_filter="TestRowTypes*"
Note: Google Test filter = TestRowTypes*
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from TestRowTypes
[ RUN ] TestRowTypes.BasicTest
COPY tenk1 FROM '/data/hawq/src/test/feature/query/data/tenk.data'
lib/sql_util.cpp:197: Failure
Value of: is_sql_ans_diff
Actual: true
Expected: false
lib/sql_util.cpp:203: Failure
Value of: true
Actual: true
Expected: false
[ FAILED ] TestRowTypes.BasicTest (9493 ms)
[--] 1 test from TestRowTypes (9493 ms total)
[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (9493 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] TestRowTypes.BasicTest
1 FAILED TEST
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HAWQ-1480) Packing a core file in hawq
[ https://issues.apache.org/jira/browse/HAWQ-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038129#comment-16038129 ] Shubham Sharma commented on HAWQ-1480: -- [~vVineet] PR #1251 submitted. > Packing a core file in hawq > --- > > Key: HAWQ-1480 > URL: https://issues.apache.org/jira/browse/HAWQ-1480 > Project: Apache HAWQ > Issue Type: Improvement > Components: Command Line Tools >Reporter: Shubham Sharma >Assignee: Radar Lei > > Currently there is no way to packing a core file with its context – > executable, application and system shared libraries in hawq. This information > can be later unpacked on another system and helps in debugging. It is a > useful feature to quickly gather all the data needed from a crash/core > generated on the system to analyze it later. > Another open source project, greenplum, uses a script > [https://github.com/greenplum-db/gpdb/blob/master/gpMgmt/sbin/packcore] to > collect this information. Tested this script against Hawq's installation and > it collects the required information needed for debug. > Can this be merged into Hawq, if yes, I can submit a pull request and test it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HAWQ-1480) Packing a core file in hawq
Shubham Sharma created HAWQ-1480: Summary: Packing a core file in hawq Key: HAWQ-1480 URL: https://issues.apache.org/jira/browse/HAWQ-1480 Project: Apache HAWQ Issue Type: Improvement Components: Command Line Tools Reporter: Shubham Sharma Assignee: Radar Lei

Currently there is no way to pack a core file together with its context (executable, application and system shared libraries) in hawq. This information can later be unpacked on another system and helps in debugging. It is a useful feature to quickly gather all the data needed from a crash/core generated on the system to analyze it later. Another open source project, greenplum, uses a script [https://github.com/greenplum-db/gpdb/blob/master/gpMgmt/sbin/packcore] to collect this information. Tested this script against Hawq's installation and it collects the required information needed for debug. Can this be merged into Hawq? If yes, I can submit a pull request and test it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
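The core of what such a packcore-style script does can be sketched in a few lines of shell. This is only an illustrative sketch, not the greenplum script itself; the executable path and core-file name in the comments are hypothetical, and the real script handles many more cases:

```shell
#!/bin/sh
# Sketch of a packcore-style collector: bundle a core file, the binary
# that produced it, and every shared library the core maps, so the
# archive can be unpacked and debugged on another machine.

# extract_lib_paths: keep only the filesystem path at the end of each
# line of gdb's "info sharedlibrary" output (header lines have none).
extract_lib_paths() {
  grep -o '/[^ ]*$'
}

# Illustrative usage against a real core (not executed here):
#   EXE=/usr/local/hawq/bin/postgres   # hypothetical binary path
#   CORE=core.12345                    # hypothetical core file
#   libs=$(gdb --batch -ex "info sharedlibrary" "$EXE" "$CORE" | extract_lib_paths)
#   tar czf packcore.tar.gz "$CORE" "$EXE" $libs
```

Archiving the exact libraries the core references is what makes the bundle debuggable on a different host with different system library versions.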
[jira] [Commented] (HAWQ-1433) ALTER RESOURCE QUEUE DDL does not check the format of attribute MEMORY_CLUSTER_LIMIT and CORE_CLUSTER_LIMIT
[ https://issues.apache.org/jira/browse/HAWQ-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975973#comment-15975973 ] Shubham Sharma commented on HAWQ-1433: -- That appears to be a better solution, thank you for the explanation [~xsheng] > ALTER RESOURCE QUEUE DDL does not check the format of attribute > MEMORY_CLUSTER_LIMIT and CORE_CLUSTER_LIMIT > --- > > Key: HAWQ-1433 > URL: https://issues.apache.org/jira/browse/HAWQ-1433 > Project: Apache HAWQ > Issue Type: Bug > Components: Resource Manager >Reporter: Yi Jin >Assignee: Xiang Sheng > Fix For: 2.3.0.0-incubating > > > Shubham Sharma> 2:11 PM (2 hours ago) > to user, sebastiao.gone. > Hello Sebastio, I think you have encountered the following issue - > 1 - Problem - alter resource queue pg_default with > (CORE_LIMIT_CLUSTER/MEMORY_LIMIT_CLUSTER=90); > gpadmin=# select * from pg_resqueue; > rsqname | parentoid | activestats | memorylimit | corelimit | > resovercommit | allocpolicy | vsegresourcequota | nvsegupperlimit | > nvseglowerlimit | nvseg > upperlimitperseg | nvseglowerlimitperseg | creationtime | updatetime > | status > +---+-+-+---+---+-+---+-+-+-- > -+---+--+---+ > pg_root| 0 | -1 | 100%| 100% | > 2 | even| | 0 | 0 | > >0 | 0 | | > | branch > pg_default | 9800 | 20 | 50% | 50% | > 2 | even| mem:256mb | 0 | 0 | > >0 | 0 | | 2017-04-12 > 22:45:55.056102+01 | > (2 rows) > gpadmin=# alter resource queue pg_default with (CORE_LIMIT_CLUSTER=90); > ALTER QUEUE > gpadmin=# select * from test; > a > --- > (0 rows) > gpadmin=# \q > 2 - restart hawq cluster > 3 - ERROR > [gpadmin@hdp3 ~]$ psql > psql (8.2.15) > Type "help" for help. > gpadmin=# select * from test; > WARNING: FD 31 having errors raised. errno 104 > ERROR: failed to register in resource manager, failed to receive content > (pquery.c:787) > 3 - alter resource queue pg_default with > (CORE_LIMIT_CLUSTER/MEMORY_LIMIT_CLUSTER=50%); --Let's switch back > ! Not allowed ! 
> alter resource queue pg_default with (CORE_LIMIT_CLUSTER=50%); > WARNING: FD 33 having errors raised. errno 104 > ERROR: failed to register in resource manager, failed to receive content > (resqueuecommand.c:364) > 4 - How to fix - Please be extra careful while using this. > gpadmin=# begin; > BEGIN > gpadmin=# set allow_system_table_mods='dml'; > SET > gpadmin=# select * from pg_resqueue where corelimit=90; > rsqname | parentoid | activestats | memorylimit | corelimit | > resovercommit | allocpolicy | vsegresourcequota | nvsegupperlimit | > nvseglowerlimit | nvseg > upperlimitperseg | nvseglowerlimitperseg | creationtime | updatetime > | status > +---+-+-+---+---+-+---+-+-+-- > -+---+--+---+ > pg_default | 9800 | 20 | 50% | 90| > 2 | even| mem:256mb | 0 | 0 | > >0 | 0 | | 2017-04-12 > 22:59:30.092823+01 | > (1 row) > gpadmin=# update pg_resqueue set corelimit='50%' where corelimit=90; > UPDATE 1 > gpadmin=# commit; > COMMIT > 5 - System should be back to normal > gpadmin=# select * from test; > a > --- > (0 rows) > Regards, > Shubh -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HAWQ-1433) ALTER RESOURCE QUEUE DDL does not check the format of attribute MEMORY_CLUSTER_LIMIT
[ https://issues.apache.org/jira/browse/HAWQ-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969208#comment-15969208 ] Shubham Sharma edited comment on HAWQ-1433 at 4/14/17 5:56 PM: --- I did a bit of research on this and I think that the problem exists in resqueuemanager.c: updateResourceQueueAttributesInShadow(). In this function, in the case RSQ_TBL_ATTR_CORE_LIMIT_CLUSTER, after checking whether the string is a percentage value (SimpleStringIsPercentage), which fails because corelimit=90, it asserts false but the value of the variable res remains unchanged (FUNC_RETURN_OK). Ideally it should exit out, probably by setting something like res=FUNC_RETURN_FAIL in the else block. Further down the stack, since res is FUNC_RETURN_OK, all validations are successful.
{code}
case RSQ_TBL_ATTR_CORE_LIMIT_CLUSTER:
    if ( SimpleStringIsPercentage(attrvalue) )
    {
        percentage_change += 1;
        int8_t inputval = 0;
        res = SimpleStringToPercentage(attrvalue, &inputval);
        shadowqueinfo->ClusterVCorePer = inputval;
        if ( res == FUNC_RETURN_OK )
        {
            elog(RMLOG, "resource manager updated %s %lf.0%% in shadow "
                        "of resource queue \'%s\'",
                 RSQTBLAttrNames[RSQ_TBL_ATTR_CORE_LIMIT_CLUSTER],
                 shadowqueinfo->ClusterVCorePer,
                 queue->QueueInfo->Name);
        }
        shadowqueinfo->Status |= RESOURCE_QUEUE_STATUS_EXPRESS_PERCENT;
    }
    else
    {
        Assert(false);
    }
    break;
{code}
was (Author: outofmemory): I did a bit of research on this and I think that the problem exists in resqueuemanager.c: updateResourceQueueAttributesInShadow(). In this function, in the case RSQ_TBL_ATTR_CORE_LIMIT_CLUSTER, after checking whether the string is a percentage value (SimpleStringIsPercentage), which fails because corelimit=90, it asserts false but the value of the variable res remains unchanged (FUNC_RETURN_OK). Ideally it should exit out, probably with something like res=FUNC_RETURN_FAIL. Further down the stack, since res is FUNC_RETURN_OK, all validations are successful.
{code}
case RSQ_TBL_ATTR_CORE_LIMIT_CLUSTER:
    if ( SimpleStringIsPercentage(attrvalue) )
    {
        percentage_change += 1;
        int8_t inputval = 0;
        res = SimpleStringToPercentage(attrvalue, &inputval);
        shadowqueinfo->ClusterVCorePer = inputval;
        if ( res == FUNC_RETURN_OK )
        {
            elog(RMLOG, "resource manager updated %s %lf.0%% in shadow "
                        "of resource queue \'%s\'",
                 RSQTBLAttrNames[RSQ_TBL_ATTR_CORE_LIMIT_CLUSTER],
                 shadowqueinfo->ClusterVCorePer,
                 queue->QueueInfo->Name);
        }
        shadowqueinfo->Status |= RESOURCE_QUEUE_STATUS_EXPRESS_PERCENT;
    }
    else
    {
        Assert(false);
    }
    break;
{code}
> ALTER RESOURCE QUEUE DDL does not check the format of attribute > MEMORY_CLUSTER_LIMIT > > > Key: HAWQ-1433 > URL: https://issues.apache.org/jira/browse/HAWQ-1433 > Project: Apache HAWQ > Issue Type: Bug > Components: Resource Manager >Reporter: Yi Jin >Assignee: Yi Jin > Fix For: 2.3.0.0-incubating > > > Shubham Sharma> 2:11 PM (2 hours ago) > to user, sebastiao.gone. > Hello Sebastio, I think you have encountered the following issue - > 1 - Problem - alter resource queue pg_default with > (CORE_LIMIT_CLUSTER/MEMORY_LIMIT_CLUSTER=90); > gpadmin=# select * from pg_resqueue; > rsqname | parentoid | activestats | memorylimit | corelimit | > resovercommit |
[jira] [Commented] (HAWQ-1433) ALTER RESOURCE QUEUE DDL does not check the format of attribute MEMORY_CLUSTER_LIMIT
[ https://issues.apache.org/jira/browse/HAWQ-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969208#comment-15969208 ] Shubham Sharma commented on HAWQ-1433: -- I did a bit of research on this and I think that the problem exists in resqueuemanager.c: updateResourceQueueAttributesInShadow(). In this function, in the case RSQ_TBL_ATTR_CORE_LIMIT_CLUSTER, after checking whether the string is a percentage value (SimpleStringIsPercentage), which fails because corelimit=90, it asserts false but the value of the variable res remains unchanged (FUNC_RETURN_OK). Ideally it should exit out, probably with something like res=FUNC_RETURN_FAIL. Further down the stack, since res is FUNC_RETURN_OK, all validations are successful.
{code}
case RSQ_TBL_ATTR_CORE_LIMIT_CLUSTER:
    if ( SimpleStringIsPercentage(attrvalue) )
    {
        percentage_change += 1;
        int8_t inputval = 0;
        res = SimpleStringToPercentage(attrvalue, &inputval);
        shadowqueinfo->ClusterVCorePer = inputval;
        if ( res == FUNC_RETURN_OK )
        {
            elog(RMLOG, "resource manager updated %s %lf.0%% in shadow "
                        "of resource queue \'%s\'",
                 RSQTBLAttrNames[RSQ_TBL_ATTR_CORE_LIMIT_CLUSTER],
                 shadowqueinfo->ClusterVCorePer,
                 queue->QueueInfo->Name);
        }
        shadowqueinfo->Status |= RESOURCE_QUEUE_STATUS_EXPRESS_PERCENT;
    }
    else
    {
        Assert(false);
    }
    break;
{code}
> ALTER RESOURCE QUEUE DDL does not check the format of attribute > MEMORY_CLUSTER_LIMIT > > > Key: HAWQ-1433 > URL: https://issues.apache.org/jira/browse/HAWQ-1433 > Project: Apache HAWQ > Issue Type: Bug > Components: Resource Manager >Reporter: Yi Jin >Assignee: Yi Jin > Fix For: 2.3.0.0-incubating > > > Shubham Sharma> 2:11 PM (2 hours ago) > to user, sebastiao.gone.
> Hello Sebastio, I think you have encountered the following issue - > 1 - Problem - alter resource queue pg_default with > (CORE_LIMIT_CLUSTER/MEMORY_LIMIT_CLUSTER=90); > gpadmin=# select * from pg_resqueue; > rsqname | parentoid | activestats | memorylimit | corelimit | > resovercommit | allocpolicy | vsegresourcequota | nvsegupperlimit | > nvseglowerlimit | nvseg > upperlimitperseg | nvseglowerlimitperseg | creationtime | updatetime > | status > +---+-+-+---+---+-+---+-+-+-- > -+---+--+---+ > pg_root| 0 | -1 | 100%| 100% | > 2 | even| | 0 | 0 | > >0 | 0 | | > | branch > pg_default | 9800 | 20 | 50% | 50% | > 2 | even| mem:256mb | 0 | 0 | > >0 | 0 | | 2017-04-12 > 22:45:55.056102+01 | > (2 rows) > gpadmin=# alter resource queue pg_default with (CORE_LIMIT_CLUSTER=90); > ALTER QUEUE > gpadmin=# select * from test; > a > --- > (0 rows) > gpadmin=# \q > 2 - restart hawq cluster > 3 - ERROR > [gpadmin@hdp3 ~]$ psql > psql (8.2.15) > Type "help" for help. > gpadmin=# select * from test; > WARNING: FD 31 having errors raised. errno 104 > ERROR: failed to register in resource manager, failed to receive content > (pquery.c:787) > 3 - alter resource queue pg_default with > (CORE_LIMIT_CLUSTER/MEMORY_LIMIT_CLUSTER=50%); --Let's switch back > ! Not allowed ! > alter resource queue pg_default with (CORE_LIMIT_CLUSTER=50%); > WARNING: FD 33 having errors raised. errno 104 > ERROR: failed to register in resource manager, failed to receive content > (resqueuecommand.c:364) > 4 - How to fix - Please be extra careful while using this. > gpadmin=# begin; > BEGIN > gpadmin=# set allow_system_table_mods='dml'; > SET > gpadmin=# select * from pg_resqueue where corelimit=90; > rsqname |