Hi! After I run "hawq stop cluster -a", I find that there are still gpadmin processes running:
gpadmin 61866 0.4 5.2 811448 419620 ? S 17:29 0:00 /usr/local/apache-hawq/bin/gpsyncmaster -D /data/hawq/masterdd -i -p 1809
gpadmin 61882 0.0 0.0 302688 7200 ? Ss 17:29 0:00 postgres: port 1809, logger process
gpadmin 61883 0.0 0.0 812000 7384 ? S 17:29 0:00 postgres: port 1809, WAL Redo Server process
gpadmin 61907 0.0 0.1 812300 8128 ? Ss 17:29 0:00 postgres: port 1809, gpsyncagent process con2 idle

Then "hawq start cluster -a" fails:

20181030:17:29:05:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Starting standby master '192.168.10.18'
20181030:17:29:05:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Start standby master service
20181030:17:29:04:061879 hawqstandbywatch.py:dx-computing2:gpadmin-[INFO]:-Checking standby master status
20181030:17:29:04:061879 hawqstandbywatch.py:dx-computing2:gpadmin-[INFO]:-Monitoring logs
20181030:17:29:08:061879 hawqstandbywatch.py:dx-computing2:gpadmin-[INFO]:-checking if syncmaster is running
20181030:17:29:08:061879 hawqstandbywatch.py:dx-computing2:gpadmin-[INFO]:-syncmaster appears ok, pid 61866
20181030:17:29:09:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Standby master started successfully
20181030:17:29:09:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Starting master node '192.168.10.17'
20181030:17:29:09:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Start master service
20181030:17:29:10:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Checking if standby is synced with master
20181030:17:29:10:2789929 hawq_start:dx-computing:gpadmin-[ERROR]:-Failed to connect to database, this script can only be run when the database is up
Traceback (most recent call last):
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 1459, in <module>
    start_hawq(opts, hawq_dict)
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 1233, in start_hawq
    instance.run()
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 765, in run
    check_return_code(self._start_all_nodes())
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 701, in _start_all_nodes
    check_return_code(self.start_master(), logger, "Master start failed, exit", \
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 618, in start_master
    sync_result = self._check_standby_sync()
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 671, in _check_standby_sync
    for row in rows:
UnboundLocalError: local variable 'rows' referenced before assignment

So why does "hawq stop cluster" not stop gpsyncmaster on the standby node? I am using HAWQ 2.2; would upgrading solve it?
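
For reference, my guess is that the UnboundLocalError is only a side effect of the earlier "Failed to connect to database" error. I have not read the hawq_ctl source, so the sketch below is an assumption about the general pattern, not the actual code: a variable is assigned inside a try block, the failure is only logged, and the variable is then used anyway. The names connect_and_query and check_standby_sync are hypothetical.

```python
def connect_and_query(db_is_up):
    """Hypothetical stand-in for a catalog query; raises if the database is down."""
    if not db_is_up:
        raise RuntimeError("Failed to connect to database")
    return [("sync",)]


def check_standby_sync(db_is_up):
    """Hypothetical function, NOT the real hawq_ctl implementation."""
    try:
        rows = connect_and_query(db_is_up)
    except RuntimeError as err:
        # The error is only logged, so execution continues with 'rows' never assigned.
        print("ERROR: %s" % err)
    # When the query failed above, this line raises UnboundLocalError.
    for row in rows:
        return row[0]
    return None
```

If that is what happens, the traceback would just be hiding the real problem, which is that the master could not be reached when the sync check ran.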
