You can definitely do it in single-user mode with a command like:

/path_to_hawq/bin/postgres --single -P -O -p 9999 -D /xxxx/hawq/segment -c gp_session_role=utility template1

I was able to do this on the affected segments once I realized that
template1 was the local database HAWQ uses for parallel processing. Once
connected, I could view the datfrozenxid and its age for hcatalog on the
older segments. At that point, since Hive has now been removed from the
cluster, I was curious to see what queries against hcatalog would return.
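For reference, checking those values from the single-user backend can be done with a query along these lines (the pg_database columns are standard; the ordering is just for readability):

```sql
-- Show each local database's frozen xid and how old it is.
-- Run inside the single-user backend started above.
SELECT datname, datfrozenxid, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```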

When I issued \dt hcatalog.*.*, it returned all of the local schemas and
tables, which I thought odd. When I then tried to issue \d
hcatalog.template1.some_table, it timed out with the expected "Hive
metastore service could not be contacted" error. That made me realize the
xid reported for the hcatalog database is essentially meaningless.

With that, I tested updating hcatalog's datfrozenxid to match template1's,
since I was not concerned about wraparound or potential data loss: there is
no data in hcatalog. This is not ideal, of course, but it succeeded in
eliminating the warnings.
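The update itself was roughly of this shape (a sketch rather than the exact statement I ran; it assumes the single-user backend was started with -O so system tables are writable):

```sql
-- Copy template1's datfrozenxid onto hcatalog to silence the wraparound
-- warnings. Editing pg_database directly is unsupported; this is only
-- tolerable here because hcatalog holds no real data.
UPDATE pg_database
SET datfrozenxid = (SELECT datfrozenxid
                    FROM pg_database
                    WHERE datname = 'template1')
WHERE datname = 'hcatalog';
```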

Lastly, I searched through the logs yesterday and found that autovacuum has
been failing on hcatalog for quite some time. The log entry was:

"ERROR","3D000","""hcatalog"" database is only for system
use",,,,,,,0,,"postinit.c",386

which leads to the source:
https://github.com/apache/hawq/blob/master/src/backend/utils/init/postinit.c#L376
where the connection is specifically blocked because hcatalog is an
integration. I get that, but autovacuum is a system process; I would expect
it to succeed, and I would expect this problem to surface in any cluster
that used the hcatalog integration and reached the xid limit.

We have an extremely active cluster and reached two billion transactions in
under a year. It would be unfortunate to have to go through this
maintenance routinely. I look forward to more feedback on my workaround, or
any other explanation that would help keep this from coming up again in the
future.

John Glorioso
Lead Engineer
[email protected]




On Thu, Oct 18, 2018 at 5:34 AM Aitor Cedres <[email protected]> wrote:

> Hi John,
>
> The problem seems to be that database 'hcatalog' has connections disabled
> in pg_database. If you run 'select * from pg_database', the column
> 'datallowconn' should be 'f'. I think a regular user is not allowed to make
> modifications to the system tables. There was a hidden GUC to allow DML
> operations on the catalog, but I can't remember it off the top of my head.
>
> If you manage to get DML modification on the catalog, running something
> like `UPDATE pg_database SET datallowconn = 't' where datname = 'hcatalog'`
> should allow you to access the database and run VACUUM FREEZE.
>
> Best regards,
> Aitor
>
> On 17/10/2018 20:26, John Glorioso wrote:
>
> Digging deeper on this, I found that HAWQ has been trying to auto vacuum
> hcatalog for months now with the following error:
>
> 2018-09-16 14:19:01.646399
> UTC,"xxx","yyy",p252354,th493275424,"11.22.33.44","58083",2018-09-16
> 14:18:57
> UTC,0,con433331,cmd114,seg0,slice1,,,sx1,"LOG","00000","GetNewTransactionId:
> requesting autovac (xid 1932591104 xidVacLimit 200000003)",,,,,,"select
> Ta.client_commissioned_measurement_methodology as v,
> count(Ta.client_commissioned_measurement_methodology)::float4/588.000000::float4
> as f from my_schema.my_table as Ta where
> Ta.client_commissioned_measurement_methodology is not null group by
> (Ta.client_commissioned_measurement_methodology) order by f desc limit
> 25",0,,"varsup.c",98,
> 2018-09-16 14:19:01.663978
> UTC,,,p252782,th493275424,,,,0,,,seg-10000,,,,,"ERROR","3D000","""hcatalog""
> database is only for system use",,,,,,,0,,"postinit.c",386,
>
> hcatalog being a system database (and not even technically a real one)
> seems to be causing the issue. I have successfully vacuumed the others.
>
> John Glorioso
> Lead Engineer
> [email protected]
>
>
>
>
> On Wed, Oct 17, 2018 at 9:19 AM John Glorioso <[email protected]> wrote:
>
>> I believe the issue is the FREEZE vacuum option. Is there a config setting
>> I can verify? Better yet, since we no longer use hcatalog to connect to
>> Hive, is there a way to simply disable or clear datfrozenxid? I don't
>> imagine dropping hcatalog would work, and I am hesitant to manually update
>> that value to null.
>>
>> -John
>>
>> On Wed, Oct 17, 2018, 1:49 AM Paul Guo <[email protected]> wrote:
>>
>>> It looks like HAWQ should auto-vacuum those databases that the DBA
>>> cannot connect to?
>>>
>>>
>>> John Glorioso <[email protected]> wrote on Wed, Oct 17, 2018 at 6:32 AM:
>>>
>>>> We are running a decent sized cluster (20 nodes) and have started
>>>> receiving the following error today:
>>>>
>>>> WARNING:  database "hcatalog" must be vacuumed within 42324912
>>>> transactions  (seg27 worker09.xxx.com:40000 pid=689628)
>>>> HINT:  To avoid a database shutdown, execute a full-database VACUUM in
>>>> "hcatalog".
>>>>
>>>> I found this Pivotal article:
>>>>
>>>> https://community.pivotal.io/s/article/About-XIDs-and-XID-Wraparound-in-Greenplum
>>>> which discusses xid wraparound, and that led me to a query that
>>>> identifies the tables in each database needing a VACUUM FREEZE run
>>>> against them.
>>>>
>>>>
>>>> https://community.pivotal.io/s/article/Vacuuming-Old-Tables-in-a-Database-to-Reduce-its-Age
>>>>
>>>> The issue, however, is that the warning is about the hcatalog database,
>>>> which is not directly accessible; it was only used to query data in
>>>> Hive at one point. We have since disabled Hive and no longer need that
>>>> external PXF link, but I am unclear on how to remove it.
>>>>
>>>> This article shows how to check database age, and I can run the first
>>>> part, but it is written for Greenplum and I do not know the equivalent
>>>> HAWQ properties for the case statement to work.
>>>>
>>>>
>>>> https://community.pivotal.io/s/article/How-to-check-database-age-in-Greenplum
>>>>
>>>> The results for the following sql statement are below:
>>>>
>>>> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM pg_database
>>>> UNION ALL
>>>> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM
>>>> gp_dist_random('pg_database');
>>>>
>>>>  gp_segment_id |  datname   |    age
>>>> ---------------+------------+------------
>>>>             -1 | hcatalog   |  212776719
>>>>             -1 | template1  |      39104
>>>>             -1 | template0  |  212775841
>>>>             -1 | production |   10604565
>>>>             -1 | postgres   |      39104
>>>>             -1 | gpadmin    |      39104
>>>>              0 | template1  | 2100536422
>>>>              0 | hcatalog   | 2100537300
>>>> (8 rows)
>>>>
>>>> I cannot access hcatalog or template1 to issue vacuum freeze statements
>>>> against the tables and am unsure how to proceed.
>>>>
>>>> Any help is greatly appreciated!
>>>>
>>>> John Glorioso
>>>> Lead Engineer
>>>> [email protected]
>>>>
>>>>
>>>>
>
