[jira] [Resolved] (HAWQ-1345) Cannot connect to PSQL: FATAL: could not count blocks of relation 1663/16508/1249: Not a directory

2017-02-26 Thread Amy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amy resolved HAWQ-1345.
---
Resolution: Fixed

> Cannot connect to PSQL: FATAL: could not count blocks of relation 
> 1663/16508/1249: Not a directory
> --
>
> Key: HAWQ-1345
> URL: https://issues.apache.org/jira/browse/HAWQ-1345
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Catalog
> Affects Versions: 2.0.0.0-incubating
> Reporter: Amy
> Assignee: Ming LI
> Fix For: 2.1.0.0-incubating
>
>
> Unable to connect to the current database with psql.
> We can connect to the template1 database with psql, but connecting to the
> current database fails with the following error:
> {code}
> #psql 
> psql: FATAL: could not count blocks of relation 1663/16508/1249: Not a directory
> {code}
> When we fail over to the Standby and start the HAWQ Master, we get the
> same error again:
> {code}
> 2017-02-17 02:12:50.119207 PST,,,p22482,th-16818971840,,,seg-1,"DEBUG1","0","opening ""pg_xlog/00010005001D"" for reading (log 5, seg 29)",,,0,,"xlog.c",3162,
> 2017-02-17 02:12:50.176450 PST,,,p22482,th-16818971840,,,seg-1,"FATAL","42809","could not count blocks of relation 1663/16508/1249: Not a directory","xlog redo insert: rel 1663/16508/1249; tid 32682/85
> REDO PASS 3 @ 5/7669B838; LSN 5/7669E480: prev 5/76694C98; xid 825193; bkpb1: Heap - insert: rel 1663/16508/1249; tid 32682/85",,0,,"smgr.c",1146,"
> Stack trace:
> 1    0x8c5628 postgres errstart + 0x288
> 2    0x7ddfbc postgres smgrnblocks + 0x3c
> 3    0x4fbdf8 postgres XLogReadBuffer + 0x18
> 4    0x4ea2c9 postgres  + 0x4ea2c9
> 5    0x4eaf47 postgres  + 0x4eaf47
> 6    0x4f8af3 postgres StartupXLOG_Pass3 + 0x153
> 7    0x4fb277 postgres StartupProcessMain + 0x187
> 8    0x557cd8 postgres AuxiliaryProcessMain + 0x478
> 9    0x793c40 postgres  + 0x793c40
> 10   0x798901 postgres  + 0x798901
> 11   0x79a8c9 postgres PostmasterMain + 0x759
> 12   0x4a4039 postgres main + 0x519
> 13   0x7f3b979e1d5d libc.so.6 __libc_start_main + 0xfd
> 14   0x4a40b9 postgres  + 0x4a40b9
> "
> {code}
> On both the Master and the Standby, the pg_attribute file for the current
> database (1663/16508/1249) has reached 1 GB in size:
> {code}
> [gpadmin@master]$ pwd
> /data/hawq/master
> [gpadmin@master master]$ cd base
> [gpadmin@master base]$ ls
> 1  16386  16387  16508
> [gpadmin@master base]$ cd 16508
> [gpadmin@master 16508]$ ls -thrl 1249
> -rw--- 1 gpadmin gpadmin 1.0G Feb 16 18:24 1249
> {code}
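As a rough check on the listing above: the 1.0G shown by ls matches the 1073741824 bytes that the strace output further down reports for this file. A minimal sketch of the arithmetic, assuming the relation segment limit is 1 GiB (the usual PostgreSQL default; the setting of this HAWQ build is not shown in the report) and taking the 32768-byte read() from the strace as the block size. The function names are hypothetical.

```python
# Sketch only: SEGMENT_BYTES and BLOCK_BYTES are assumptions, not values
# read from the HAWQ build described in this report.
SEGMENT_BYTES = 1 << 30   # assumed 1 GiB segment limit
BLOCK_BYTES = 32768       # read() size seen in the strace output


def segment_is_full(file_bytes: int) -> bool:
    """Return True when a relation file has reached the assumed segment limit."""
    return file_bytes >= SEGMENT_BYTES


def blocks_in_file(file_bytes: int) -> int:
    """Number of whole blocks stored in a file of the given size."""
    return file_bytes // BLOCK_BYTES


size_1249 = 1073741824    # lseek(8, 0, SEEK_END) result for base/16508/1249
size_1259 = 188645376     # lseek(6, 0, SEEK_END) result for base/16508/1259

print(segment_is_full(size_1249), blocks_in_file(size_1249))
print(segment_is_full(size_1259), blocks_in_file(size_1259))
```

Under these assumptions file 1249 is exactly at the limit, which is consistent with the server looking for a further segment right after the lseek, while 1259 is well below it.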
> From strace we were able to find the following:
> {code}
> [gpadmin@master master]$ strace /usr/local/hawq/bin/postgres --single -P -O -p 5432 -D $MASTER_DATA_DIRECTORY -c gp_session_role=utility currentdatabase <<EOF
> select version();
> EOF
> (...)
> open("base/16508/pg_internal.init", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("base/16508/1259", O_RDWR) = 6
> lseek(6, 0, SEEK_END)   = 188645376
> lseek(6, 0, SEEK_SET)   = 0
> read(6, "\0\0\0\0\340\5\327\1\1\0\1\0\f\3@\3\0\200\4\2008\263P\1`\262\252\1\270\261P\1"..., 32768) = 32768
> open("base/16508/1249", O_RDWR) = 8
> lseek(8, 0, SEEK_END)   = 1073741824
> open("base/16508/1249/1", O_RDWR)   = -1 ENOTDIR (Not a directory)
> open("base/16508/1249/1", O_RDWR|O_CREAT, 0600) = -1 ENOTDIR (Not a directory)
> futex(0x7ff80e53f620, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> futex(0x7ff80e756af0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> open("/usr/share/locale/locale.alias", O_RDONLY) = 10
> fstat(10, {st_mode=S_IFREG|0644, st_size=2512, ...}) = 0
> {code}
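> The strace output shows that once base/16508/1249 (pg_attribute) has reached
> 1073741824 bytes (1 GB), HAWQ tries to open base/16508/1249/1, treating
> pg_attribute as a directory although it is a regular file.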
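
The ENOTDIR result in the trace can be reproduced outside HAWQ. A minimal sketch, assuming a POSIX system: it creates an ordinary file named 1249 in a temporary directory and then tries to open 1249/1 beneath it, with the same flags and mode as the second open() call in the strace. The helper name and the temporary paths are hypothetical and are not part of HAWQ.

```python
import errno
import os
import tempfile


def open_child_of_file(base_dir: str) -> int:
    """Create a regular file '1249', then try to open '1249/1' under it.

    Returns the errno of the failed open, or 0 if it unexpectedly succeeds.
    """
    rel_path = os.path.join(base_dir, "1249")
    with open(rel_path, "wb") as f:
        f.write(b"\0" * 16)  # stand-in for the relation file contents
    try:
        fd = os.open(os.path.join(rel_path, "1"), os.O_RDWR | os.O_CREAT, 0o600)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


with tempfile.TemporaryDirectory() as tmp:
    err = open_child_of_file(tmp)
    print(errno.errorcode.get(err, err), os.strerror(err))
```

On Linux this should report ENOTDIR, the same error as in the trace, because a path component that is a regular file cannot be traversed, whether or not O_CREAT is given.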



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HAWQ-1345) Cannot connect to PSQL: FATAL: could not count blocks of relation 1663/16508/1249: Not a directory

2017-02-22 Thread Ming LI (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming LI resolved HAWQ-1345.
---
Resolution: Fixed

> Cannot connect to PSQL: FATAL: could not count blocks of relation 
> 1663/16508/1249: Not a directory
> --
>
> Key: HAWQ-1345
> URL: https://issues.apache.org/jira/browse/HAWQ-1345
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Catalog
> Affects Versions: 2.0.0.0-incubating
> Reporter: Amy
> Assignee: Ming LI
> Fix For: backlog
>