[jira] [Updated] (HAWQ-1339) Cache lookup failed after explain OLAP grouping query

2017-02-19 Thread Amy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amy updated HAWQ-1339:
--
    Fix Version/s: backlog
                   (was: 2.2.0.0-incubating)

> Cache lookup failed after explain OLAP grouping query
> -
>
> Key: HAWQ-1339
> URL: https://issues.apache.org/jira/browse/HAWQ-1339
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Amy
>Assignee: Amy
> Fix For: backlog
>
> Attachments: olap_setup.sql
>
>
> Some OLAP grouping queries may error out with "division by zero", and when 
> running EXPLAIN on the same query, a notice "cache lookup failed for attribute 7 
> of relation 75036 (lsyscache.c:437)" is raised.
> {code}
> postgres=# SELECT sale.vn,sale.cn,sale.dt,GROUPING(sale.vn), 
> TO_CHAR(COALESCE(MAX(DISTINCT 
> floor(sale.vn+sale.qty)),0),'.999'),TO_CHAR(COALESCE(VAR_SAMP(floor(sale.pn/sale.prc)),0),'.999'),TO_CHAR(COALESCE(COUNT(floor(sale.qty+sale.prc)),0),'.999')
> postgres-# FROM sale,customer,vendor
> postgres-# WHERE sale.cn=customer.cn AND sale.vn=vendor.vn
> postgres-# GROUP BY 
> ROLLUP((sale.prc),(sale.vn,sale.vn),(sale.pn,sale.pn),(sale.dt),(sale.qty,sale.vn,sale.qty)),ROLLUP((sale.pn),(sale.vn,sale.pn),(sale.qty)),(),sale.cn
>  HAVING COALESCE(VAR_POP(sale.cn),0) >= 45.5839785564113;
> ERROR:  division by zero  (seg0 localhost:4 pid=25205)
> postgres=#
> postgres=# explain SELECT sale.vn,sale.cn,sale.dt,GROUPING(sale.vn), 
> TO_CHAR(COALESCE(MAX(DISTINCT 
> floor(sale.vn+sale.qty)),0),'.999'),TO_CHAR(COALESCE(VAR_SAMP(floor(sale.pn/sale.prc)),0),'.999'),TO_CHAR(COALESCE(COUNT(floor(sale.qty+sale.prc)),0),'.999')
> FROM sale,customer,vendor
> WHERE sale.cn=customer.cn AND sale.vn=vendor.vn
> GROUP BY 
> ROLLUP((sale.prc),(sale.vn,sale.vn),(sale.pn,sale.pn),(sale.dt),(sale.qty,sale.vn,sale.qty)),ROLLUP((sale.pn),(sale.vn,sale.pn),(sale.qty)),(),sale.cn
>  HAVING COALESCE(VAR_POP(sale.cn),0) >= 45.5839785564113;
> NOTICE:  cache lookup failed for attribute 7 of relation 75036 
> (lsyscache.c:437)
> {code}
> The reproduction steps are:
> {code}
> Step 1: Prepare schema and data using attached olap_setup.sql
> Step 2: Run below OLAP grouping query
> -- OLAP query involving MAX() function
> SELECT sale.vn,sale.cn,sale.dt,GROUPING(sale.vn), 
> TO_CHAR(COALESCE(MAX(DISTINCT 
> floor(sale.vn+sale.qty)),0),'.999'),TO_CHAR(COALESCE(VAR_SAMP(floor(sale.pn/sale.prc)),0),'.999'),TO_CHAR(COALESCE(COUNT(floor(sale.qty+sale.prc)),0),'.999')
> FROM sale,customer,vendor
> WHERE sale.cn=customer.cn AND sale.vn=vendor.vn
> GROUP BY 
> ROLLUP((sale.prc),(sale.vn,sale.vn),(sale.pn,sale.pn),(sale.dt),(sale.qty,sale.vn,sale.qty)),ROLLUP((sale.pn),(sale.vn,sale.pn),(sale.qty)),(),sale.cn
>  HAVING COALESCE(VAR_POP(sale.cn),0) >= 45.5839785564113;
> explain SELECT sale.vn,sale.cn,sale.dt,GROUPING(sale.vn), 
> TO_CHAR(COALESCE(MAX(DISTINCT 
> floor(sale.vn+sale.qty)),0),'.999'),TO_CHAR(COALESCE(VAR_SAMP(floor(sale.pn/sale.prc)),0),'.999'),TO_CHAR(COALESCE(COUNT(floor(sale.qty+sale.prc)),0),'.999')
> FROM sale,customer,vendor
> WHERE sale.cn=customer.cn AND sale.vn=vendor.vn
> GROUP BY 
> ROLLUP((sale.prc),(sale.vn,sale.vn),(sale.pn,sale.pn),(sale.dt),(sale.qty,sale.vn,sale.qty)),ROLLUP((sale.pn),(sale.vn,sale.pn),(sale.qty)),(),sale.cn
>  HAVING COALESCE(VAR_POP(sale.cn),0) >= 45.5839785564113;
> {code}





[jira] [Updated] (HAWQ-1342) QE process hang in shared input scan on segment node

2017-02-19 Thread Ming LI (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming LI updated HAWQ-1342:
--
Affects Version/s: 2.0.0.0-incubating

> QE process hang in shared input scan on segment node
> 
>
> Key: HAWQ-1342
> URL: https://issues.apache.org/jira/browse/HAWQ-1342
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Query Execution
>Affects Versions: 2.0.0.0-incubating
>Reporter: Amy
>Assignee: Amy
> Fix For: backlog
>
>
> A QE process hangs on one segment node while the QD and the QEs on the other 
> segment nodes have already terminated.
> {code}
> [gpadmin@test1 ~]$ cat hostfile
> test1   master   secondary namenode
> test2   segment   datanode
> test3   segment   datanode
> test4   segment   datanode
> test5   segment   namenode
> [gpadmin@test3 ~]$ ps -ef | grep postgres | grep -v grep
> gpadmin   41877  1  0 05:35 ?00:01:04 
> /usr/local/hawq_2_1_0_0/bin/postgres -D 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-Multinode-parallel/product/segmentdd
>  -i -M segment -p 20100 --silent-mode=true
> gpadmin   41878  41877  0 05:35 ?00:00:02 postgres: port 20100, 
> logger process
> gpadmin   41881  41877  0 05:35 ?00:00:00 postgres: port 20100, stats 
> collector process
> gpadmin   41882  41877  0 05:35 ?00:00:07 postgres: port 20100, 
> writer process
> gpadmin   41883  41877  0 05:35 ?00:00:01 postgres: port 20100, 
> checkpoint process
> gpadmin   41884  41877  0 05:35 ?00:00:11 postgres: port 20100, 
> segment resource manager
> gpadmin   42108  41877  0 05:35 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(65193) con35 seg0 cmd2 slice9 MPPEXEC 
> SELECT
> gpadmin   42416  41877  0 05:35 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(65359) con53 seg0 cmd2 slice11 MPPEXEC 
> SELECT
> gpadmin   44807  41877  0 05:36 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(2272) con183 seg0 cmd2 slice31 MPPEXEC 
> SELECT
> gpadmin   44819  41877  0 05:36 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(2278) con183 seg0 cmd2 slice10 MPPEXEC 
> SELECT
> gpadmin   44821  41877  0 05:36 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(2279) con183 seg0 cmd2 slice25 MPPEXEC 
> SELECT
> gpadmin   45447  41877  0 05:36 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(2605) con207 seg0 cmd2 slice9 MPPEXEC 
> SELECT
> gpadmin   49859  41877  0 05:38 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(4805) con432 seg0 cmd2 slice20 MPPEXEC 
> SELECT
> gpadmin   49881  41877  0 05:38 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(4816) con432 seg0 cmd2 slice7 MPPEXEC 
> SELECT
> gpadmin   51937  41877  0 05:39 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(5877) con517 seg0 cmd2 slice7 MPPEXEC 
> SELECT
> gpadmin   51939  41877  0 05:39 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(5878) con517 seg0 cmd2 slice9 MPPEXEC 
> SELECT
> gpadmin   51941  41877  0 05:39 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(5879) con517 seg0 cmd2 slice11 MPPEXEC 
> SELECT
> gpadmin   51943  41877  0 05:39 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(5880) con517 seg0 cmd2 slice13 MPPEXEC 
> SELECT
> gpadmin   51953  41877  0 05:39 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(5885) con517 seg0 cmd2 slice26 MPPEXEC 
> SELECT
> gpadmin   53436  41877  0 05:40 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(6634) con602 seg0 cmd2 slice15 MPPEXEC 
> SELECT
> gpadmin   57095  41877  0 05:41 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(8450) con782 seg0 cmd2 slice10 MPPEXEC 
> SELECT
> gpadmin   57097  41877  0 05:41 ?00:00:04 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(8451) con782 seg0 cmd2 slice11 MPPEXEC 
> SELECT
> gpadmin   63159  41877  0 05:43 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(11474) con1082 seg0 cmd2 slice15 
> MPPEXEC SELECT
> gpadmin   64018  41877  0 05:44 ?00:00:03 postgres: port 20100, 
> hawqsuperuser olap_group 10.32.35.192(11905) con1121 seg0 cmd2 slice5 MPPEXEC 
> SELECT
> {code}
> The stack info is as below, and it seems that the QE hangs in a shared input scan.
> {code}
> [gpadmin@test3 ~]$ gdb -p 42108
> (gdb) info threads
>   2 Thread 0x7f4f6b335700 (LWP 42109)  0x0032214df283 in poll () from 
> /lib64/libc.so.6
> * 1 Thread 0x7f4f9041c920 (LWP 42108)  0x0032214e1523 in select 

[jira] [Updated] (HAWQ-1345) Cannot connect to PSQL: FATAL: could not count blocks of relation 1663/16508/1249: Not a directory

2017-02-19 Thread Ming LI (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming LI updated HAWQ-1345:
--
Affects Version/s: 2.0.0.0-incubating

> Cannot connect to PSQL: FATAL: could not count blocks of relation 
> 1663/16508/1249: Not a directory
> --
>
> Key: HAWQ-1345
> URL: https://issues.apache.org/jira/browse/HAWQ-1345
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: 2.0.0.0-incubating
>Reporter: Amy
>Assignee: Ming LI
> Fix For: backlog
>
>
> Unable to connect via psql to the current database. 
> We can access psql for the template1 database, but for the current database we 
> get the following error:
> {code}
> #psql 
> psql: FATAL: could not count blocks of relation 1663/16508/1249: Not a 
> directory
> {code}
> When trying to fail over to the standby and start the HAWQ master, we get the 
> following error again:
> {code}
> 2017-02-17 02:12:50.119207 
> PST,,,p22482,th-16818971840,,,seg-1,"DEBUG1","0","opening 
> ""pg_xlog/00010005001D"" for readin
> g (log 5, seg 29)",,,0,,"xlog.c",3162,
> 2017-02-17 02:12:50.176450 
> PST,,,p22482,th-16818971840,,,seg-1,"FATAL","42809","could not 
> count blocks of relation 1663/16508/1249: Not a directory","xlog redo insert: rel 1663/16508/1249; tid 32682/85
> REDO PASS 3 @ 5/7669B838; LSN 5/7669E480: prev 5/76694C98; xid 825193; bkpb1: 
> Heap - insert: rel 1663/16508/1249; tid 32682/85",,0,,"smgr.c",1146,"
> Stack trace:
> 1    0x8c5628 postgres errstart + 0x288
> 2    0x7ddfbc postgres smgrnblocks + 0x3c
> 3    0x4fbdf8 postgres XLogReadBuffer + 0x18
> 4    0x4ea2c9 postgres  + 0x4ea2c9
> 5    0x4eaf47 postgres  + 0x4eaf47
> 6    0x4f8af3 postgres StartupXLOG_Pass3 + 0x153
> 7    0x4fb277 postgres StartupProcessMain + 0x187
> 8    0x557cd8 postgres AuxiliaryProcessMain + 0x478
> 9    0x793c40 postgres  + 0x793c40
> 10   0x798901 postgres  + 0x798901
> 11   0x79a8c9 postgres PostmasterMain + 0x759
> 12   0x4a4039 postgres main + 0x519
> 13   0x7f3b979e1d5d libc.so.6 __libc_start_main + 0xfd
> 14   0x4a40b9 postgres  + 0x4a40b9
> "
> {code}
> On both the master and the standby, we can see that pg_attribute for the current 
> database (file 1663/16508/1249) has reached 1 GB in size:
> {code}
> [gpadmin@master]$pwd
> /data/hawq/master
> [gpadmin@master master]$ cd  base
> [gpadmin@master base]$ ls
> 1  16386  16387  16508
> [gpadmin@master base]$ cd 16508
> [gpadmin@master 16508]$ ls -thrl 1249
> -rw--- 1 gpadmin gpadmin 1.0G Feb 16 18:24 1249
> {code}
> From strace we were able to find the following:
> {code}
> [gpadmin@master master]$ strace  /usr/local/hawq/bin/postgres --single -P -O 
> -p 5432 -D $MASTER_DATA_DIRECTORY -c gp_session_role=utility currentdatabase <<EOF
> select version();
> EOF
> (...)
> open("base/16508/pg_internal.init", O_RDONLY) = -1 ENOENT (No such file or 
> directory)
> open("base/16508/1259", O_RDWR) = 6
> lseek(6, 0, SEEK_END)   = 188645376
> lseek(6, 0, SEEK_SET)   = 0
> read(6, 
> "\0\0\0\0\340\5\327\1\1\0\1\0\f\3@\3\0\200\4\2008\263P\1`\262\252\1\270\261P\1"...,
>  32768) = 32768
> open("base/16508/1249", O_RDWR) = 8
> lseek(8, 0, SEEK_END)   = 1073741824
> open("base/16508/1249/1", O_RDWR)   = -1 ENOTDIR (Not a directory)
> open("base/16508/1249/1", O_RDWR|O_CREAT, 0600) = -1 ENOTDIR (Not a directory)
> futex(0x7ff80e53f620, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> futex(0x7ff80e756af0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> open("/usr/share/locale/locale.alias", O_RDONLY) = 10
> fstat(10, {st_mode=S_IFREG|0644, st_size=2512, ...}) = 0
> {code}
> We see that HAWQ is treating the pg_attribute relation file (1249) as a directory, 
> while it is in fact a regular file.





[jira] [Created] (HAWQ-1346) If using WebHdfsFileSystem as the default FileSystem, it will cause a cast type exception

2017-02-19 Thread Tian Hong Wang (JIRA)
Tian Hong Wang created HAWQ-1346:


 Summary: If using WebHdfsFileSystem as the default FileSystem, it will 
cause a cast type exception
 Key: HAWQ-1346
 URL: https://issues.apache.org/jira/browse/HAWQ-1346
 Project: Apache HAWQ
  Issue Type: Bug
  Components: PXF
Reporter: Tian Hong Wang
Assignee: Ed Espino


In 
incubator-hawq/pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/ChunkRecordReader.java:

private DFSInputStream getInputStream() {
    return (DFSInputStream) (fileIn.getWrappedStream());
}

If WebHdfsFileSystem is used as the default FileSystem, this cast causes a 
ClassCastException:

java.lang.ClassCastException: 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream cannot be cast 
to org.apache.hadoop.hdfs.DFSInputStream
at 
org.apache.hawq.pxf.plugins.hdfs.ChunkRecordReader.getInputStream(ChunkRecordReader.java:76)
at 
org.apache.hawq.pxf.plugins.hdfs.ChunkRecordReader.<init>(ChunkRecordReader.java:112)
at 
org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor.getReader(LineBreakAccessor.java:64)
at 
org.apache.hawq.pxf.plugins.hdfs.HdfsSplittableDataAccessor.getNextSplit(HdfsSplittableDataAccessor.java:114)
at 
org.apache.hawq.pxf.plugins.hdfs.HdfsSplittableDataAccessor.openForRead(HdfsSplittableDataAccessor.java:83)
at 
org.apache.hawq.pxf.service.ReadBridge.beginIteration(ReadBridge.java:73)
at 
org.apache.hawq.pxf.service.rest.BridgeResource$1.write(BridgeResource.java:132)
at 
com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
at 
com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
at 
com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at 
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
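
The cast in getInputStream() assumes the wrapped stream always comes from plain HDFS. As a minimal defensive sketch (illustrative only: the class and method names below are assumptions, not a committed fix), the stream type could be checked before casting so that unsupported filesystems fail with a clear message instead of an unchecked ClassCastException:

{code}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.hdfs.DFSInputStream;

// Hypothetical helper, not the actual HAWQ/PXF change: guard the cast so that
// streams from non-DFS filesystems (e.g. WebHdfsFileSystem) are rejected with
// an explicit error rather than a ClassCastException.
class ChunkStreamGuard {
    static DFSInputStream unwrapDfsStream(FSDataInputStream fileIn) throws IOException {
        InputStream wrapped = fileIn.getWrappedStream();
        if (wrapped instanceof DFSInputStream) {
            return (DFSInputStream) wrapped;
        }
        throw new IOException("ChunkRecordReader supports only plain HDFS input streams, got "
                + wrapped.getClass().getName());
    }
}
{code}

Whether ChunkRecordReader should reject non-DFS streams or fall back to a generic read path is a design decision for the fix; the sketch above only makes the current assumption explicit.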





[jira] [Assigned] (HAWQ-1345) Cannot connect to PSQL: FATAL: could not count blocks of relation 1663/16508/1249: Not a directory

2017-02-19 Thread Ming LI (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming LI reassigned HAWQ-1345:
-

Assignee: Ming LI  (was: Amy)

> Cannot connect to PSQL: FATAL: could not count blocks of relation 
> 1663/16508/1249: Not a directory
> --
>
> Key: HAWQ-1345
> URL: https://issues.apache.org/jira/browse/HAWQ-1345
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Amy
>Assignee: Ming LI
> Fix For: 3.0.0.0
>





[jira] [Created] (HAWQ-1345) Cannot connect to PSQL: FATAL: could not count blocks of relation 1663/16508/1249: Not a directory

2017-02-19 Thread Amy (JIRA)
Amy created HAWQ-1345:
-

 Summary: Cannot connect to PSQL: FATAL: could not count blocks of 
relation 1663/16508/1249: Not a directory
 Key: HAWQ-1345
 URL: https://issues.apache.org/jira/browse/HAWQ-1345
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Catalog
Reporter: Amy
Assignee: Ed Espino
 Fix For: 3.0.0.0



[GitHub] incubator-hawq issue #1119: HAWQ-1328. Add deny and exclude policy template ...

2017-02-19 Thread zhangh43
Github user zhangh43 commented on the issue:

https://github.com/apache/incubator-hawq/pull/1119
  
Thanks @denalex, the quote typo is fixed.
According to the Ranger documentation, Ranger first traverses all the deny 
conditions to determine whether to deny the access, and only then traverses 
all the positive (allow) conditions.
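
For clarity, a minimal sketch of that deny-before-allow order (the class and method names here are assumptions for illustration, not Ranger's actual API):

{code}
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: deny conditions are traversed first, and only if none
// matches are the positive (allow) conditions consulted.
class DenyFirstEvaluator {
    private final List<Predicate<String>> denyConditions;
    private final List<Predicate<String>> allowConditions;

    DenyFirstEvaluator(List<Predicate<String>> deny, List<Predicate<String>> allow) {
        this.denyConditions = deny;
        this.allowConditions = allow;
    }

    boolean isAccessAllowed(String request) {
        // 1. Any matching deny condition blocks the access outright.
        for (Predicate<String> deny : denyConditions) {
            if (deny.test(request)) {
                return false;
            }
        }
        // 2. Otherwise the access is allowed only if some allow condition matches.
        for (Predicate<String> allow : allowConditions) {
            if (allow.test(request)) {
                return true;
            }
        }
        return false;
    }
}
{code}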

