[jira] [Closed] (HAWQ-1379) Do not send options multiple times in build_startup_packet()

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo closed HAWQ-1379.
-

> Do not send options multiple times in build_startup_packet()
> 
>
> Key: HAWQ-1379
> URL: https://issues.apache.org/jira/browse/HAWQ-1379
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Core
> Affects Versions: 2.1.0.0-incubating
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating
>
>
> build_startup_packet() builds a libpq packet; however, it includes 
> conn->pgoptions more than once. This is unnecessary and wastes network 
> bandwidth.
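
The description above can be illustrated with a minimal C sketch. It is not the HAWQ or libpq source: DemoConn, add_pair, demo_build_startup_params and demo_count_param are hypothetical names, and only the name/value parameter section of a startup packet is modeled. The point it shows is that the options string is appended in exactly one place, so it appears once in the packet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the connection fields involved. */
typedef struct {
    const char *user;
    const char *dbname;
    const char *pgoptions;   /* command-line options for the backend */
} DemoConn;

/* Append "name\0value\0" at offset len; when buf is NULL only count bytes. */
static size_t add_pair(char *buf, size_t len, const char *name, const char *value)
{
    size_t nlen = strlen(name) + 1;
    size_t vlen = strlen(value) + 1;

    if (buf != NULL) {
        memcpy(buf + len, name, nlen);
        memcpy(buf + len + nlen, value, vlen);
    }
    return len + nlen + vlen;
}

/*
 * Build the parameter section of a startup packet. Call with buf == NULL
 * to obtain the required size, then again with a buffer of that size.
 */
size_t demo_build_startup_params(const DemoConn *conn, char *buf)
{
    size_t len = 0;

    assert(conn != NULL);
    if (conn->user && conn->user[0])
        len = add_pair(buf, len, "user", conn->user);
    if (conn->dbname && conn->dbname[0])
        len = add_pair(buf, len, "database", conn->dbname);
    /* The options string is added in exactly one place. */
    if (conn->pgoptions && conn->pgoptions[0])
        len = add_pair(buf, len, "options", conn->pgoptions);
    if (buf != NULL)
        buf[len] = '\0';     /* terminator after the last pair */
    return len + 1;
}

/* Count how many times a parameter name occurs in a built section. */
int demo_count_param(const char *buf, const char *name)
{
    int count = 0;

    while (*buf != '\0') {
        const char *value = buf + strlen(buf) + 1;
        if (strcmp(buf, name) == 0)
            count++;
        buf = value + strlen(value) + 1;
    }
    return count;
}
```

A helper such as demo_count_param could back a regression check that a built packet contains the "options" parameter at most once.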



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HAWQ-1379) Do not send options multiple times in build_startup_packet()

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1379:
--
Component/s: Core

> Do not send options multiple times in build_startup_packet()
> 
>
> Key: HAWQ-1379
> URL: https://issues.apache.org/jira/browse/HAWQ-1379
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Core
> Affects Versions: 2.1.0.0-incubating
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating


[jira] [Updated] (HAWQ-1379) Do not send options multiple times in build_startup_packet()

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1379:
--
Affects Version/s: 2.1.0.0-incubating

> Do not send options multiple times in build_startup_packet()
> 
>
> Key: HAWQ-1379
> URL: https://issues.apache.org/jira/browse/HAWQ-1379
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Core
> Affects Versions: 2.1.0.0-incubating
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating


[jira] [Resolved] (HAWQ-1379) Do not send options multiple times in build_startup_packet()

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo resolved HAWQ-1379.
---
Resolution: Fixed

> Do not send options multiple times in build_startup_packet()
> 
>
> Key: HAWQ-1379
> URL: https://issues.apache.org/jira/browse/HAWQ-1379
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Core
> Affects Versions: 2.1.0.0-incubating
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating


[jira] [Updated] (HAWQ-1408) PANICs during COPY ... FROM STDIN

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1408:
--
Affects Version/s: (was: backlog)
   2.1.0.0-incubating

> PANICs during COPY ... FROM STDIN
> -
>
> Key: HAWQ-1408
> URL: https://issues.apache.org/jira/browse/HAWQ-1408
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Core
> Affects Versions: 2.1.0.0-incubating
> Reporter: Ming LI
> Assignee: Ming LI
> Fix For: 2.2.0.0-incubating
>
>
> We found a PANIC (with corresponding core dumps). Initial analysis of the 
> logs and core dump shows that the query causing the PANIC is a "COPY ... FROM 
> STDIN". The query does not always panic.
> Queries of this kind are executed from Java/Scala code (by one of the IG 
> Spark jobs). Connections to the DB are managed by a connection pool 
> (commons-dbcp2) and validated on borrow with a “select 1” validation query. 
> IG is using postgresql-9.4-1206-jdbc41 as the Java driver to create those 
> connections. I believe they should be using the driver from DataDirect, 
> available in PivNet; however, I haven't found hard evidence pointing to the 
> driver as the root cause.
> Below is my initial analysis of the packcore for the master PANIC. Not sure 
> if this helps or makes sense.
> This is the backtrace of the packcore for process 466858:
> {code}
> (gdb) bt
> #0  0x7fd875f906ab in raise () from 
> /data/logs/52280/packcore-core.postgres.466858/lib64/libpthread.so.0
> #1  0x008c0b19 in SafeHandlerForSegvBusIll (postgres_signal_arg=11, 
> processName=<optimized out>) at elog.c:4519
> #2  <signal handler called>
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> #4  0x0053c08f in assignPerRelSegno 
> (all_relids=all_relids@entry=0x2b96d68, segment_num=6) at 
> appendonlywriter.c:1212
> #5  0x005f79e8 in DoCopy (stmt=stmt@entry=0x2b2a3d8, 
> queryString=<optimized out>) at copy.c:1591
> #6  0x007ef737 in ProcessUtility 
> (parsetree=parsetree@entry=0x2b2a3d8, queryString=0x2c2f550 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> params=0x0, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, completionTag=completionTag@entry=0x7ffcb5e318e0 
> "") at utility.c:1076
> #7  0x007ea95e in PortalRunUtility (portal=portal@entry=0x2b8eab0, 
> utilityStmt=utilityStmt@entry=0x2b2a3d8, isTopLevel=isTopLevel@entry=1 
> '\001', dest=dest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1969
> #8  0x007ec13e in PortalRunMulti (portal=portal@entry=0x2b8eab0, 
> isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x2b2a7c8, 
> altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:2079
> #9  0x007ede95 in PortalRun (portal=portal@entry=0x2b8eab0, 
> count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1596
> #10 0x007e5ad9 in exec_simple_query 
> (query_string=query_string@entry=0x2b29100 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> seqServerHost=seqServerHost@entry=0x0, 
> seqServerPort=seqServerPort@entry=-1) at postgres.c:1816
> #11 0x007e6cb2 in PostgresMain (argc=<optimized out>, 
> argv=<optimized out>, argv@entry=0x29d7820, username=0x29d75d0 "mis_ig") at 
> postgres.c:4840
> #12 0x00799540 in BackendRun (port=0x29afc50) at postmaster.c:5915
> #13 BackendStartup (port=0x29afc50) at postmaster.c:5484
> #14 ServerLoop () at postmaster.c:2163
> #15 0x0079c309 in PostmasterMain (argc=<optimized out>, 
> argv=<optimized out>) at postmaster.c:1454
> #16 0x004a4209 in main (argc=9, argv=0x29af010) at main.c:226
> {code}
> Jumping into frame 3 and running info locals shows something odd for the 
> "status" variable:
> {code}
> (gdb) f 3
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> 1166  appendonlywriter.c: No such file or directory.
> (gdb) info locals
> status = 0x0
> [...]
> {code}
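
If the NULL "status" pointer seen in frame 3 is what was dereferenced (an assumption; the source of SetSegnoForWrite is not shown in this message), the usual defensive pattern is to check a lookup result before using it. A minimal sketch of that pattern follows. The type DemoSegStatus and the functions demo_find_status and demo_mark_for_write are hypothetical and are not HAWQ's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-segment-file status record; not HAWQ's actual struct. */
typedef struct {
    int segno;
    int in_use;
} DemoSegStatus;

/* Hypothetical lookup: returns NULL when no slot exists for segno. */
DemoSegStatus *demo_find_status(DemoSegStatus *slots, size_t nslots, int segno)
{
    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++) {
        if (slots[i].segno == segno)
            return &slots[i];
    }
    return NULL;
}

/* Returns 0 on success, -1 if no status record could be found. */
int demo_mark_for_write(DemoSegStatus *slots, size_t nslots, int segno)
{
    DemoSegStatus *status = demo_find_status(slots, nslots, segno);

    if (status == NULL)
        return -1;      /* report an error instead of dereferencing NULL */
    status->in_use = 1;
    return 0;
}
```

Returning an error code lets the caller fail the single statement rather than bring down the whole process with a SIGSEGV.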
> This panic comes from this piece of code in "appendonlywriter.c":
> {code}

[jira] [Updated] (HAWQ-1378) Elaborate the "invalid command-line arguments for server process" error.

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1378:
--
Component/s: Query Execution

> Elaborate the "invalid command-line arguments for server process" error.
> 
>
> Key: HAWQ-1378
> URL: https://issues.apache.org/jira/browse/HAWQ-1378
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Query Execution
> Affects Versions: 2.1.0.0-incubating
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating
>
>
> I saw the following errors several times across runs:
> "Error dispatching to ***: connection pointer is NULL."
> FATAL:  invalid command-line arguments for server process
> This usually means there is a bug in the related code, but the code should 
> log more detail so that we can find out which argument is wrong with less 
> pain, even if there is a log-level switch for argument dumping.
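
One way to make such a failure easier to diagnose is to name the rejected argument in the message. The sketch below is a hypothetical validator, not HAWQ's argument handling: demo_check_args and the two options it accepts ("-p <port>" and "-i") are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical validator: accepts only "-p <port>" and "-i".
 * Returns 0 if all arguments are valid; otherwise returns -1 and writes
 * a message naming the offending argument and its position into errbuf.
 */
int demo_check_args(int argc, char *const argv[], char *errbuf, size_t errlen)
{
    assert(errbuf != NULL && errlen > 0);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-i") == 0)
            continue;
        if (strcmp(argv[i], "-p") == 0) {
            if (i + 1 >= argc) {
                snprintf(errbuf, errlen,
                         "argument %d (\"-p\") is missing its value", i);
                return -1;
            }
            i++;        /* skip the port value */
            continue;
        }
        snprintf(errbuf, errlen,
                 "argument %d (\"%s\") is not recognized", i, argv[i]);
        return -1;
    }
    return 0;
}
```

With a message like this in the log next to the FATAL line, the wrong argument can be identified without rerunning under a debugger.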





[jira] [Reopened] (HAWQ-1379) Do not send options multiple times in build_startup_packet()

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo reopened HAWQ-1379:
---

Reopen to mark the right affected version.

> Do not send options multiple times in build_startup_packet()
> 
>
> Key: HAWQ-1379
> URL: https://issues.apache.org/jira/browse/HAWQ-1379
> Project: Apache HAWQ
> Issue Type: Bug
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating


[jira] [Resolved] (HAWQ-1378) Elaborate the "invalid command-line arguments for server process" error.

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo resolved HAWQ-1378.
---
Resolution: Fixed

> Elaborate the "invalid command-line arguments for server process" error.
> 
>
> Key: HAWQ-1378
> URL: https://issues.apache.org/jira/browse/HAWQ-1378
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Query Execution
> Affects Versions: 2.1.0.0-incubating
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating


[jira] [Closed] (HAWQ-1378) Elaborate the "invalid command-line arguments for server process" error.

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo closed HAWQ-1378.
-

> Elaborate the "invalid command-line arguments for server process" error.
> 
>
> Key: HAWQ-1378
> URL: https://issues.apache.org/jira/browse/HAWQ-1378
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Query Execution
> Affects Versions: 2.1.0.0-incubating
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating


[jira] [Reopened] (HAWQ-1378) Elaborate the "invalid command-line arguments for server process" error.

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo reopened HAWQ-1378:
---

Reopen to mark the right affected version.

> Elaborate the "invalid command-line arguments for server process" error.
> 
>
> Key: HAWQ-1378
> URL: https://issues.apache.org/jira/browse/HAWQ-1378
> Project: Apache HAWQ
> Issue Type: Bug
> Affects Versions: 2.1.0.0-incubating
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating


[jira] [Updated] (HAWQ-1378) Elaborate the "invalid command-line arguments for server process" error.

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1378:
--
Affects Version/s: 2.1.0.0-incubating

> Elaborate the "invalid command-line arguments for server process" error.
> 
>
> Key: HAWQ-1378
> URL: https://issues.apache.org/jira/browse/HAWQ-1378
> Project: Apache HAWQ
> Issue Type: Bug
> Affects Versions: 2.1.0.0-incubating
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.2.0.0-incubating


[jira] [Closed] (HAWQ-1371) QE process hang in shared input scan

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo closed HAWQ-1371.
-

> QE process hang in shared input scan
> 
>
> Key: HAWQ-1371
> URL: https://issues.apache.org/jira/browse/HAWQ-1371
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Query Execution
> Affects Versions: 2.1.0.0-incubating
> Reporter: Amy
> Assignee: Amy
> Fix For: 2.2.0.0-incubating
>
>
> QE processes hang on a segment node while the QD and the QEs on other 
> segment nodes have terminated.
> {code}
> on segment test2:
> [gpadmin@test2 ~]$ pp
> gpadmin   21614  0.0  1.2 788636 407428 ?   Ss   Feb26   1:19 
> /usr/local/hawq_2_1_0_0/bin/postgres -D 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-YARN/product/segmentdd -p 
> 31100 --silent-mode=true -M segment -i
> gpadmin   21615  0.0  0.0 279896  6952 ?   Ss   Feb26   0:08 postgres: 
> port 31100, logger process
> gpadmin   21618  0.0  0.0 282128  6980 ?   Ss   Feb26   0:00 postgres: 
> port 31100, stats collector process
> gpadmin   21619  0.0  0.0 788636  7280 ?   Ss   Feb26   0:11 postgres: 
> port 31100, writer process
> gpadmin   21620  0.0  0.0 788636  7064 ?   Ss   Feb26   0:01 postgres: 
> port 31100, checkpoint process
> gpadmin   21621  0.0  0.0 793048 11752 ?   S    Feb26   0:19 postgres: 
> port 31100, segment resource manager
> gpadmin   91760  0.0  0.0 861000 16840 ?   TNsl Feb26   0:07 postgres: 
> port 31100, gpadmin parquetola... 10.32.35.141(15250) con558 seg4 cmd2 
> slice11 MPPEXEC SELECT
> gpadmin   91762  0.0  0.0 861064 17116 ?   SNsl Feb26   0:08 postgres: 
> port 31100, gpadmin parquetola... 10.32.35.141(15253) con558 seg5 cmd2 
> slice11 MPPEXEC SELECT
> gpadmin  216648  0.0  0.0 103244   788 pts/0 S+ 19:54   0:00 grep 
> postgres
> {code}
> QE stack trace is:
> {code}
> (gdb) bt
> #0  0x0032214e1523 in select () from /lib64/libc.so.6
> #1  0x0069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, 
> share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> #2  0x00695798 in ExecEndMaterial (node=0x1d2eb50) at 
> nodeMaterial.c:512
> #3  0x0067048d in ExecEndNode (node=0x1d2eb50) at execProcnode.c:1681
> #4  0x0069c6b5 in ExecEndShareInputScan (node=0x1d2e6f0) at 
> nodeShareInputScan.c:382
> #5  0x0067042a in ExecEndNode (node=0x1d2e6f0) at execProcnode.c:1674
> #6  0x006ac9be in ExecEndSequence (node=0x1d23890) at 
> nodeSequence.c:165
> #7  0x006705f0 in ExecEndNode (node=0x1d23890) at execProcnode.c:1583
> #8  0x0069a0ab in ExecEndResult (node=0x1d214a0) at nodeResult.c:481
> #9  0x0067060d in ExecEndNode (node=0x1d214a0) at execProcnode.c:1575
> #10 0x0069a0ab in ExecEndResult (node=0x1d20860) at nodeResult.c:481
> #11 0x0067060d in ExecEndNode (node=0x1d20860) at execProcnode.c:1575
> #12 0x00698fd2 in ExecEndMotion (node=0x1d20320) at nodeMotion.c:1230
> #13 0x00670434 in ExecEndNode (node=0x1d20320) at execProcnode.c:1713
> #14 0x00669da7 in ExecEndPlan (planstate=0x1d20320, estate=0x1cb6b40) 
> at execMain.c:2896
> #15 0x0066a311 in ExecutorEnd (queryDesc=0x1cabf20) at execMain.c:1407
> #16 0x006195f2 in PortalCleanupHelper (portal=0x1cbcc40) at 
> portalcmds.c:365
> #17 PortalCleanup (portal=0x1cbcc40) at portalcmds.c:317
> #18 0x00900544 in AtAbort_Portals () at portalmem.c:693
> #19 0x004e697f in AbortTransaction () at xact.c:2800
> #20 0x004e7565 in AbortCurrentTransaction () at xact.c:3377
> #21 0x007ed0fa in PostgresMain (argc=, 
> argv=, username=0x1b47f10 "gpadmin") at postgres.c:4630
> #22 0x007a05d0 in BackendRun () at postmaster.c:5915
> #23 BackendStartup () at postmaster.c:5484
> #24 ServerLoop () at postmaster.c:2163
> #25 0x007a3399 in PostmasterMain (argc=Unhandled dwarf expression 
> opcode 0xf3
> ) at postmaster.c:1454
> #26 0x004a52e9 in main (argc=9, argv=0x1b0cd10) at main.c:226
> (gdb) p CurrentTransactionState->state
> $1 = TRANS_ABORT
> (gdb) p pctxt->donefd
> No symbol "pctxt" in current context.
> (gdb) f 1
> #1  0x0069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, 
> share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> 989   nodeShareInputScan.c: No such file or directory.
>   in nodeShareInputScan.c
> (gdb) p pctxt->donefd
> $2 = 15
> {code}
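
The trace shows the process blocked in select() while the transaction state is TRANS_ABORT. A common way to keep such a wait from blocking forever is to give select() a timeout and recheck a cancellation condition on every wakeup. The sketch below illustrates that pattern only. It is not the fix applied for this issue, and demo_wait_readable and its cancel_requested flag are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Wait until fd becomes readable, waking up periodically to check a
 * caller-supplied cancellation flag. Returns 1 if fd is readable,
 * 0 if cancelled, -1 on a select() error other than EINTR.
 */
int demo_wait_readable(int fd, const volatile int *cancel_requested)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    for (;;) {
        fd_set rset;
        struct timeval tv;
        int n;

        if (*cancel_requested)
            return 0;           /* e.g. the transaction is aborting */

        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        tv.tv_sec = 0;
        tv.tv_usec = 100000;    /* wake up every 100 ms to recheck the flag */

        n = select(fd + 1, &rset, NULL, NULL, &tv);
        if (n > 0 && FD_ISSET(fd, &rset))
            return 1;
        if (n < 0 && errno != EINTR)
            return -1;
        /* n == 0 (timeout) or EINTR: loop and check the flag again */
    }
}
```

The timeout bounds how long the waiter can sleep without noticing that its peers are gone; the 100 ms value here is arbitrary.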





[jira] [Resolved] (HAWQ-1371) QE process hang in shared input scan

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo resolved HAWQ-1371.
---
Resolution: Fixed

> QE process hang in shared input scan
> 
>
> Key: HAWQ-1371
> URL: https://issues.apache.org/jira/browse/HAWQ-1371
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Query Execution
> Affects Versions: 2.1.0.0-incubating
> Reporter: Amy
> Assignee: Amy
> Fix For: 2.2.0.0-incubating


[jira] [Updated] (HAWQ-1371) QE process hang in shared input scan

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1371:
--
Fix Version/s: (was: backlog)
   2.2.0.0-incubating

> QE process hang in shared input scan
> 
>
> Key: HAWQ-1371
> URL: https://issues.apache.org/jira/browse/HAWQ-1371
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Query Execution
> Affects Versions: 2.1.0.0-incubating
> Reporter: Amy
> Assignee: Amy
> Fix For: 2.2.0.0-incubating


[jira] [Commented] (HAWQ-1371) QE process hang in shared input scan

2017-04-04 Thread Ruilong Huo (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956180#comment-15956180
 ] 

Ruilong Huo commented on HAWQ-1371:
---

Reopen to correct the fix version.

> QE process hang in shared input scan
> 
>
> Key: HAWQ-1371
> URL: https://issues.apache.org/jira/browse/HAWQ-1371
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Query Execution
> Affects Versions: 2.1.0.0-incubating
> Reporter: Amy
> Assignee: Amy
> Fix For: 2.2.0.0-incubating
>
>
> process hang on some segment node while QD and QE on other segment nodes 
> terminated.
> {code}
> on segment test2:
> [gpadmin@test2 ~]$ pp
> gpadmin   21614  0.0  1.2 788636 407428 ?   Ss   Feb26   1:19 
> /usr/local/hawq_2_1_0_0/bin/postgres -D 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-YARN/product/segmentdd -p 
> 31100 --silent-mode=true -M segment -i
> gpadmin   21615  0.0  0.0 279896  6952 ?Ss   Feb26   0:08 postgres: 
> port 31100, logger process
> gpadmin   21618  0.0  0.0 282128  6980 ?Ss   Feb26   0:00 postgres: 
> port 31100, stats collector process
> gpadmin   21619  0.0  0.0 788636  7280 ?Ss   Feb26   0:11 postgres: 
> port 31100, writer process
> gpadmin   21620  0.0  0.0 788636  7064 ?Ss   Feb26   0:01 postgres: 
> port 31100, checkpoint process
> gpadmin   21621  0.0  0.0 793048 11752 ?SFeb26   0:19 postgres: 
> port 31100, segment resource manager
> gpadmin   91760  0.0  0.0 861000 16840 ?TNsl Feb26   0:07 postgres: 
> port 31100, gpadmin parquetola... 10.32.35.141(15250) con558 seg4 cmd2 
> slice11 MPPEXEC SELECT
> gpadmin   91762  0.0  0.0 861064 17116 ?SNsl Feb26   0:08 postgres: 
> port 31100, gpadmin parquetola... 10.32.35.141(15253) con558 seg5 cmd2 
> slice11 MPPEXEC SELECT
> gpadmin  216648  0.0  0.0 103244   788 pts/0S+   19:54   0:00 grep 
> postgres
> {code}
> QE stack trace is:
> {code}
> (gdb) bt
> #0  0x0032214e1523 in select () from /lib64/libc.so.6
> #1  0x0069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, 
> share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> #2  0x00695798 in ExecEndMaterial (node=0x1d2eb50) at 
> nodeMaterial.c:512
> #3  0x0067048d in ExecEndNode (node=0x1d2eb50) at execProcnode.c:1681
> #4  0x0069c6b5 in ExecEndShareInputScan (node=0x1d2e6f0) at 
> nodeShareInputScan.c:382
> #5  0x0067042a in ExecEndNode (node=0x1d2e6f0) at execProcnode.c:1674
> #6  0x006ac9be in ExecEndSequence (node=0x1d23890) at 
> nodeSequence.c:165
> #7  0x006705f0 in ExecEndNode (node=0x1d23890) at execProcnode.c:1583
> #8  0x0069a0ab in ExecEndResult (node=0x1d214a0) at nodeResult.c:481
> #9  0x0067060d in ExecEndNode (node=0x1d214a0) at execProcnode.c:1575
> #10 0x0069a0ab in ExecEndResult (node=0x1d20860) at nodeResult.c:481
> #11 0x0067060d in ExecEndNode (node=0x1d20860) at execProcnode.c:1575
> #12 0x00698fd2 in ExecEndMotion (node=0x1d20320) at nodeMotion.c:1230
> #13 0x00670434 in ExecEndNode (node=0x1d20320) at execProcnode.c:1713
> #14 0x00669da7 in ExecEndPlan (planstate=0x1d20320, estate=0x1cb6b40) 
> at execMain.c:2896
> #15 0x0066a311 in ExecutorEnd (queryDesc=0x1cabf20) at execMain.c:1407
> #16 0x006195f2 in PortalCleanupHelper (portal=0x1cbcc40) at 
> portalcmds.c:365
> #17 PortalCleanup (portal=0x1cbcc40) at portalcmds.c:317
> #18 0x00900544 in AtAbort_Portals () at portalmem.c:693
> #19 0x004e697f in AbortTransaction () at xact.c:2800
> #20 0x004e7565 in AbortCurrentTransaction () at xact.c:3377
> #21 0x007ed0fa in PostgresMain (argc=, 
> argv=, username=0x1b47f10 "gpadmin") at postgres.c:4630
> #22 0x007a05d0 in BackendRun () at postmaster.c:5915
> #23 BackendStartup () at postmaster.c:5484
> #24 ServerLoop () at postmaster.c:2163
> #25 0x007a3399 in PostmasterMain (argc=Unhandled dwarf expression 
> opcode 0xf3
> ) at postmaster.c:1454
> #26 0x004a52e9 in main (argc=9, argv=0x1b0cd10) at main.c:226
> (gdb) p CurrentTransactionState->state
> $1 = TRANS_ABORT
> (gdb) p pctxt->donefd
> No symbol "pctxt" in current context.
> (gdb) f 1
> #1  0x0069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, 
> share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> 989   nodeShareInputScan.c: No such file or directory.
>   in nodeShareInputScan.c
> (gdb) p pctxt->donefd
> $2 = 15
> {code}





[jira] [Reopened] (HAWQ-1371) QE process hang in shared input scan

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo reopened HAWQ-1371:
---

> QE process hang in shared input scan
> 
>
> Key: HAWQ-1371
> URL: https://issues.apache.org/jira/browse/HAWQ-1371
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Query Execution
>Affects Versions: 2.1.0.0-incubating
>Reporter: Amy
>Assignee: Amy
> Fix For: 2.2.0.0-incubating
>
>
> A QE process hangs on a segment node while the QD and the QEs on other 
> segment nodes have terminated.
> {code}
> on segment test2:
> [gpadmin@test2 ~]$ pp
> gpadmin   21614  0.0  1.2 788636 407428 ?   Ss   Feb26   1:19 
> /usr/local/hawq_2_1_0_0/bin/postgres -D 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-YARN/product/segmentdd -p 
> 31100 --silent-mode=true -M segment -i
> gpadmin   21615  0.0  0.0 279896  6952 ?Ss   Feb26   0:08 postgres: 
> port 31100, logger process
> gpadmin   21618  0.0  0.0 282128  6980 ?Ss   Feb26   0:00 postgres: 
> port 31100, stats collector process
> gpadmin   21619  0.0  0.0 788636  7280 ?Ss   Feb26   0:11 postgres: 
> port 31100, writer process
> gpadmin   21620  0.0  0.0 788636  7064 ?Ss   Feb26   0:01 postgres: 
> port 31100, checkpoint process
> gpadmin   21621  0.0  0.0 793048 11752 ?SFeb26   0:19 postgres: 
> port 31100, segment resource manager
> gpadmin   91760  0.0  0.0 861000 16840 ?TNsl Feb26   0:07 postgres: 
> port 31100, gpadmin parquetola... 10.32.35.141(15250) con558 seg4 cmd2 
> slice11 MPPEXEC SELECT
> gpadmin   91762  0.0  0.0 861064 17116 ?SNsl Feb26   0:08 postgres: 
> port 31100, gpadmin parquetola... 10.32.35.141(15253) con558 seg5 cmd2 
> slice11 MPPEXEC SELECT
> gpadmin  216648  0.0  0.0 103244   788 pts/0S+   19:54   0:00 grep 
> postgres
> {code}
> QE stack trace is:
> {code}
> (gdb) bt
> #0  0x0032214e1523 in select () from /lib64/libc.so.6
> #1  0x0069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, 
> share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> #2  0x00695798 in ExecEndMaterial (node=0x1d2eb50) at 
> nodeMaterial.c:512
> #3  0x0067048d in ExecEndNode (node=0x1d2eb50) at execProcnode.c:1681
> #4  0x0069c6b5 in ExecEndShareInputScan (node=0x1d2e6f0) at 
> nodeShareInputScan.c:382
> #5  0x0067042a in ExecEndNode (node=0x1d2e6f0) at execProcnode.c:1674
> #6  0x006ac9be in ExecEndSequence (node=0x1d23890) at 
> nodeSequence.c:165
> #7  0x006705f0 in ExecEndNode (node=0x1d23890) at execProcnode.c:1583
> #8  0x0069a0ab in ExecEndResult (node=0x1d214a0) at nodeResult.c:481
> #9  0x0067060d in ExecEndNode (node=0x1d214a0) at execProcnode.c:1575
> #10 0x0069a0ab in ExecEndResult (node=0x1d20860) at nodeResult.c:481
> #11 0x0067060d in ExecEndNode (node=0x1d20860) at execProcnode.c:1575
> #12 0x00698fd2 in ExecEndMotion (node=0x1d20320) at nodeMotion.c:1230
> #13 0x00670434 in ExecEndNode (node=0x1d20320) at execProcnode.c:1713
> #14 0x00669da7 in ExecEndPlan (planstate=0x1d20320, estate=0x1cb6b40) 
> at execMain.c:2896
> #15 0x0066a311 in ExecutorEnd (queryDesc=0x1cabf20) at execMain.c:1407
> #16 0x006195f2 in PortalCleanupHelper (portal=0x1cbcc40) at 
> portalcmds.c:365
> #17 PortalCleanup (portal=0x1cbcc40) at portalcmds.c:317
> #18 0x00900544 in AtAbort_Portals () at portalmem.c:693
> #19 0x004e697f in AbortTransaction () at xact.c:2800
> #20 0x004e7565 in AbortCurrentTransaction () at xact.c:3377
> #21 0x007ed0fa in PostgresMain (argc=, 
> argv=, username=0x1b47f10 "gpadmin") at postgres.c:4630
> #22 0x007a05d0 in BackendRun () at postmaster.c:5915
> #23 BackendStartup () at postmaster.c:5484
> #24 ServerLoop () at postmaster.c:2163
> #25 0x007a3399 in PostmasterMain (argc=Unhandled dwarf expression 
> opcode 0xf3
> ) at postmaster.c:1454
> #26 0x004a52e9 in main (argc=9, argv=0x1b0cd10) at main.c:226
> (gdb) p CurrentTransactionState->state
> $1 = TRANS_ABORT
> (gdb) p pctxt->donefd
> No symbol "pctxt" in current context.
> (gdb) f 1
> #1  0x0069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, 
> share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> 989   nodeShareInputScan.c: No such file or directory.
>   in nodeShareInputScan.c
> (gdb) p pctxt->donefd
> $2 = 15
> {code}
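
The backtrace above shows the QE blocked in select() inside shareinput_writer_waitdone while the transaction state is already TRANS_ABORT. As an illustration only, the Python sketch below shows the general pattern of waiting on a descriptor with a bounded select() timeout and re-checking an abort condition between waits, so that a waiter whose peers have gone away does not block forever. This is not HAWQ code: wait_for_done, should_abort, poll_interval and max_polls are hypothetical names, a Unix-like system is assumed, and the thread does not say whether the eventual fix takes this shape.

```python
import os
import select


def wait_for_done(done_fd, should_abort, poll_interval=0.05, max_polls=100):
    """Wait until done_fd is readable, giving up if should_abort() turns true.

    Returns "done", "aborted", or "timeout". All names are illustrative.
    """
    for _attempt in range(max_polls):
        # Bounded wait: select() returns after poll_interval even if no data arrives.
        readable, _, _ = select.select([done_fd], [], [], poll_interval)
        if readable:
            return "done"
        # Nothing arrived in this interval; check whether waiting still makes sense.
        if should_abort():
            return "aborted"
    return "timeout"


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    # No peer has written yet and an abort is pending: the wait ends promptly.
    print(wait_for_done(read_fd, should_abort=lambda: True))
    # Once a peer signals completion, the waiter returns immediately.
    os.write(write_fd, b"x")
    print(wait_for_done(read_fd, should_abort=lambda: False))
    os.close(read_fd)
    os.close(write_fd)
```

The point of the bounded timeout is that the abort check runs at least once per interval, whereas a select() call with no timeout can only return when a descriptor becomes ready.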





[jira] [Updated] (HAWQ-1371) QE process hang in shared input scan

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1371:
--
Affects Version/s: 2.1.0.0-incubating

> QE process hang in shared input scan
> 
>
> Key: HAWQ-1371
> URL: https://issues.apache.org/jira/browse/HAWQ-1371
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Query Execution
>Affects Versions: 2.1.0.0-incubating
>Reporter: Amy
>Assignee: Amy
> Fix For: 2.2.0.0-incubating
>
>
> A QE process hangs on a segment node while the QD and the QEs on other 
> segment nodes have terminated.
> {code}
> on segment test2:
> [gpadmin@test2 ~]$ pp
> gpadmin   21614  0.0  1.2 788636 407428 ?   Ss   Feb26   1:19 
> /usr/local/hawq_2_1_0_0/bin/postgres -D 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-YARN/product/segmentdd -p 
> 31100 --silent-mode=true -M segment -i
> gpadmin   21615  0.0  0.0 279896  6952 ?Ss   Feb26   0:08 postgres: 
> port 31100, logger process
> gpadmin   21618  0.0  0.0 282128  6980 ?Ss   Feb26   0:00 postgres: 
> port 31100, stats collector process
> gpadmin   21619  0.0  0.0 788636  7280 ?Ss   Feb26   0:11 postgres: 
> port 31100, writer process
> gpadmin   21620  0.0  0.0 788636  7064 ?Ss   Feb26   0:01 postgres: 
> port 31100, checkpoint process
> gpadmin   21621  0.0  0.0 793048 11752 ?SFeb26   0:19 postgres: 
> port 31100, segment resource manager
> gpadmin   91760  0.0  0.0 861000 16840 ?TNsl Feb26   0:07 postgres: 
> port 31100, gpadmin parquetola... 10.32.35.141(15250) con558 seg4 cmd2 
> slice11 MPPEXEC SELECT
> gpadmin   91762  0.0  0.0 861064 17116 ?SNsl Feb26   0:08 postgres: 
> port 31100, gpadmin parquetola... 10.32.35.141(15253) con558 seg5 cmd2 
> slice11 MPPEXEC SELECT
> gpadmin  216648  0.0  0.0 103244   788 pts/0S+   19:54   0:00 grep 
> postgres
> {code}
> QE stack trace is:
> {code}
> (gdb) bt
> #0  0x0032214e1523 in select () from /lib64/libc.so.6
> #1  0x0069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, 
> share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> #2  0x00695798 in ExecEndMaterial (node=0x1d2eb50) at 
> nodeMaterial.c:512
> #3  0x0067048d in ExecEndNode (node=0x1d2eb50) at execProcnode.c:1681
> #4  0x0069c6b5 in ExecEndShareInputScan (node=0x1d2e6f0) at 
> nodeShareInputScan.c:382
> #5  0x0067042a in ExecEndNode (node=0x1d2e6f0) at execProcnode.c:1674
> #6  0x006ac9be in ExecEndSequence (node=0x1d23890) at 
> nodeSequence.c:165
> #7  0x006705f0 in ExecEndNode (node=0x1d23890) at execProcnode.c:1583
> #8  0x0069a0ab in ExecEndResult (node=0x1d214a0) at nodeResult.c:481
> #9  0x0067060d in ExecEndNode (node=0x1d214a0) at execProcnode.c:1575
> #10 0x0069a0ab in ExecEndResult (node=0x1d20860) at nodeResult.c:481
> #11 0x0067060d in ExecEndNode (node=0x1d20860) at execProcnode.c:1575
> #12 0x00698fd2 in ExecEndMotion (node=0x1d20320) at nodeMotion.c:1230
> #13 0x00670434 in ExecEndNode (node=0x1d20320) at execProcnode.c:1713
> #14 0x00669da7 in ExecEndPlan (planstate=0x1d20320, estate=0x1cb6b40) 
> at execMain.c:2896
> #15 0x0066a311 in ExecutorEnd (queryDesc=0x1cabf20) at execMain.c:1407
> #16 0x006195f2 in PortalCleanupHelper (portal=0x1cbcc40) at 
> portalcmds.c:365
> #17 PortalCleanup (portal=0x1cbcc40) at portalcmds.c:317
> #18 0x00900544 in AtAbort_Portals () at portalmem.c:693
> #19 0x004e697f in AbortTransaction () at xact.c:2800
> #20 0x004e7565 in AbortCurrentTransaction () at xact.c:3377
> #21 0x007ed0fa in PostgresMain (argc=, 
> argv=, username=0x1b47f10 "gpadmin") at postgres.c:4630
> #22 0x007a05d0 in BackendRun () at postmaster.c:5915
> #23 BackendStartup () at postmaster.c:5484
> #24 ServerLoop () at postmaster.c:2163
> #25 0x007a3399 in PostmasterMain (argc=Unhandled dwarf expression 
> opcode 0xf3
> ) at postmaster.c:1454
> #26 0x004a52e9 in main (argc=9, argv=0x1b0cd10) at main.c:226
> (gdb) p CurrentTransactionState->state
> $1 = TRANS_ABORT
> (gdb) p pctxt->donefd
> No symbol "pctxt" in current context.
> (gdb) f 1
> #1  0x0069c2fa in shareinput_writer_waitdone (ctxt=0x1dae520, 
> share_id=0, nsharer_xslice=7) at nodeShareInputScan.c:989
> 989   nodeShareInputScan.c: No such file or directory.
>   in nodeShareInputScan.c
> (gdb) p pctxt->donefd
> $2 = 15
> {code}





[jira] [Updated] (HAWQ-1404) PXF to leverage file-level stats of ORC file and emit records for COUNT(*)

2017-04-04 Thread Oleksandr Diachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Diachenko updated HAWQ-1404:
--
Fix Version/s: 2.3.0.0-incubating

> PXF to leverage file-level stats of ORC file and emit records for COUNT(*)
> --
>
> Key: HAWQ-1404
> URL: https://issues.apache.org/jira/browse/HAWQ-1404
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: PXF
>Reporter: Oleksandr Diachenko
>Assignee: Oleksandr Diachenko
> Fix For: 2.3.0.0-incubating
>
>
> When a user issues a COUNT(*) query without a WHERE clause, PXF should be 
> able to use the file-level statistics of an ORC file and emit the given 
> number of records back to HAWQ without reading the actual tuples from disk. 
> This should be a first step toward enabling PXF to use ORC statistics (at the 
> file, stripe, and row group levels) so that we can improve a wider range of 
> aggregate queries.
> So whenever PXF receives "count" as the value of the AGG-TYPE parameter, it 
> should optimize the request by emitting tuples based on the ORC file-level 
> statistics.
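
The idea in the description can be sketched as follows. This is a minimal illustration in Python, not PXF code: FileStats, emit_records and read_rows are hypothetical names, and the sketch assumes the row count has already been obtained from the ORC file-level statistics. When the aggregate type is "count", it emits one empty placeholder record per row and never calls the reader.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple


@dataclass
class FileStats:
    """Hypothetical stand-in for the file-level statistics of one ORC file."""
    number_of_rows: int


def emit_records(
    stats: FileStats,
    agg_type: Optional[str],
    read_rows: Callable[[], Iterable[Tuple]],
) -> Iterator[Tuple]:
    """Yield one record per row, skipping the data read for count-only requests."""
    if agg_type == "count":
        # Only the number of records matters for COUNT(*), so emit empty
        # placeholders based on the file-level row count.
        for _ in range(stats.number_of_rows):
            yield ()
    else:
        # Any other request falls back to reading the actual tuples.
        yield from read_rows()


if __name__ == "__main__":
    def read_rows():
        raise AssertionError("data should not be read for a count-only request")

    stats = FileStats(number_of_rows=3)
    print(sum(1 for _ in emit_records(stats, "count", read_rows)))  # prints 3
```

The shortcut only applies when there is no WHERE clause, as the description notes, because a filter could remove rows that the file-level count still includes.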





[jira] [Resolved] (HAWQ-1404) PXF to leverage file-level stats of ORC file and emit records for COUNT(*)

2017-04-04 Thread Oleksandr Diachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Diachenko resolved HAWQ-1404.
---
Resolution: Fixed

> PXF to leverage file-level stats of ORC file and emit records for COUNT(*)
> --
>
> Key: HAWQ-1404
> URL: https://issues.apache.org/jira/browse/HAWQ-1404
> Project: Apache HAWQ
>  Issue Type: Improvement
>  Components: PXF
>Reporter: Oleksandr Diachenko
>Assignee: Oleksandr Diachenko
> Fix For: 2.3.0.0-incubating
>
>
> When a user issues a COUNT(*) query without a WHERE clause, PXF should be 
> able to use the file-level statistics of an ORC file and emit the given 
> number of records back to HAWQ without reading the actual tuples from disk. 
> This should be a first step toward enabling PXF to use ORC statistics (at the 
> file, stripe, and row group levels) so that we can improve a wider range of 
> aggregate queries.
> So whenever PXF receives "count" as the value of the AGG-TYPE parameter, it 
> should optimize the request by emitting tuples based on the ORC file-level 
> statistics.





[GitHub] incubator-hawq pull request #1209: HAWQ-1404. PXF to leverage file-level sta...

2017-04-04 Thread sansanichfb
Github user sansanichfb closed the pull request at:

https://github.com/apache/incubator-hawq/pull/1209


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (HAWQ-1418) Print executing command for hawq register

2017-04-04 Thread Lili Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955459#comment-15955459
 ] 

Lili Ma commented on HAWQ-1418:
---

The aim of this JIRA is to print the full command that the utility is 
running, so that the output logs of hawq register are easier to analyze, 
especially when hawq register is called concurrently.
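
As a rough sketch of what such a log line could look like (an illustration only, not the hawq register implementation; format_invocation and the sample arguments are hypothetical), the command can be logged together with the process ID so that interleaved output from concurrent runs can be told apart:

```python
import logging
import os
import shlex
import sys


def format_invocation(argv, pid=None):
    """Return a single log line describing the command being run."""
    pid = os.getpid() if pid is None else pid
    # Quote each argument so the logged command can be pasted back into a shell.
    command = " ".join(shlex.quote(arg) for arg in argv)
    return "pid=%d executing: %s" % (pid, command)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    # sys.argv stands in for whatever command line the utility was started with.
    logging.info(format_invocation(sys.argv))
```

Including the process ID in every line is one simple way to group the log output of a single invocation when several run at the same time.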

> Print executing command for hawq register
> -
>
> Key: HAWQ-1418
> URL: https://issues.apache.org/jira/browse/HAWQ-1418
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Command Line Tools
>Reporter: Chunling Wang
>Assignee: Chunling Wang
> Fix For: backlog
>
>
> Print executing command for hawq register





[jira] [Updated] (HAWQ-1408) PANICs during COPY ... FROM STDIN

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1408:
--
Fix Version/s: (was: 2.1.0.0-incubating)
   2.2.0.0-incubating

> PANICs during COPY ... FROM STDIN
> -
>
> Key: HAWQ-1408
> URL: https://issues.apache.org/jira/browse/HAWQ-1408
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Affects Versions: backlog
>Reporter: Ming LI
>Assignee: Ming LI
> Fix For: 2.2.0.0-incubating
>
>
> We found PANICs (and the corresponding core dumps). According to the initial 
> analysis of the logs and the core dump, the query causing this PANIC is a 
> "COPY ... FROM STDIN". This query does not always panic.
> Queries of this kind are executed from Java/Scala code (by one of the IG 
> Spark jobs). Connections to the DB are managed by a connection pool 
> (commons-dbcp2) and validated on borrow with a “select 1” validation query. 
> IG is using postgresql-9.4-1206-jdbc41 as the Java driver to create those 
> connections. I believe they should be using the driver from DataDirect, 
> available in PivNet; however, I haven't found hard evidence pointing to the 
> driver as the root cause.
> Below is my initial analysis of the packcore for the master PANIC. I am not 
> sure whether it helps or makes sense.
> This is the backtrace of the packcore for process 466858:
> {code}
> (gdb) bt
> #0  0x7fd875f906ab in raise () from 
> /data/logs/52280/packcore-core.postgres.466858/lib64/libpthread.so.0
> #1  0x008c0b19 in SafeHandlerForSegvBusIll (postgres_signal_arg=11, 
> processName=) at elog.c:4519
> #2  
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> #4  0x0053c08f in assignPerRelSegno 
> (all_relids=all_relids@entry=0x2b96d68, segment_num=6) at 
> appendonlywriter.c:1212
> #5  0x005f79e8 in DoCopy (stmt=stmt@entry=0x2b2a3d8, 
> queryString=) at copy.c:1591
> #6  0x007ef737 in ProcessUtility 
> (parsetree=parsetree@entry=0x2b2a3d8, queryString=0x2c2f550 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> params=0x0, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, completionTag=completionTag@entry=0x7ffcb5e318e0 
> "") at utility.c:1076
> #7  0x007ea95e in PortalRunUtility (portal=portal@entry=0x2b8eab0, 
> utilityStmt=utilityStmt@entry=0x2b2a3d8, isTopLevel=isTopLevel@entry=1 
> '\001', dest=dest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1969
> #8  0x007ec13e in PortalRunMulti (portal=portal@entry=0x2b8eab0, 
> isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x2b2a7c8, 
> altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:2079
> #9  0x007ede95 in PortalRun (portal=portal@entry=0x2b8eab0, 
> count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1596
> #10 0x007e5ad9 in exec_simple_query 
> (query_string=query_string@entry=0x2b29100 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> seqServerHost=seqServerHost@entry=0x0, 
> seqServerPort=seqServerPort@entry=-1) at postgres.c:1816
> #11 0x007e6cb2 in PostgresMain (argc=<optimized out>, argv=<optimized out>, argv@entry=0x29d7820, username=0x29d75d0 "mis_ig") at postgres.c:4840
> #12 0x00799540 in BackendRun (port=0x29afc50) at postmaster.c:5915
> #13 BackendStartup (port=0x29afc50) at postmaster.c:5484
> #14 ServerLoop () at postmaster.c:2163
> #15 0x0079c309 in PostmasterMain (argc=, 
> argv=) at postmaster.c:1454
> #16 0x004a4209 in main (argc=9, argv=0x29af010) at main.c:226
> {code}
> Jumping to frame 3 and running info locals, we found something odd in the 
> "status" variable:
> {code}
> (gdb) f 3
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> 1166  appendonlywriter.c: No such file or directory.
> (gdb) info locals
> status = 0x0
> [...]
> {code}
> This panic comes from this piece of code in "appendonlywriter.c":
> {code}
> for (