[jira] [Updated] (HAWQ-1408) PANICs during COPY ... FROM STDIN

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1408:
--
Affects Version/s: (was: backlog)
   2.1.0.0-incubating

> PANICs during COPY ... FROM STDIN
> -
>
> Key: HAWQ-1408
> URL: https://issues.apache.org/jira/browse/HAWQ-1408
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 2.1.0.0-incubating
>Reporter: Ming LI
>Assignee: Ming LI
> Fix For: 2.2.0.0-incubating
>
>
> We found PANIC (and respective core dumps). From the initial analysis from 
> the logs and core dump, the query causing this PANIC is a "COPY ... FROM 
> STDIN". This query does not always panic.
> This kind of queries are executed from Java/Scala code (by one of IG Spark 
> Jobs). Connection to the DB is managed by connection pool (commons-dbcp2) and 
> validated on borrow by “select 1” validation query. IG is using 
> postgresql-9.4-1206-jdbc41 as a java driver to create those connections. I 
> believe they should be using the driver from DataDirect, available in PivNet; 
> however, I haven't found hard evidence pointing the driver as a root cause.
> My initial analysis on the packcore for the master PANIC. Not sure if this 
> helps or makes sense.
> This is the backtrace of the packcore for process 466858:
> {code}
> (gdb) bt
> #0  0x7fd875f906ab in raise () from 
> /data/logs/52280/packcore-core.postgres.466858/lib64/libpthread.so.0
> #1  0x008c0b19 in SafeHandlerForSegvBusIll (postgres_signal_arg=11, 
> processName=) at elog.c:4519
> #2  
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> #4  0x0053c08f in assignPerRelSegno 
> (all_relids=all_relids@entry=0x2b96d68, segment_num=6) at 
> appendonlywriter.c:1212
> #5  0x005f79e8 in DoCopy (stmt=stmt@entry=0x2b2a3d8, 
> queryString=) at copy.c:1591
> #6  0x007ef737 in ProcessUtility 
> (parsetree=parsetree@entry=0x2b2a3d8, queryString=0x2c2f550 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> params=0x0, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, completionTag=completionTag@entry=0x7ffcb5e318e0 
> "") at utility.c:1076
> #7  0x007ea95e in PortalRunUtility (portal=portal@entry=0x2b8eab0, 
> utilityStmt=utilityStmt@entry=0x2b2a3d8, isTopLevel=isTopLevel@entry=1 
> '\001', dest=dest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1969
> #8  0x007ec13e in PortalRunMulti (portal=portal@entry=0x2b8eab0, 
> isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x2b2a7c8, 
> altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:2079
> #9  0x007ede95 in PortalRun (portal=portal@entry=0x2b8eab0, 
> count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1596
> #10 0x007e5ad9 in exec_simple_query 
> (query_string=query_string@entry=0x2b29100 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> seqServerHost=seqServerHost@entry=0x0, 
> seqServerPort=seqServerPort@entry=-1) at postgres.c:1816
> #11 0x007e6cb2 in PostgresMain (argc=, argv= out>, argv@entry=0x29d7820, username=0x29d75d0 "mis_ig") at postgres.c:4840
> #12 0x00799540 in BackendRun (port=0x29afc50) at postmaster.c:5915
> #13 BackendStartup (port=0x29afc50) at postmaster.c:5484
> #14 ServerLoop () at postmaster.c:2163
> #15 0x0079c309 in PostmasterMain (argc=, 
> argv=) at postmaster.c:1454
> #16 0x004a4209 in main (argc=9, argv=0x29af010) at main.c:226
> {code}
> Jumping into the frame 3 and running info locals, we found something odd for 
> "status" variable:
> {code}
> (gdb) f 3
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> 1166  appendonlywriter.c: No such file or directory.
> (gdb) info locals
> status = 0x0
> [...]
> {code}
> This panic comes from this piece of code in "appendonlywritter.c":
> 

[jira] [Updated] (HAWQ-1408) PANICs during COPY ... FROM STDIN

2017-04-04 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1408:
--
Fix Version/s: (was: 2.1.0.0-incubating)
   2.2.0.0-incubating

> PANICs during COPY ... FROM STDIN
> -
>
> Key: HAWQ-1408
> URL: https://issues.apache.org/jira/browse/HAWQ-1408
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Affects Versions: backlog
>Reporter: Ming LI
>Assignee: Ming LI
> Fix For: 2.2.0.0-incubating
>
>
> We found PANIC (and respective core dumps). From the initial analysis from 
> the logs and core dump, the query causing this PANIC is a "COPY ... FROM 
> STDIN". This query does not always panic.
> This kind of queries are executed from Java/Scala code (by one of IG Spark 
> Jobs). Connection to the DB is managed by connection pool (commons-dbcp2) and 
> validated on borrow by “select 1” validation query. IG is using 
> postgresql-9.4-1206-jdbc41 as a java driver to create those connections. I 
> believe they should be using the driver from DataDirect, available in PivNet; 
> however, I haven't found hard evidence pointing the driver as a root cause.
> My initial analysis on the packcore for the master PANIC. Not sure if this 
> helps or makes sense.
> This is the backtrace of the packcore for process 466858:
> {code}
> (gdb) bt
> #0  0x7fd875f906ab in raise () from 
> /data/logs/52280/packcore-core.postgres.466858/lib64/libpthread.so.0
> #1  0x008c0b19 in SafeHandlerForSegvBusIll (postgres_signal_arg=11, 
> processName=) at elog.c:4519
> #2  
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> #4  0x0053c08f in assignPerRelSegno 
> (all_relids=all_relids@entry=0x2b96d68, segment_num=6) at 
> appendonlywriter.c:1212
> #5  0x005f79e8 in DoCopy (stmt=stmt@entry=0x2b2a3d8, 
> queryString=) at copy.c:1591
> #6  0x007ef737 in ProcessUtility 
> (parsetree=parsetree@entry=0x2b2a3d8, queryString=0x2c2f550 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> params=0x0, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, completionTag=completionTag@entry=0x7ffcb5e318e0 
> "") at utility.c:1076
> #7  0x007ea95e in PortalRunUtility (portal=portal@entry=0x2b8eab0, 
> utilityStmt=utilityStmt@entry=0x2b2a3d8, isTopLevel=isTopLevel@entry=1 
> '\001', dest=dest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1969
> #8  0x007ec13e in PortalRunMulti (portal=portal@entry=0x2b8eab0, 
> isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x2b2a7c8, 
> altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:2079
> #9  0x007ede95 in PortalRun (portal=portal@entry=0x2b8eab0, 
> count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1596
> #10 0x007e5ad9 in exec_simple_query 
> (query_string=query_string@entry=0x2b29100 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> seqServerHost=seqServerHost@entry=0x0, 
> seqServerPort=seqServerPort@entry=-1) at postgres.c:1816
> #11 0x007e6cb2 in PostgresMain (argc=, argv= out>, argv@entry=0x29d7820, username=0x29d75d0 "mis_ig") at postgres.c:4840
> #12 0x00799540 in BackendRun (port=0x29afc50) at postmaster.c:5915
> #13 BackendStartup (port=0x29afc50) at postmaster.c:5484
> #14 ServerLoop () at postmaster.c:2163
> #15 0x0079c309 in PostmasterMain (argc=, 
> argv=) at postmaster.c:1454
> #16 0x004a4209 in main (argc=9, argv=0x29af010) at main.c:226
> {code}
> Jumping into the frame 3 and running info locals, we found something odd for 
> "status" variable:
> {code}
> (gdb) f 3
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> 1166  appendonlywriter.c: No such file or directory.
> (gdb) info locals
> status = 0x0
> [...]
> {code}
> This panic comes from this piece of code in "appendonlywritter.c":
> {code}
> for 

[jira] [Updated] (HAWQ-1408) PANICs during COPY ... FROM STDIN

2017-03-24 Thread Ming LI (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming LI updated HAWQ-1408:
--
Affects Version/s: backlog

> PANICs during COPY ... FROM STDIN
> -
>
> Key: HAWQ-1408
> URL: https://issues.apache.org/jira/browse/HAWQ-1408
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Core
>Affects Versions: backlog
>Reporter: Ming LI
>Assignee: Ming LI
> Fix For: 2.1.0.0-incubating
>
>
> We found PANIC (and respective core dumps). From the initial analysis from 
> the logs and core dump, the query causing this PANIC is a "COPY ... FROM 
> STDIN". This query does not always panic.
> This kind of queries are executed from Java/Scala code (by one of IG Spark 
> Jobs). Connection to the DB is managed by connection pool (commons-dbcp2) and 
> validated on borrow by “select 1” validation query. IG is using 
> postgresql-9.4-1206-jdbc41 as a java driver to create those connections. I 
> believe they should be using the driver from DataDirect, available in PivNet; 
> however, I haven't found hard evidence pointing the driver as a root cause.
> My initial analysis on the packcore for the master PANIC. Not sure if this 
> helps or makes sense.
> This is the backtrace of the packcore for process 466858:
> {code}
> (gdb) bt
> #0  0x7fd875f906ab in raise () from 
> /data/logs/52280/packcore-core.postgres.466858/lib64/libpthread.so.0
> #1  0x008c0b19 in SafeHandlerForSegvBusIll (postgres_signal_arg=11, 
> processName=) at elog.c:4519
> #2  
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> #4  0x0053c08f in assignPerRelSegno 
> (all_relids=all_relids@entry=0x2b96d68, segment_num=6) at 
> appendonlywriter.c:1212
> #5  0x005f79e8 in DoCopy (stmt=stmt@entry=0x2b2a3d8, 
> queryString=) at copy.c:1591
> #6  0x007ef737 in ProcessUtility 
> (parsetree=parsetree@entry=0x2b2a3d8, queryString=0x2c2f550 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> params=0x0, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, completionTag=completionTag@entry=0x7ffcb5e318e0 
> "") at utility.c:1076
> #7  0x007ea95e in PortalRunUtility (portal=portal@entry=0x2b8eab0, 
> utilityStmt=utilityStmt@entry=0x2b2a3d8, isTopLevel=isTopLevel@entry=1 
> '\001', dest=dest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1969
> #8  0x007ec13e in PortalRunMulti (portal=portal@entry=0x2b8eab0, 
> isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x2b2a7c8, 
> altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:2079
> #9  0x007ede95 in PortalRun (portal=portal@entry=0x2b8eab0, 
> count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
> dest=dest@entry=0x2b2a7c8, altdest=altdest@entry=0x2b2a7c8, 
> completionTag=completionTag@entry=0x7ffcb5e318e0 "") at pquery.c:1596
> #10 0x007e5ad9 in exec_simple_query 
> (query_string=query_string@entry=0x2b29100 "COPY 
> mis_data_ig_client_derived_attributes.client_derived_attributes_src (id, 
> tracking_id, name, value_string, value_timestamp, value_number, 
> value_boolean, environment, account, channel, device, feat"...,
> seqServerHost=seqServerHost@entry=0x0, 
> seqServerPort=seqServerPort@entry=-1) at postgres.c:1816
> #11 0x007e6cb2 in PostgresMain (argc=, argv= out>, argv@entry=0x29d7820, username=0x29d75d0 "mis_ig") at postgres.c:4840
> #12 0x00799540 in BackendRun (port=0x29afc50) at postmaster.c:5915
> #13 BackendStartup (port=0x29afc50) at postmaster.c:5484
> #14 ServerLoop () at postmaster.c:2163
> #15 0x0079c309 in PostmasterMain (argc=, 
> argv=) at postmaster.c:1454
> #16 0x004a4209 in main (argc=9, argv=0x29af010) at main.c:226
> {code}
> Jumping into the frame 3 and running info locals, we found something odd for 
> "status" variable:
> {code}
> (gdb) f 3
> #3  0x0053b9c3 in SetSegnoForWrite (existing_segnos=0x4c46ff0, 
> existing_segnos@entry=0x0, relid=relid@entry=1195061, 
> segment_num=segment_num@entry=6, forNewRel=forNewRel@entry=0 '\000', 
> keepHash=keepHash@entry=1 '\001') at appendonlywriter.c:1166
> 1166  appendonlywriter.c: No such file or directory.
> (gdb) info locals
> status = 0x0
> [...]
> {code}
> This panic comes from this piece of code in "appendonlywritter.c":
> {code}
> for (int i = 0; i < segment_num; i++)
> {
> AOSegfileStatus