[GitHub] incubator-hawq pull request #1257: HAWQ-1487. Fix hang process due to deadlo...

2017-06-15 Thread huor
GitHub user huor opened a pull request:

https://github.com/apache/incubator-hawq/pull/1257

HAWQ-1487. Fix hang process due to deadlock when it try to process 
interrupt in error handling



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huor/incubator-hawq interrupt

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hawq/pull/1257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1257


commit 04773fca3790e705402a3e9a698aa65e5efb392d
Author: Ruilong Huo 
Date:   2017-06-13T10:11:01Z

HAWQ-1487. Fix hang process due to deadlock when it try to process 
interrupt in error handling




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (HAWQ-1487) hang process due to deadlock when it try to process interrupt in error handling

2017-06-15 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1487:
--
Affects Version/s: 2.2.0.0-incubating


[jira] [Updated] (HAWQ-1487) hang process due to deadlock when it try to process interrupt in error handling

2017-06-15 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1487:
--
Fix Version/s: 2.3.0.0-incubating


[jira] [Assigned] (HAWQ-1487) hang process due to deadlock when it try to process interrupt in error handling

2017-06-15 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo reassigned HAWQ-1487:
-

Assignee: Ruilong Huo  (was: Lei Chang)


[jira] [Updated] (HAWQ-1487) hang process due to deadlock when it try to process interrupt in error handling

2017-06-15 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-1487:
--
Description: 
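A process hangs when it tries to process an interrupt during error handling. To be
specific, a QE (query executor) hits a division-by-zero error and errors out.
While that error is being processed, it tries to handle a query-cancel
interrupt, and a deadlock occurs. Judging from the call stack, the first
errstart() call invokes backtrace(), which locks a mutex inside libgcc's
unwinder (_Unwind_Find_FDE). The SIGINT handler StatementCancelHandler()
interrupts that call and runs ProcessInterrupts(), whose errstart() calls
backtrace() again and waits for the same mutex, which the interrupted call
appears to still hold.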

The hung process is:
{noformat}
$ hawq ssh -f hostfile -e "ps -ef | grep postgres | grep -v grep"
gpadmin   51246  51245  0 06:15 ?00:00:01 postgres: port 20100, logger p
gpadmin   51249  51245  0 06:15 ?00:00:00 postgres: port 20100, stats co
gpadmin   51250  51245  0 06:15 ?00:00:07 postgres: port 20100, writer p
gpadmin   51251  51245  0 06:15 ?00:00:01 postgres: port 20100, checkpoi
gpadmin   51252  51245  0 06:15 ?00:00:11 postgres: port 20100, segment
gpadmin  182983  51245  0 07:00 ?00:00:03 postgres: port 20100, hawqsupe

$ ps -ef | grep postgres | grep -v grep
gpadmin   51245  1  0 06:15 ?00:01:01 /usr/local/hawq_2_2_0_0/bin/postgres -D /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-Multinode-parallel/product/segmentdd -i -M segment -p 20100 --silent-mode=true
gpadmin   51246  51245  0 06:15 ?00:00:01 postgres: port 20100, logger process
gpadmin   51249  51245  0 06:15 ?00:00:00 postgres: port 20100, stats collector process
gpadmin   51250  51245  0 06:15 ?00:00:07 postgres: port 20100, writer process
gpadmin   51251  51245  0 06:15 ?00:00:01 postgres: port 20100, checkpoint process
gpadmin   51252  51245  0 06:15 ?00:00:11 postgres: port 20100, segment resource manager
gpadmin  182983  51245  0 07:00 ?00:00:03 postgres: port 20100, hawqsuperuser olap_winow... 10.32.34.225(45462) con4405 seg0 cmd2 slice7 MPPEXEC SELECT
gpadmin  194424 194402  0 23:50 pts/000:00:00 grep postgres
{noformat}

The call stack is:
{noformat}
$ sudo gdb -p 182983
(gdb) bt
#0  0x003ff060e2e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x003ff0609588 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x003ff0609457 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x003ff221206a in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#4  0x003ff220f603 in ?? () from /lib64/libgcc_s.so.1
#5  0x003ff220ff49 in ?? () from /lib64/libgcc_s.so.1
#6  0x003ff22100e7 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#7  0x003ff02fe966 in backtrace () from /lib64/libc.so.6
#8  0x009cda3f in errstart (elevel=20, filename=0xd309e0 "postgres.c", lineno=3618, funcname=0xd32fc0 "ProcessInterrupts", domain=0x0) at elog.c:492
#9  0x008e8fcb in ProcessInterrupts () at postgres.c:3616
#10 0x008e8c9e in StatementCancelHandler (postgres_signal_arg=2) at postgres.c:3463
#11 <signal handler called>
#12 0x003ff0609451 in pthread_mutex_lock () from /lib64/libpthread.so.0
#13 0x003ff221206a in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#14 0x003ff220f603 in ?? () from /lib64/libgcc_s.so.1
#15 0x003ff2210119 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#16 0x003ff02fe966 in backtrace () from /lib64/libc.so.6
#17 0x009cda3f in errstart (elevel=20, filename=0xd3ba00 "float.c", lineno=839, funcname=0xd3bf3a "float8div", domain=0x0) at elog.c:492
#18 0x00921a84 in float8div (fcinfo=0x7ffd04d2b8b0) at float.c:836
#19 0x00722fe5 in ExecMakeFunctionResult (fcache=0x324a088, econtext=0x32495d8, isNull=0x7ffd04d2c0e0 "\030", isDone=0x7ffd04d2bd04) at execQual.c:1762
#20 0x00723d87 in ExecEvalOper (fcache=0x324a088, econtext=0x32495d8, isNull=0x7ffd04d2c0e0 "\030", isDone=0x7ffd04d2bd04) at execQual.c:2250
#21 0x00722451 in ExecEvalFuncArgs (fcinfo=0x7ffd04d2bda0, argList=0x324b378, econtext=0x32495d8) at execQual.c:1317
#22 0x00722a68 in ExecMakeFunctionResult (fcache=0x3249850, econtext=0x32495d8, isNull=0x7ffd04d2c5c1 "\306\322\004\375\177", isDone=0x0) at execQual.c:1532
#23 0x00723d1e in ExecEvalFunc (fcache=0x3249850, econtext=0x32495d8, isNull=0x7ffd04d2c5c1 "\306\322\004\375\177", isDone=0x0) at execQual.c:2228
#24 0x0076eed2 in initFcinfo (wrxstate=0x31b8fe0, fcinfo=0x7ffd04d2c280, funcstate=0x7f83c7412318, econtext=0x32495d8, check_nulls=1 '\001') at nodeWindow.c:3201
#25 0x0076efa4 in add_tuple_to_trans (funcstate=0x7f83c7412318, wstate=0x3248ab8, econtext=0x32495d8, check_nulls=1 '\001') at nodeWindow.c:3223
#26 0x00772f72 in processTupleSlot (wstate=0x3248ab8, slot=0x31ac150, last_peer=0 '\000') at nodeWindow.c:5105
#27 0x00772760 in ExecWindow (wstate=0x3248ab8) at nodeWindow.c:4821
#28 0x0071eda7 in ExecProcNode (node=0x3248ab8) at execProcnode.c:1007
#29 0x0075aded in NextInputSlot (node=0x31af928) at nodeResult.c:95
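
In the trace above, both errstart() calls reach backtrace() and the libgcc unwinder lock, and the second one runs inside the SIGINT handler. The C sketch below models that cycle in a single process so it can be run without hanging. It is a simplified illustration, not the HAWQ-1487 patch: `unwinder_lock`, `holdoff_count`, `collect_backtrace()`, `report_error()` and `run_scenario()` are hypothetical names, a plain flag stands in for the real mutex (a would-be self-deadlock is counted instead of blocking), and the holdoff counter only mimics the idea behind PostgreSQL's HOLD_INTERRUPTS()/RESUME_INTERRUPTS(). Whether the actual fix used this approach is not stated in this thread.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical model state; none of these names come from HAWQ. */
static volatile sig_atomic_t unwinder_lock = 0;  /* stands in for libgcc's internal mutex */
static volatile sig_atomic_t holdoff_count = 0;  /* mimics PostgreSQL's InterruptHoldoffCount */
static volatile sig_atomic_t cancel_pending = 0; /* set when a cancel request arrives */
static int self_deadlocks = 0;                   /* times the non-reentrant lock was re-entered */
static int cancels_processed = 0;

/* Stand-in for backtrace(): takes a non-recursive lock. A real mutex would
 * block forever if re-entered by the same thread; here we count it instead. */
static void collect_backtrace(int cancel_arrives_now)
{
    if (unwinder_lock) {
        self_deadlocks++;
        return;
    }
    unwinder_lock = 1;
    if (cancel_arrives_now)
        raise(SIGINT);          /* SIGINT delivered while the lock is held */
    unwinder_lock = 0;
}

/* Models ProcessInterrupts(): acts only when interrupts are not held off. */
static void process_interrupts(void)
{
    if (!cancel_pending || holdoff_count > 0)
        return;
    cancel_pending = 0;
    cancels_processed++;
    collect_backtrace(0);       /* reporting the cancel error collects a backtrace too */
}

/* Models StatementCancelHandler(): records the request and tries to act on it. */
static void cancel_handler(int signo)
{
    (void) signo;
    cancel_pending = 1;
    process_interrupts();
}

/* Models the division-by-zero error report. With hold_interrupts set, the
 * cancel is deferred until the non-reentrant section has finished. */
static void report_error(int hold_interrupts)
{
    if (hold_interrupts)
        holdoff_count++;
    collect_backtrace(1);
    if (hold_interrupts) {
        holdoff_count--;
        process_interrupts();   /* safe point: outside the handler and the lock */
    }
}

/* Runs one error report with a cancel arriving mid-backtrace. */
static void run_scenario(int hold_interrupts, int *deadlocks, int *cancels)
{
    self_deadlocks = 0;
    cancels_processed = 0;
    cancel_pending = 0;
    signal(SIGINT, cancel_handler);
    report_error(hold_interrupts);
    signal(SIGINT, SIG_DFL);
    assert(unwinder_lock == 0 && holdoff_count == 0);
    *deadlocks = self_deadlocks;
    *cancels = cancels_processed;
}
```

Without the holdoff, the model records one self-deadlock (where the real process would hang); with it, the cancel is still honored, just after the error report has released the lock. raise() is used so the signal arrives at a deterministic point.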

[jira] [Created] (HAWQ-1487) hang process due to deadlock when it try to process interrupt in error handling

2017-06-15 Thread Ruilong Huo (JIRA)
Ruilong Huo created HAWQ-1487:
-

 Summary: hang process due to deadlock when it try to process 
interrupt in error handling
 Key: HAWQ-1487
 URL: https://issues.apache.org/jira/browse/HAWQ-1487
 Project: Apache HAWQ
  Issue Type: Bug
  Components: Query Execution
Reporter: Ruilong Huo
Assignee: Lei Chang



[jira] [Closed] (HAWQ-1485) Use user/password instead of credentials cache in Ranger lookup for HAWQ with Kerberos enabled.

2017-06-15 Thread Hongxu Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongxu Ma closed HAWQ-1485.
---
Resolution: Fixed

fixed

> Use user/password instead of credentials cache in Ranger lookup for HAWQ with 
> Kerberos enabled.
> ---
>
> Key: HAWQ-1485
> URL: https://issues.apache.org/jira/browse/HAWQ-1485
> Project: Apache HAWQ
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Hongxu Ma
>Assignee: Hongxu Ma
> Fix For: 2.3.0.0-incubating
>
>
> When the credentials cache is used, entering a wrong password in the Ranger UI
> does not destroy the existing Kerberos credentials (created by the last
> successful kinit command), which is confusing to users.
> We should therefore use the user name and password for Kerberos authentication.
> Core logic:
> {code}
> // Requires java.sql.DriverManager and java.util.Properties.
> Properties props = new Properties();
> if (connectionProperties.containsKey(AUTHENTICATION) &&
>     connectionProperties.get(AUTHENTICATION).equals(KERBEROS)) {
>   // Kerberos mode: tell pgjdbc which service name to request and which
>   // JAAS login entry to use for the GSSAPI handshake.
>   props.setProperty("kerberosServerName", connectionProperties.get("principal"));
>   props.setProperty("jaasApplicationName", "pgjdbc");
> }
> String url = String.format("jdbc:postgresql://%s:%s/%s",
>     connectionProperties.get("hostname"), connectionProperties.get("port"), db);
> // Pass the user's credentials explicitly on every connection attempt.
> props.setProperty("user", connectionProperties.get("username"));
> props.setProperty("password", connectionProperties.get("password"));
> return DriverManager.getConnection(url, props);
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HAWQ-1484) Spin PXF into a Separate Project for Data Access

2017-06-15 Thread Shivram Mani (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051065#comment-16051065
 ] 

Shivram Mani commented on HAWQ-1484:


[~sirinath] When you say separate project, do you mean a separate top-level
project with its own repository?

> Spin PXF into a Separate Project for Data Access
> 
>
> Key: HAWQ-1484
> URL: https://issues.apache.org/jira/browse/HAWQ-1484
> Project: Apache HAWQ
>  Issue Type: New Feature
>Reporter: Suminda Dharmasena
>Assignee: Radar Lei
>
> Can PXF be spun off into a separate project, where it can be used as a
> basis for other data access projects?





[jira] [Resolved] (HAWQ-1486) PANIC accessing PXF HDFS table

2017-06-15 Thread Oleksandr Diachenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Diachenko resolved HAWQ-1486.
---
Resolution: Fixed

Merged to master.

> PANIC accessing PXF HDFS table
> --
>
> Key: HAWQ-1486
> URL: https://issues.apache.org/jira/browse/HAWQ-1486
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: External Tables, PXF
>Reporter: John Gaskin
>Assignee: Vineet Goel
> Fix For: 2.3.0.0-incubating
>
>
> This code does not handle the case where churl_init_download() returns NULL,
> which seems to trigger a segfault at the libcurl level.
> It looks like we failed to connect to PXF (?).
> The piece of HAWQ code that handles cURL calls (pxfutils.c):
> {code}
> static void process_request(ClientContext* client_context, char *uri)
> {
>     size_t n = 0;
>     char buffer[RAW_BUF_SIZE];
>
>     print_http_headers(client_context->http_headers);
>     client_context->handle = churl_init_download(uri, client_context->http_headers);
>     memset(buffer, 0, RAW_BUF_SIZE);
>     resetStringInfo(&(client_context->the_rest_buf));
>
>     /*
>      * This try-catch ensures that in case of an exception during the "communication with PXF and the accumulation of
>      * PXF data in client_context->the_rest_buf", we still get to terminate the libcurl connection nicely and avoid
>      * leaving the PXF server connection hung.
>      */
>     PG_TRY();
>     {
>         /* read some bytes to make sure the connection is established */
>         churl_read_check_connectivity(client_context->handle);
> {code}
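
HAWQ-1486 reports that process_request() uses the result of churl_init_download() without checking it for NULL. The self-contained C sketch below shows the general guard. `download_handle`, `init_download()` and `process_request_sketch()` are hypothetical stand-ins, not HAWQ's churl API, and the actual change merged through PR #1255 is not shown in this thread; in HAWQ the error branch would presumably raise an error rather than return a code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the handle returned by churl_init_download(). */
typedef struct {
    const char *uri;
} download_handle;

/* Hypothetical stand-in for churl_init_download(): returns NULL when the
 * download cannot be set up (here: missing or empty URI, or out of memory). */
static download_handle *init_download(const char *uri)
{
    if (uri == NULL || uri[0] == '\0')
        return NULL;
    download_handle *h = malloc(sizeof *h);
    if (h != NULL)
        h->uri = uri;
    return h;
}

/* Returns 0 on success, -1 if no handle could be created. The NULL check
 * happens before any read, so a failed setup never reaches the reader. */
static int process_request_sketch(const char *uri)
{
    download_handle *handle = init_download(uri);
    if (handle == NULL) {
        fprintf(stderr, "could not initialize download for \"%s\"\n",
                uri != NULL ? uri : "(null)");
        return -1;
    }
    /* From here on the handle is non-NULL, so a connectivity check such as
     * churl_read_check_connectivity() in the original would not see NULL. */
    free(handle);
    return 0;
}
```

The key point is ordering: the check sits immediately after the init call and before the PG_TRY block, so the cleanup path in the original never has to deal with a NULL handle.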





[GitHub] incubator-hawq issue #1255: HAWQ-1486. Catch error out on NULL condition for...

2017-06-15 Thread sansanichfb
Github user sansanichfb commented on the issue:

https://github.com/apache/incubator-hawq/pull/1255
  
Merged code to master, feel free to close PR.




[GitHub] incubator-hawq pull request #1256: HAWQ-1485. Fix exception of decryptPasswo...

2017-06-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-hawq/pull/1256




[GitHub] incubator-hawq issue #1256: HAWQ-1485. Fix exception of decryptPassword twic...

2017-06-15 Thread linwen
Github user linwen commented on the issue:

https://github.com/apache/incubator-hawq/pull/1256
  
LGTM 

