[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI

2017-09-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18541:
---
Attachment: 1-table-async-test.txt

> [C++] Segfaults from JNI
> 
>
> Key: HBASE-18541
> URL: https://issues.apache.org/jira/browse/HBASE-18541
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Ted Yu
> Attachments: 18541.v1.txt, 18541.v3.txt, 1-table-async-test.txt
>
>
> retry-test and multi-retry-test fails flakily when run with 
> {code}
> buck test --all --no-results-cache
> {code}
> or when run in a loop:
> {code}
> for i in `seq 1 10`; do buck test --no-results-cache core:retry-test || break 
> 1; done
> {code}
> The problem seems to be within the JNI internals and usually happens at the 
> create table method call. I was not able to inspect much, but the comments in 
> our mini-cluster indicate that we may need to use global references instead 
> of local ones. I suspect the problem happens when there is a GC run for the 
> test since the failure happens usually after some time (but almost always in 
> create table method). 
> [~ted_yu] do you mind taking a look at this. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI

2017-09-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18541:
---
Attachment: (was: 1-table-async-test.txt)

> [C++] Segfaults from JNI
> 
>
> Key: HBASE-18541
> URL: https://issues.apache.org/jira/browse/HBASE-18541
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Ted Yu
> Attachments: 18541.v1.txt, 18541.v3.txt
>
>
> retry-test and multi-retry-test fails flakily when run with 
> {code}
> buck test --all --no-results-cache
> {code}
> or when run in a loop:
> {code}
> for i in `seq 1 10`; do buck test --no-results-cache core:retry-test || break 
> 1; done
> {code}
> The problem seems to be within the JNI internals and usually happens at the 
> create table method call. I was not able to inspect much, but the comments in 
> our mini-cluster indicate that we may need to use global references instead 
> of local ones. I suspect the problem happens when there is a GC run for the 
> test since the failure happens usually after some time (but almost always in 
> create table method). 
> [~ted_yu] do you mind taking a look at this. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI

2017-09-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18541:
---
Attachment: 1-table-async-test.txt

Here is suggested change to async-batch-rpc-retrying-test.cc where table is 
created once at the beginning of the test.

Looped the test 11 times which all passed.

This can serve as short term fix.

> [C++] Segfaults from JNI
> 
>
> Key: HBASE-18541
> URL: https://issues.apache.org/jira/browse/HBASE-18541
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Ted Yu
> Attachments: 18541.v1.txt, 18541.v3.txt, 1-table-async-test.txt
>
>
> retry-test and multi-retry-test fails flakily when run with 
> {code}
> buck test --all --no-results-cache
> {code}
> or when run in a loop:
> {code}
> for i in `seq 1 10`; do buck test --no-results-cache core:retry-test || break 
> 1; done
> {code}
> The problem seems to be within the JNI internals and usually happens at the 
> create table method call. I was not able to inspect much, but the comments in 
> our mini-cluster indicate that we may need to use global references instead 
> of local ones. I suspect the problem happens when there is a GC run for the 
> test since the failure happens usually after some time (but almost always in 
> create table method). 
> [~ted_yu] do you mind taking a look at this. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI

2017-09-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18541:
---
Attachment: 18541.v3.txt

Tried patch v3.

Without AttachCurrentThread() / DetachCurrentThread() (i.e. lock only), I got 
crash similar to the current.

With AttachCurrentThread() / DetachCurrentThread(), I got the following:
{code}
*** SIGSEGV (@0x258) received by PID 160 (TID 0x7f8eaa021840) from PID 600; 
stack trace: ***
@ 0x7f8ea97954e2 (unknown)
@ 0x7f8ea9799939 JVM_handle_linux_signal
@ 0x7f8ea978d838 (unknown)
@ 0x7f8ea7fe33d0 (unknown)
@ 0x7f8ea9550703 (unknown)
@ 0x7f8ea9552db6 (unknown)
@   0x43f2f3 hbase::MiniCluster::GetConfValue()
@   0x439aaa hbase::TestUtil::StartMiniCluster()
@   0x59b7de 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@   0x57cbbf testing::TestCase::Run()
{code}

> [C++] Segfaults from JNI
> 
>
> Key: HBASE-18541
> URL: https://issues.apache.org/jira/browse/HBASE-18541
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Ted Yu
> Attachments: 18541.v1.txt, 18541.v3.txt
>
>
> retry-test and multi-retry-test fails flakily when run with 
> {code}
> buck test --all --no-results-cache
> {code}
> or when run in a loop:
> {code}
> for i in `seq 1 10`; do buck test --no-results-cache core:retry-test || break 
> 1; done
> {code}
> The problem seems to be within the JNI internals and usually happens at the 
> create table method call. I was not able to inspect much, but the comments in 
> our mini-cluster indicate that we may need to use global references instead 
> of local ones. I suspect the problem happens when there is a GC run for the 
> test since the failure happens usually after some time (but almost always in 
> create table method). 
> [~ted_yu] do you mind taking a look at this. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI

2017-08-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-18541:
---
Attachment: 18541.v1.txt

Tentative patch.
Use NewGlobalRef / DeleteGlobalRef to protect the cluster from being collected 
during runtime.

> [C++] Segfaults from JNI
> 
>
> Key: HBASE-18541
> URL: https://issues.apache.org/jira/browse/HBASE-18541
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Ted Yu
> Attachments: 18541.v1.txt
>
>
> retry-test and multi-retry-test fails flakily when run with 
> {code}
> buck test --all --no-results-cache
> {code}
> or when run in a loop:
> {code}
> for i in `seq 1 10`; do buck test --no-results-cache core:retry-test || break 
> 1; done
> {code}
> The problem seems to be within the JNI internals and usually happens at the 
> create table method call. I was not able to inspect much, but the comments in 
> our mini-cluster indicate that we may need to use global references instead 
> of local ones. I suspect the problem happens when there is a GC run for the 
> test since the failure happens usually after some time (but almost always in 
> create table method). 
> [~ted_yu] do you mind taking a look at this. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)