[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI
[ https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-18541:
---
    Attachment: 1-table-async-test.txt

> [C++] Segfaults from JNI
>
>                 Key: HBASE-18541
>                 URL: https://issues.apache.org/jira/browse/HBASE-18541
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Ted Yu
>         Attachments: 18541.v1.txt, 18541.v3.txt, 1-table-async-test.txt
>
>
> retry-test and multi-retry-test fail flakily when run with
> {code}
> buck test --all --no-results-cache
> {code}
> or when run in a loop:
> {code}
> for i in `seq 1 10`; do buck test --no-results-cache core:retry-test || break 1; done
> {code}
> The problem seems to be within the JNI internals and usually happens at the
> create table method call. I was not able to inspect much, but the comments in
> our mini-cluster indicate that we may need to use global references instead
> of local ones. I suspect the problem happens when there is a GC run for the
> test, since the failure usually happens after some time (but almost always in
> the create table method).
> [~ted_yu] do you mind taking a look at this?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI
[ https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-18541:
---
    Attachment:     (was: 1-table-async-test.txt)

> [C++] Segfaults from JNI
>
>         Attachments: 18541.v1.txt, 18541.v3.txt
[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI
[ https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-18541:
---
    Attachment: 1-table-async-test.txt

Here is a suggested change to async-batch-rpc-retrying-test.cc in which the table is created once at the beginning of the test.
I looped the test 11 times and all runs passed.
This can serve as a short-term fix.

> [C++] Segfaults from JNI
>
>         Attachments: 18541.v1.txt, 18541.v3.txt, 1-table-async-test.txt
[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI
[ https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-18541:
---
    Attachment: 18541.v3.txt

Tried patch v3.
Without AttachCurrentThread() / DetachCurrentThread() (i.e. lock only), I got a crash similar to the current one.
With AttachCurrentThread() / DetachCurrentThread(), I got the following:
{code}
*** SIGSEGV (@0x258) received by PID 160 (TID 0x7f8eaa021840) from PID 600; stack trace: ***
    @     0x7f8ea97954e2 (unknown)
    @     0x7f8ea9799939 JVM_handle_linux_signal
    @     0x7f8ea978d838 (unknown)
    @     0x7f8ea7fe33d0 (unknown)
    @     0x7f8ea9550703 (unknown)
    @     0x7f8ea9552db6 (unknown)
    @           0x43f2f3 hbase::MiniCluster::GetConfValue()
    @           0x439aaa hbase::TestUtil::StartMiniCluster()
    @           0x59b7de testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x57cbbf testing::TestCase::Run()
{code}

> [C++] Segfaults from JNI
>
>         Attachments: 18541.v1.txt, 18541.v3.txt
[jira] [Updated] (HBASE-18541) [C++] Segfaults from JNI
[ https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-18541:
---
    Attachment: 18541.v1.txt

Tentative patch.
Use NewGlobalRef / DeleteGlobalRef to keep the cluster object from being garbage-collected at runtime.

> [C++] Segfaults from JNI
>
>         Attachments: 18541.v1.txt
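
For readers unfamiliar with the NewGlobalRef / DeleteGlobalRef idea mentioned above, here is a minimal sketch of how a reference could be held for the lifetime of a C++ object. This is not the content of 18541.v1.txt. FakeEnv, GlobalRefHolder and LiveGlobalsWhileHeld are hypothetical names introduced only for illustration, and FakeEnv is a stand-in so the sketch compiles without a JDK. With real JNI one would presumably include <jni.h> and instantiate the holder with JNIEnv and jobject.

```cpp
#include <cassert>

// ASSUMPTION: stand-in for JNIEnv so this compiles without a JDK.
// The real JNIEnv (from <jni.h>) offers NewGlobalRef(jobject) and
// DeleteGlobalRef(jobject) with the same call shape.
struct FakeEnv {
  int live_globals = 0;  // counts outstanding global references
  void* NewGlobalRef(void* local) { ++live_globals; return local; }
  void DeleteGlobalRef(void* /*global*/) { --live_globals; }
};

// Hypothetical RAII holder: promotes a reference to a global one on
// construction and releases it on destruction, so the referenced object
// stays reachable for as long as the holder is alive.
// Caveat with real JNI: a JNIEnv* is only valid on its own thread, so the
// holder should be destroyed on the thread that created it.
template <typename Env, typename Ref>
class GlobalRefHolder {
 public:
  GlobalRefHolder(Env* env, Ref local)
      : env_(env), ref_(static_cast<Ref>(env->NewGlobalRef(local))) {}

  ~GlobalRefHolder() {
    if (ref_ != nullptr) env_->DeleteGlobalRef(ref_);
  }

  // Non-copyable: exactly one DeleteGlobalRef per NewGlobalRef.
  GlobalRefHolder(const GlobalRefHolder&) = delete;
  GlobalRefHolder& operator=(const GlobalRefHolder&) = delete;

  Ref get() const { return ref_; }

 private:
  Env* env_;
  Ref ref_;
};

// Hypothetical helper: reports how many global references are live while
// a holder is in scope; the reference is released again on return.
inline int LiveGlobalsWhileHeld(FakeEnv* env, void* local) {
  GlobalRefHolder<FakeEnv, void*> holder(env, local);
  assert(holder.get() == local);
  return env->live_globals;
}
```

Tying the release to a destructor keeps each NewGlobalRef paired with exactly one DeleteGlobalRef. Whether that alone would address the crash reported in this issue is not established by the thread.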