[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-09-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158032#comment-16158032
 ] 

Ted Yu commented on HBASE-18541:


Integrated the 1-table patch through HBASE-18777.

This JIRA has a lot of background information; let's keep it open while we hunt for the
root cause.

> [C++] Segfaults from JNI
> 
>
> Key: HBASE-18541
> URL: https://issues.apache.org/jira/browse/HBASE-18541
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Ted Yu
> Attachments: 18541.v1.txt, 18541.v3.txt, 1-table-async-test.txt
>
>
> retry-test and multi-retry-test fail flakily when run with 
> {code}
> buck test --all --no-results-cache
> {code}
> or when run in a loop:
> {code}
> for i in `seq 1 10`; do buck test --no-results-cache core:retry-test || break 1; done
> {code}
> The problem seems to be within the JNI internals and usually happens at the 
> create table method call. I was not able to inspect much, but the comments in 
> our mini-cluster indicate that we may need to use global references instead 
> of local ones. I suspect the problem happens when there is a GC run for the 
> test since the failure usually happens after some time (but almost always in 
> the create table method). 
> [~ted_yu] do you mind taking a look at this? 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-09-07 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158016#comment-16158016
 ] 

Enis Soztutar commented on HBASE-18541:
---

Thanks for the patch. A short-term fix is acceptable, since we seem to be 
having some trouble finding the actual issue in the JNI layer. Can you please 
remove the commented-out code?
{code}
-  auto tableName = createTestTable(split_regions, table_name);
+// auto tableName = createTestTable(split_regions, table_name);
{code}
Do we need a similar change in the regular retry-test as well? 


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149659#comment-16149659
 ] 

Ted Yu commented on HBASE-18541:


After changing the Hadoop version to 2.7.4, the test loop seems more stable.

[~enis]:
Mind giving it a try?


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149555#comment-16149555
 ] 

Ted Yu commented on HBASE-18541:


Applied the -XX:+UseMembar flag to the JVM created by the unit test.

The crash still happens.


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139331#comment-16139331
 ] 

Ted Yu commented on HBASE-18541:


Got a crash by uncommenting the last test in async-batch-rpc-retrying-test.cc:
{code}
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f2614cb10a1 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64//jre/lib/amd64/server/libjvm.so
[Current thread is 1 (Thread 0x7f2615553840 (LWP 8413))]
(gdb) bt
#0  0x7f2614cb10a1 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64//jre/lib/amd64/server/libjvm.so
#1  0x7f2614ddf01f in ?? () from /usr/lib/jvm/java-8-openjdk-amd64//jre/lib/amd64/server/libjvm.so
#2  0x7f2614ab70e3 in JVM_MonitorWait () from /usr/lib/jvm/java-8-openjdk-amd64//jre/lib/amd64/server/libjvm.so
#3  0x7f26015e82e8 in ?? ()
#4  0xf78a5f18 in ?? ()
#5  0x7fff1483a7e0 in ?? ()
#6  0x7f26003cda00 in ?? ()
#7  0xea60 in ?? ()
{code}
However, after installing openjdk-8-dbg, 10 test runs passed.


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139173#comment-16139173
 ] 

Ted Yu commented on HBASE-18541:


Ran the command given in the description; the test passed 10 times (without the patch).


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-15 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128050#comment-16128050
 ] 

Ted Yu commented on HBASE-18541:


Logged the following Launchpad bug:

https://bugs.launchpad.net/bugs/1710674

Provided some information requested by Tiago.


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125998#comment-16125998
 ] 

Ted Yu commented on HBASE-18541:


This instance was from Netty:
{code}
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f488085a69e in ?? ()
[Current thread is 1 (Thread 0x7f48947fa840 (LWP 6965))]
Installing openjdk unwinder
(gdb) bt
#0  0x7f488085a69e in  ()
#1  0x7f4880729d80 in [interpreted: bc = 20] io.netty.channel.nio.NioEventLoop.wakeup(boolean) () at io/netty/channel/nio/NioEventLoop.java:645
#2  0x7f4880729ffd in [interpreted: bc = 75] io.netty.util.concurrent.SingleThreadEventExecutor.execute(java.lang.Runnable) () at io/netty/util/concurrent/SingleThreadEventExecutor.java:681
#3  0x7f488072a042 in [interpreted: bc = 51] org.apache.hadoop.hbase.ipc.AsyncRpcChannelImpl.close(java.lang.Throwable) () at org/apache/hadoop/hbase/ipc/AsyncRpcChannelImpl.java:596
#4  0x7f488072a042 in [interpreted: bc = 77] org.apache.hadoop.hbase.ipc.AsyncRpcClient.close() () at org/apache/hadoop/hbase/ipc/AsyncRpcClient.java:346
#5  0x7f488072a042 in [interpreted: bc = 71] org.apache.hadoop.hbase.client.ConnectionImplementation.close() () at org/apache/hadoop/hbase/client/ConnectionImplementation.java:1911
#6  0x7f488072a042 in [interpreted: bc = 33] org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster() () at org/apache/hadoop/hbase/HBaseTestingUtility.java:1166
#7  0x7f48807224e7 in StubRoutines (1) ()
{code}


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125890#comment-16125890
 ] 

Ted Yu commented on HBASE-18541:


Another instance of the segfault:
{code}
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7fb387315dc8 in os::write_memory_serialize_page (thread=0x2af3000) at /build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/os.hpp:419
419     /build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/os.hpp: No such file or directory.
[Current thread is 1 (Thread 0x7fb387dbe840 (LWP 9221))]
Installing openjdk unwinder
(gdb) bt
#0  0x7fb387315dc8 in ThreadStateTransition::transition_and_fence(JavaThread*, JavaThreadState, JavaThreadState) (thread=0x2af3000) at /build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/os.hpp:419
#1  0x7fb387315dc8 in ThreadStateTransition::transition_and_fence(JavaThread*, JavaThreadState, JavaThreadState) (thread=0x2af3000) at /build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/os/linux/vm/interfaceSupport_linux.hpp:31
#2  0x7fb387315dc8 in ThreadStateTransition::transition_and_fence(JavaThread*, JavaThreadState, JavaThreadState) (thread=thread@entry=0x2af3000, to=_thread_in_native, from=_thread_in_vm) at /build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/interfaceSupport.hpp:179
#3  0x7fb38731719f in JVM_FillInStackTrace(JNIEnv*, jobject) (to=_thread_in_native, from=_thread_in_vm, this=) at /build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/interfaceSupport.hpp:232
#4  0x7fb38731719f in JVM_FillInStackTrace(JNIEnv*, jobject) (this=, __in_chrg=) at /build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/interfaceSupport.hpp:281
#5  0x7fb38731719f in JVM_FillInStackTrace(JNIEnv*, jobject) (env=, receiver=receiver@entry=0x7ffde93448a0) at /build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/prims/jvm.cpp:516
#6  0x7fb38395e851 in Java_java_lang_Throwable_fillInStackTrace (env=, throwable=0x7ffde93448a0, dummy=) at /build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/jdk/src/share/native/java/lang/Throwable.c:49
#7  0x7fb373eb9a28 in [native offset=0xa8] java.lang.Throwable.fillInStackTrace(int) () at java/lang/Throwable.java
#8  0x7fb3743472a4 in [compiled offset=0x84] java.lang.Throwable.fillInStackTrace() () at java/lang/Throwable.java:781
#9  0x7fb3743bc914 in [compiled offset=0x194] java.lang.Throwable.() () at java/lang/Throwable.java:249
#10 0x7fb37421a0d4 in [compiled offset=0x1b4] org.apache.log4j.helpers.PatternParser$LocationPatternConverter.convert(org.apache.log4j.spi.LoggingEvent) () at org/apache/log4j/helpers/PatternParser.java:500
#11 0x7fb37417eab4 in [compiled offset=0x114] org.apache.log4j.helpers.PatternConverter.format(java.lang.StringBuffer,org.apache.log4j.spi.LoggingEvent) () at org/apache/log4j/helpers/PatternConverter.java:65
#12 0x7fb37426315c in [inlined] java.lang.StringBuffer.setLength(int) () at java/lang/StringBuffer.java:193
0x7fb37426315c in [compiled offset=0x71c] org.apache.log4j.PatternLayout.format(org.apache.log4j.spi.LoggingEvent) () at org/apache/log4j/PatternLayout.java:503
#13 0x7fb37454484c in [compiled offset=0x12c] org.apache.log4j.WriterAppender.subAppend(org.apache.log4j.spi.LoggingEvent) () at org/apache/log4j/WriterAppender.java:310
#14 0x7fb374538aac in [compiled offset=0x1ec] org.apache.log4j.WriterAppender.append(org.apache.log4j.spi.LoggingEvent) () at org/apache/log4j/WriterAppender.java:160
#15 0x7fb37454793c in [compiled offset=0x113c] org.apache.log4j.AppenderSkeleton.doAppend(org.apache.log4j.spi.LoggingEvent) () at org/apache/log4j/AppenderSkeleton.java:251
#16 0x7fb374074204 in [compiled offset=0x4c4] org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(org.apache.log4j.spi.LoggingEvent) () at org/apache/log4j/helpers/AppenderAttachableImpl.java:66
#17 0x7fb3742b5f24 in [compiled offset=0x1e4] org.apache.log4j.Category.callAppenders(org.apache.log4j.spi.LoggingEvent) () at org/apache/log4j/Category.java:200
#18 0x7fb374208d5c in [inlined] org.apache.log4j.Category.forcedLog(java.lang.String,org.apache.log4j.Priority,java.lang.Object,java.lang.Throwable) () at org/apache/log4j/Category.java:392
0x7fb374208d5c in [compiled offset=0x67c] org.apache.log4j.Category.log(java.lang.String,org.apache.log4j.Priority,java.lang.Object,java.lang.Throwable) () at org/apache/log4j/Category.java:858
#19 0x7fb37454b374 in [compiled offset=0x154] org.apache.commons.logging.impl.Log4JLogger.info(java.lang.Object) () at org/apache/commons/logging/impl/Log4JLogger.java:177
#20 0x7fb373cee042 in [interpreted: bc = 50] 
{code}

[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125871#comment-16125871
 ] 

Ted Yu commented on HBASE-18541:


{code}
#10 0x7ff89d9d1ffd in [interpreted: bc = 42] org.apache.zookeeper.ClientCnxn.submitRequest(org.apache.zookeeper.proto.RequestHeader,org.apache.jute.Record,org.apache.jute.Record,org.apache.zookeeper.ZooKeeper$WatchRegistration) () at org/apache/zookeeper/ClientCnxn.java:1408
{code}
The above corresponds to the wait() call in ClientCnxn#submitRequest:
{code}
synchronized (packet) {
    while (!packet.finished) {
        packet.wait();
    }
}
{code}
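For readers coming from the C++ side, the Java idiom above is a guarded wait: the caller holds the packet's monitor, re-checks {{finished}} in a loop, and {{wait()}} releases the monitor while blocked. A minimal C++ sketch of the same pattern follows; the {{Packet}} struct here is a hypothetical stand-in (not ZooKeeper's class), and the completing side is assumed to set the flag and then notify all waiters.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a ZooKeeper packet: only the completion flag matters.
struct Packet {
  std::mutex mu;               // plays the role of the Java monitor on `packet`
  std::condition_variable cv;  // plays the role of wait()/notifyAll()
  bool finished = false;
};

// C++ counterpart of: synchronized (packet) { while (!packet.finished) packet.wait(); }
void WaitForPacket(Packet& p) {
  std::unique_lock<std::mutex> lock(p.mu);
  while (!p.finished) {  // re-check after every wakeup; wakeups may be spurious
    p.cv.wait(lock);     // releases p.mu while blocked, re-acquires before returning
  }
}

// Completing side (assumed: set the flag under the lock, then wake all waiters).
void FinishPacket(Packet& p) {
  {
    std::lock_guard<std::mutex> lock(p.mu);
    assert(!p.finished);  // a packet is completed only once
    p.finished = true;
  }
  p.cv.notify_all();
}

// Example: a second thread completes the packet while this thread waits.
bool DemoPacketHandshake() {
  Packet p;
  std::thread io([&p] { FinishPacket(p); });
  WaitForPacket(p);  // returns only after FinishPacket has set `finished`
  io.join();
  return p.finished;  // safe to read without the lock: io has been joined
}
```

The {{while}} loop, rather than a single {{if}}, is what keeps the wait correct under spurious wakeups in both languages. ({{JVM_MonitorWait}}, seen in the 2017-08-23 backtrace, is the HotSpot entry point behind Java's {{Object.wait()}}.)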


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125850#comment-16125850
 ] 

Ted Yu commented on HBASE-18541:


Installed openjdk-8-dbg. When loading the core dump in gdb, I got:
{code}
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/src/hbase/hbase-native-client/buck-out/gen/core/retry-test --gtest_color=n'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f1a735338d5 in ?? ()
[Current thread is 1 (Thread 0x7f1a8701d840 (LWP 12922))]
Installing openjdk unwinder
(gdb) bt
#0  0x7f1a735338d5 in  ()
#1  0x7ffe78f88ba8 in  ()
#2  0x7f19e4d572c8 in  ()
#3  0x in  ()
{code}
The backtrace gave no detail about the segfault.


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124523#comment-16124523
 ] 

Ted Yu commented on HBASE-18541:


Managed to generate a core dump:
{code}
Core was generated by `/usr/src/hbase/hbase-native-client/buck-out/gen/core/retry-test --gtest_color=n'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7fe32bc33135 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64//jre/lib/amd64/server/libjvm.so
[Current thread is 1 (Thread 0x7fe32c343840 (LWP 19436))]
(gdb) bt
#0  0x7fe32bc33135 in ?? () from /usr/lib/jvm/java-8-openjdk-amd64//jre/lib/amd64/server/libjvm.so
#1  0x7fe31885053e in ?? ()
#2  0xe0d55920 in ?? ()
#3  0xfefd3110 in ?? ()
#4  0x7ffdd7ce7210 in ?? ()
#5  0x7fe32b66dfed in ?? () from /usr/lib/jvm/java-8-openjdk-amd64//jre/lib/amd64/server/libjvm.so
#6  0x7fe318282b10 in ?? ()
#7  0x029426a0 in ?? ()
#8  0x7fe318282b10 in ?? ()
#9  0x0010 in ?? ()
#10 0x7fe32b4b32fd in ?? () from /usr/lib/jvm/java-8-openjdk-amd64//jre/lib/amd64/server/libjvm.so
#11 0x7fe318282b10 in ?? ()
#12 0x0010 in ?? ()
#13 0x7fe31840c5c8 in ?? ()
#14 0xfefd3110 in ?? ()
{code}
No method from the native client appears in the backtrace above.


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124037#comment-16124037
 ] 

Ted Yu commented on HBASE-18541:


Looped the test 10 times; all runs passed.

Previously I encountered a deadlock, which was resolved earlier today.


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122495#comment-16122495
 ] 

Ted Yu commented on HBASE-18541:


{code}
2017-08-10 22:32:16,664 INFO  [RpcServer.FifoWFPBQ.default.handler=28,queue=1,port=38871] master.HMaster (HMaster.java:createTable(1541)) - proc Id 9
2017-08-10 22:32:16,666 INFO  [RpcServer.FifoWFPBQ.default.handler=28,queue=1,port=38871] master.HMaster (HMaster.java:createTable(1543)) - back from latch
2017-08-10 22:32:16,667 INFO  [ProcessThread(sid:0 cport:54578):] server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(651)) - Got user-level KeeperException when processing sessionid:0x15dce45d47c type:create cxid:0xb6 zxid:0x5b txntype:-1 reqpath:n/a Error Path:/hbase/table-lock/table6 Error:KeeperErrorCode = NoNode for /hbase/table-lock/table6
{code}
Since "back from latch" was logged, it looks like the problem may have happened after table creation:
{code}
latch.await();
LOG.info("back from latch");
{code}
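The RPC handler in HMaster#createTable blocks in {{latch.await()}} until the latch is released, presumably by the create-table procedure, so the "back from latch" line shows the handler got past that point. The latch class used in HMaster is not shown in this thread; assuming it behaves like {{java.util.concurrent.CountDownLatch}}, a minimal C++ sketch of those semantics (the class and demo function names are hypothetical) is:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal latch with java.util.concurrent.CountDownLatch semantics:
// Await() blocks until CountDown() has been called `count` times.
class CountDownLatch {
 public:
  explicit CountDownLatch(int count) : count_(count) { assert(count >= 0); }

  // Decrements the count; the transition to zero wakes every waiter.
  // Calls made after the count has reached zero have no effect.
  void CountDown() {
    std::lock_guard<std::mutex> lock(mu_);
    if (count_ > 0 && --count_ == 0) cv_.notify_all();
  }

  // Blocks until the count is zero; returns immediately if it already is.
  void Await() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return count_ == 0; });
  }

  int Count() {
    std::lock_guard<std::mutex> lock(mu_);
    return count_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_;
};

// Example: `workers` threads each count down once while the caller awaits.
int DemoLatch(int workers) {
  CountDownLatch latch(workers);
  std::vector<std::thread> threads;
  for (int i = 0; i < workers; ++i) {
    threads.emplace_back([&latch] { latch.CountDown(); });
  }
  latch.Await();  // returns once all workers have counted down
  for (auto& t : threads) t.join();
  return latch.Count();
}
```

Using the predicate overload of {{wait}} gives the same spurious-wakeup protection as an explicit {{while}} loop.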


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122431#comment-16122431
 ] 

Ted Yu commented on HBASE-18541:


The crash happened after this line in HMaster.java:
{code}
LOG.info(getClientIdAuditPrefix() + " create " + hTableDescriptor);
{code}
Added more logging after the above line and am rerunning.


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122392#comment-16122392
 ] 

Ted Yu commented on HBASE-18541:


{code}
for i in `seq 1 10`; do buck test --no-results-cache core:multi-retry-test || break 1; done
{code}
The above command resulted in a hanging test; I couldn't even use the 'docker
exec -it' command to get into the Docker VM.

Let me try more.
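Because one hung iteration stalls the whole {{for ... || break}} loop, it can help to give each run a deadline. The following is a minimal POSIX-only C++ sketch, not part of the HBase build; the names {{RunWithTimeout}}, {{RunLoop}} and {{Outcome}} are hypothetical. It runs a command through {{/bin/sh -c}} in its own process group, kills the whole group with {{SIGKILL}} if the deadline passes, and otherwise stops at the first failing iteration, like the shell loop.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <thread>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class Outcome { kPassed, kFailed, kTimedOut };

// Runs `cmd` with /bin/sh -c in a new process group. If it has not exited
// before `timeout`, the whole group (sh plus anything it started) is killed.
Outcome RunWithTimeout(const std::string& cmd, std::chrono::milliseconds timeout) {
  assert(timeout.count() > 0);
  const pid_t pid = fork();
  if (pid < 0) return Outcome::kFailed;  // could not fork: report as a failure
  if (pid == 0) {
    setpgid(0, 0);  // child: become leader of a new process group
    execl("/bin/sh", "sh", "-c", cmd.c_str(), static_cast<char*>(nullptr));
    _exit(127);  // only reached if exec failed
  }
  setpgid(pid, pid);  // parent: same call, closes the race before kill()
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  int status = 0;
  for (;;) {
    const pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid) break;  // child exited
    if (r < 0) return Outcome::kFailed;
    if (std::chrono::steady_clock::now() >= deadline) {
      kill(-pid, SIGKILL);       // negative pid targets the whole process group
      waitpid(pid, &status, 0);  // reap sh so it does not linger as a zombie
      return Outcome::kTimedOut;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
  }
  return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? Outcome::kPassed
                                                         : Outcome::kFailed;
}

// Mirrors `for i in $(seq 1 N); do CMD || break; done`: returns the 1-based
// iteration that did not pass (0 if all passed) and stores its outcome in *why.
int RunLoop(const std::string& cmd, int iterations,
            std::chrono::milliseconds timeout, Outcome* why) {
  for (int i = 1; i <= iterations; ++i) {
    const Outcome o = RunWithTimeout(cmd, timeout);
    if (o != Outcome::kPassed) {
      if (why != nullptr) *why = o;
      return i;
    }
  }
  if (why != nullptr) *why = Outcome::kPassed;
  return 0;
}
```

For example, {{RunLoop("buck test --no-results-cache core:multi-retry-test", 10, std::chrono::minutes(15), &why)}} would stop at the first failing or hung iteration; the 15-minute deadline is an arbitrary assumption. A watchdog like this will not help if the container itself becomes unresponsive.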


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-10 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122169#comment-16122169
 ] 

Enis Soztutar commented on HBASE-18541:
---

bq. The following command doesn't print stack trace when test fails:
When the program crashes, there is usually a core dump file. You can use gdb to 
get the stack trace with something like: 
{code}
gdb ./buck-out/gen/core/multi-retry-test core.XXX
{code}

Then within gdb, you can use {{bt}} or {{thread apply all bt}} to get the 
backtrace. 
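A core file is only written if the process's core-size limit ({{RLIMIT_CORE}}, what {{ulimit -c}} reports) allows it, and the kernel's {{/proc/sys/kernel/core_pattern}} setting decides where it goes. As a minimal sketch, assuming a Linux/POSIX system and hypothetical helpers called early in a test binary's {{main}}, the soft limit can be raised to the hard limit like this. An unprivileged process may do this, but it cannot raise the hard limit itself, so a hard limit of 0 has to be changed by whoever launches the test.

```cpp
#include <cassert>
#include <sys/resource.h>

// True if the soft core-size limit currently equals the hard limit.
bool CoreLimitIsAtHardLimit() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
  return lim.rlim_cur == lim.rlim_max;
}

// Raises the soft RLIMIT_CORE to the current hard limit for this process
// (inherited by children it starts later). Returns false if a call fails.
bool EnableCoreDumps() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
  lim.rlim_cur = lim.rlim_max;  // raising soft up to hard needs no privileges
  if (setrlimit(RLIMIT_CORE, &lim) != 0) return false;
  assert(CoreLimitIsAtHardLimit());  // post-condition (debug builds only)
  return true;
}
```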


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122098#comment-16122098
 ] 

Ted Yu commented on HBASE-18541:


The following command doesn't print a stack trace when the test fails:
{code}
GLOG_logtostderr=1 ./buck-out/gen/core/multi-retry-test 
--gtest_filter=AsyncBatchRpcRetryTest.FailWithOperationTimeout
{code}


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-10 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122035#comment-16122035
 ] 

Enis Soztutar commented on HBASE-18541:
---

bq. core/async-batch-rpc-retrying-multi-region-test.cc appeared in stack 
trace. However, I don't find this file.
I had broken the test down into two, but that patch is not committed. You don't 
need to worry about that for now; just running the multi-retry test in a loop 
reproduces the problem. You can install openjdk-8-dbg if you want to see the 
stack traces inside the JVM. 
bq. Can we sync up HBASE-14850 branch with the master branch?
We will resync sometime soon, because testing needs a more stable server side 
than what the branch has as of now. For debugging this issue, though, it should 
not be needed. 
bq. However, AsyncRpcRetryTest.TestFailWithOperationTimeout passes when run 
individually.
As per the description, the tests fail flakily, probably due to GC (not 
confirmed). Running the test in a loop like the one in the description 
reproduces the problem. 


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121916#comment-16121916
 ] 

Ted Yu commented on HBASE-18541:


Sometimes the retry-test ended with:
{code}
[ RUN  ] AsyncRpcRetryTest.TestFailWithOperationTimeout
2017-08-10 17:01:54,177 INFO  
[RpcServer.FifoWFPBQ.default.handler=1,queue=1,port=41153] master.HMaster 
(HMaster.java:createTable(1530)) - Client=root//172.17.0.2 create 'table6', 
{NAME => 'd', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', IN_MEMORY_COMPACTION => 'false', 
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', 
MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', 
REPLICATION_SCOPE => '0'}
2017-08-10 17:01:54,285 INFO  [ProcessThread(sid:0 cport:55375):] 
server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(651)) - Got 
user-level KeeperException when processing sessionid:0x15dcd181a4d 
type:create cxid:0xb5 zxid:0x5a txntype:-1 reqpath:n/a Error 
Path:/hbase/table-lock/table6 Error:KeeperErrorCode = NoNode for 
/hbase/table-lock/table6
2017-08-10 17:01:54,495 INFO  [RegionOpenAndInitThread-table6-1] 
regionserver.HRegion (HRegion.java:createHRegion(6282)) - creating HRegion 
table6 HTD == 'table6', {NAME => 'd', BLOOMFILTER => 'NONE', VERSIONS => '1', 
IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', IN_MEMORY_COMPACTION => 
'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 
'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', 
REPLICATION_SCOPE => '0'} RootDir = 
file:/usr/src/hbase/hbase-native-client/target/test-data/2db36bc6-e3b0-433a-b812-fc48cd42fd23/.tmp
 Table name == table6
2017-08-10 17:01:54,542 INFO  [RegionOpenAndInitThread-table6-1] 
regionserver.HRegion (HRegion.java:doClose(1590)) - Closed 
table6,,1502384514176.f9265e2eb45f3087f533c45ab1e5.
2017-08-10 17:01:54,653 INFO  [ProcedureExecutor-0] hbase.MetaTableAccessor 
(MetaTableAccessor.java:addRegionsToMeta(1571)) - Added 1
{code}
However, AsyncRpcRetryTest.TestFailWithOperationTimeout passes when run 
individually.


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121873#comment-16121873
 ] 

Ted Yu commented on HBASE-18541:


Running an individual test gave me:
{code}
# GLOG_logtostderr=1 ./buck-out/gen/core/retry-test 
--gtest_filter=AsyncRpcRetryTest.TestHandleException
-nan
Result:
{code}


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121794#comment-16121794
 ] 

Ted Yu commented on HBASE-18541:


[~enis]:
core/async-batch-rpc-retrying-multi-region-test.cc appeared in the stack trace.
However, I can't find this file.

Can you double-check?


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120683#comment-16120683
 ] 

Ted Yu commented on HBASE-18541:


The stack trace involved AsyncRpcClient.createRpcChannel().

Looking at the master branch, there is no AsyncRpcClient.java.

Can we sync up the HBASE-14850 branch with the master branch?


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120556#comment-16120556
 ] 

Ted Yu commented on HBASE-18541:


I used the 'for' loop, which reproduced the test failure.
However, there was not much information in the test output:
{code}
-rw-r--r-- 1 root root 10 Aug  9 17:59 exitCode
-rw-r--r-- 1 root root 15 Aug  9 17:59 output
-rw-r--r-- 1 root root  0 Aug  9 17:59 results
{code}


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-09 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119550#comment-16119550
 ] 

Ted Yu commented on HBASE-18541:


From the stack trace posted earlier, it seems the segfault came from 
UserGroupInformation.hashCode().
From UserGroupInformation:
{code}
public int hashCode() {
  return realUser.hashCode();
}
{code}
Going to find out how System.identityHashCode() comes into play.
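Here is a minimal Java sketch of how System.identityHashCode() can sit at the bottom of this call chain. FakeSubject, FakeUgi, FakeUser and connectionHash are hypothetical stand-ins, not the real Hadoop/HBase classes.

Frame #7 of the openjdk-8-dbg stack trace in this thread shows UserGroupInformation.hashCode() calling System.identityHashCode() directly. Based on that frame, the sketch assumes that the UGI hashes a wrapped object by identity, which differs from the realUser.hashCode() snippet quoted here. The connection-key hash is only loosely modeled on the ConnectionId.hashCode(User, String, InetSocketAddress) frame.

{code}
import java.net.InetSocketAddress;
import java.util.Objects;

public class IdentityHashChain {

    // Hypothetical stand-in for a security Subject: it does not override
    // hashCode(), so only its object identity matters.
    static final class FakeSubject {}

    // Hypothetical stand-in for UserGroupInformation. Assumption (from frame #7):
    // hashCode() hashes the wrapped subject by identity, which the JVM computes natively.
    static final class FakeUgi {
        final FakeSubject subject = new FakeSubject();

        @Override
        public int hashCode() {
            return System.identityHashCode(subject);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeUgi && ((FakeUgi) o).subject == subject;
        }
    }

    // Hypothetical stand-in for HBase's User wrapper: delegates to the UGI.
    static final class FakeUser {
        final FakeUgi ugi;

        FakeUser(FakeUgi ugi) {
            this.ugi = ugi;
        }

        @Override
        public int hashCode() {
            return ugi.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeUser && ((FakeUser) o).ugi.equals(ugi);
        }
    }

    // Hypothetical connection-key hash over (user, service, address); calling it
    // invokes FakeUser.hashCode(), then FakeUgi.hashCode(), then identityHashCode().
    static int connectionHash(FakeUser user, String service, InetSocketAddress addr) {
        return Objects.hash(user, service, addr);
    }

    public static void main(String[] args) {
        FakeUgi ugi = new FakeUgi();
        // createUnresolved avoids a DNS lookup; host, port and service name are arbitrary.
        InetSocketAddress addr = InetSocketAddress.createUnresolved("localhost", 16000);
        FakeUser a = new FakeUser(ugi);
        FakeUser b = new FakeUser(ugi);
        // Two wrappers around the same UGI produce the same connection-key hash.
        System.out.println(connectionHash(a, "MasterService", addr) == connectionHash(b, "MasterService", addr));
        // The user hash is exactly the identity hash of the wrapped subject.
        System.out.println(a.hashCode() == System.identityHashCode(ugi.subject));
    }
}
{code}

Running it prints "true" twice. In the sketch, every step above System.identityHashCode() is plain Java, and only that last call enters the VM. That is consistent with the native JVM_IHashCode frames at the top of the trace, but it does not by itself explain the crash.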


[jira] [Commented] (HBASE-18541) [C++] Segfaults from JNI

2017-08-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118851#comment-16118851
 ] 

Enis Soztutar commented on HBASE-18541:
---

Just FYI, after I installed openjdk-8-dbg, I got this stack trace: 
{code}
#0  0x7fc70d8a3d18 in 
ThreadStateTransition::transition_from_native(JavaThread*, JavaThreadState) 
(thread=0x17e7000)
at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/os.hpp:419
#1  0x7fc70d8a3d18 in 
ThreadStateTransition::transition_from_native(JavaThread*, JavaThreadState) 
(thread=0x17e7000)
at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/os/linux/vm/interfaceSupport_linux.hpp:31
#2  0x7fc70d8a3d18 in 
ThreadStateTransition::transition_from_native(JavaThread*, JavaThreadState) 
(thread=thread@entry=0x17e7000, to=_thread_in_vm)
at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/interfaceSupport.hpp:212
#3  0x7fc70d8a5346 in JVM_IHashCode(JNIEnv*, jobject) (to=_thread_in_vm, 
this=)
at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/interfaceSupport.hpp:231
#4  0x7fc70d8a5346 in JVM_IHashCode(JNIEnv*, jobject) (thread=, this=)
at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/runtime/interfaceSupport.hpp:278
#5  0x7fc70d8a5346 in JVM_IHashCode(JNIEnv*, jobject) (env=, 
handle=0x7ffd1ded8970) at 
/build/openjdk-8-pZyJp3/openjdk-8-8u131-b11/src/hotspot/src/share/vm/prims/jvm.cpp:542
#6  0x7fc6fa61ebbf in [native offset=0xbf] 
java.lang.System.identityHashCode(java.lang.Object) () at java/lang/System.java
#7  0x7fc6fa27ba40 in [interpreted: bc = 4] 
org.apache.hadoop.security.UserGroupInformation.hashCode() () at 
org/apache/hadoop/security/UserGroupInformation.java:1616
#8  0x7fc6fa27ba40 in [interpreted: bc = 4] 
org.apache.hadoop.hbase.security.User.hashCode() () at 
org/apache/hadoop/hbase/security/User.java:152
#9  0x7fc6fa27ba40 in [interpreted: bc = 22] 
org.apache.hadoop.hbase.ipc.ConnectionId.hashCode(org.apache.hadoop.hbase.security.User,java.lang.String,java.net.InetSocketAddress)
 ()
at org/apache/hadoop/hbase/ipc/ConnectionId.java:79
#10 0x7fc6fa27ba40 in [interpreted: bc = 84] 
org.apache.hadoop.hbase.ipc.AsyncRpcClient.createRpcChannel(java.lang.String,java.net.InetSocketAddress,org.apache.hadoop.hbase.security.User)
 ()
at org/apache/hadoop/hbase/ipc/AsyncRpcClient.java:413
#11 0x7fc6fa27bd80 in [interpreted: bc = 24] 
org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(org.apache.hadoop.hbase.ipc.PayloadCarryingRpcController,com.google.protobuf.Descriptors$MethodDescriptor,com.google.protobuf.Message,com.google.protobuf.Message,org.apache.hadoop.hbase.security.User,java.net.InetSocketAddress,org.apache.hadoop.hbase.client.MetricsConnection$CallStats)
 ()
at org/apache/hadoop/hbase/ipc/AsyncRpcClient.java:243
#12 0x7fc6fa27bd80 in [interpreted: bc = 37] 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor,org.apache.hadoop.hbase.ipc.PayloadCarryingRpcController,com.google.protobuf.Message,com.google.protobuf.Message,org.apache.hadoop.hbase.security.User,java.net.InetSocketAddress)
 () at org/apache/hadoop/hbase/ipc/AbstractRpcClient.java:233
#13 0x7fc6fa27bd80 in [interpreted: bc = 28] 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor,com.google.protobuf.RpcController,com.google.protobuf.Message,com.google.protobuf.Message)
 () at org/apache/hadoop/hbase/ipc/AbstractRpcClient.java:354
#14 0x7fc6fa27be54 in [interpreted: bc = 24] 
org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(com.google.protobuf.RpcController,org.apache.hadoop.hbase.protobuf.generated.MasterProtos$IsMasterRunningRequest)
 () at org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java:64354
#15 0x7fc6fa27be54 in [interpreted: bc = 8] 
org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceState.isMasterRunning()
 ()
at org/apache/hadoop/hbase/client/ConnectionImplementation.java:939
#16 0x7fc6fa27b7d0 in [interpreted: bc = 10] 
org.apache.hadoop.hbase.client.ConnectionImplementation.isKeepAliveMasterConnectedAndRunning(org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceState)
 () at org/apache/hadoop/hbase/client/ConnectionImplementation.java:1699
#17 0x7fc6fa27b7d0 in [interpreted: bc = 12] 
org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService()
 ()
at org/apache/hadoop/hbase/client/ConnectionImplementation.java:1287
#18 0x7fc6fa27be54 in [interpreted: bc = 5] 
org.apache.hadoop.hbase.client.MasterCallable.prepare(boolean) () at 
org/apache/hadoop/hbase/client/MasterCallable.java:39
#19 0x7fc6fa27c042