[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-07-08 Thread Danny (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564493#comment-17564493 ] Danny commented on HBASE-26042: --- Is it valid??

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-03-12 Thread Viraj Jasani (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505232#comment-17505232 ] Viraj Jasani commented on HBASE-26042: -- Ack, thanks, will take a detailed look. [~mikegfink]

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-03-11 Thread Andrew Kyle Purtell (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505167#comment-17505167 ] Andrew Kyle Purtell commented on HBASE-26042: - [~vjasani] You might be interested in the

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-03-11 Thread Mike Fink (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505142#comment-17505142 ] Mike Fink commented on HBASE-26042: --- Thanks Andrew - hbase.wal.provider/meta_provider were on the

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-03-11 Thread Andrew Kyle Purtell (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505129#comment-17505129 ] Andrew Kyle Purtell commented on HBASE-26042: - Thanks for attaching these resources. I will

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-03-11 Thread Benoit Sigoure (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505128#comment-17505128 ] Benoit Sigoure commented on HBASE-26042: For some reason Mike can't upload files (maybe new

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-03-11 Thread Mike Fink (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505127#comment-17505127 ] Mike Fink commented on HBASE-26042: --- Trying attaching one more time.

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-03-11 Thread Mike Fink (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505125#comment-17505125 ] Mike Fink commented on HBASE-26042: --- Attaching a thread and heap dump from a similar system to the one

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-03-11 Thread Benoit Sigoure (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505123#comment-17505123 ] Benoit Sigoure commented on HBASE-26042: Hi Andrew, thanks for your reply. I already attached

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed'

2022-03-10 Thread Andrew Kyle Purtell (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504586#comment-17504586 ] Andrew Kyle Purtell commented on HBASE-26042: - Apologies for the delayed response [~tsuna]

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2022-03-08 Thread Benoit Sigoure (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502810#comment-17502810 ] Benoit Sigoure commented on HBASE-26042: We've run into this issue on a test cluster with HBase

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-08-20 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402421#comment-17402421 ] Michael Stack commented on HBASE-26042: --- Update. We cured the provocation that was causing lots of

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-15 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381464#comment-17381464 ] Michael Stack commented on HBASE-26042: --- {quote}bq. if you have a heap dump from that state? Poke

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-15 Thread Bharath Vissapragada (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381451#comment-17381451 ] Bharath Vissapragada commented on HBASE-26042: -- Ya scratch that. Now that I look closely,

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-15 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381416#comment-17381416 ] Michael Stack commented on HBASE-26042: --- Thanks for taking a look. I don't have a test. I have

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-15 Thread Duo Zhang (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381146#comment-17381146 ] Duo Zhang commented on HBASE-26042: --- The UT itself breaks the rule. It runs flush in a background

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-15 Thread Duo Zhang (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381122#comment-17381122 ] Duo Zhang commented on HBASE-26042: --- Let me take a look. I guess the intention here is that

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-15 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381058#comment-17381058 ] Michael Stack commented on HBASE-26042: --- Played w/ [~bharathv] PR. I can manufacture one of these

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-07 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376820#comment-17376820 ] Michael Stack commented on HBASE-26042: --- [~bharathv] let me try. Sweet.

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-07 Thread Bharath Vissapragada (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376818#comment-17376818 ] Bharath Vissapragada commented on HBASE-26042: -- bq. I think there is some racy code in

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-07 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376817#comment-17376817 ] Michael Stack commented on HBASE-26042: --- Tried w/ 2.3.5 and 2.4.3. The wal roll 'fixes' the

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-07 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376749#comment-17376749 ] Michael Stack commented on HBASE-26042: --- Reproduced by killing non-local DN: {code:java}

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-07 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376684#comment-17376684 ] Michael Stack commented on HBASE-26042: --- Tried reproducing {code:java} 2021-06-27 13:41:27,604

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-01 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373103#comment-17373103 ] Michael Stack commented on HBASE-26042: --- Some background notes: On this cluster, the hang is

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-01 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373028#comment-17373028 ] Michael Stack commented on HBASE-26042: --- [~bharathv] thanks for taking a look. I looked at that

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-07-01 Thread Bharath Vissapragada (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372926#comment-17372926 ] Bharath Vissapragada commented on HBASE-26042: -- Thanks for the jstacks, I think consume is

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-30 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372361#comment-17372361 ] Michael Stack commented on HBASE-26042: --- {quote}What is the ring buffer consume thread doing? If

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-30 Thread Duo Zhang (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372298#comment-17372298 ] Duo Zhang commented on HBASE-26042: --- What is the ring buffer consume thread doing? If it could write

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-30 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372295#comment-17372295 ] Michael Stack commented on HBASE-26042: --- Deadlock in coarse form is 116 of 200 handlers are in

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-30 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372272#comment-17372272 ] Michael Stack commented on HBASE-26042: --- Looking at thread dumps we seem to be stuck on the ring

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-30 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372109#comment-17372109 ] Michael Stack commented on HBASE-26042: --- [~apurtell] My attempt at repro did not pan out

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-30 Thread Andrew Kyle Purtell (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372108#comment-17372108 ] Andrew Kyle Purtell commented on HBASE-26042: - {quote}2021-06-29 20:33:35,183 WARN

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-30 Thread Bharath Vissapragada (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372080#comment-17372080 ] Bharath Vissapragada commented on HBASE-26042: -- Any chance you can attach a full jstack?

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-30 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372049#comment-17372049 ] Michael Stack commented on HBASE-26042: --- [~zhangduo] thanks for taking a look and 'different

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-29 Thread Duo Zhang (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371814#comment-17371814 ] Duo Zhang commented on HBASE-26042: --- {quote} Interesting is how more than one thread is able to be

[jira] [Commented] (HBASE-26042) WAL lockup on 'sync failed' org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

2021-06-29 Thread Michael Stack (Jira)
[ https://issues.apache.org/jira/browse/HBASE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371796#comment-17371796 ] Michael Stack commented on HBASE-26042: --- Here is what it looked like when I tried to repro the