[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-09 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431054#comment-16431054
 ] 

Mike Drob commented on HBASE-20322:
---

Thanks!

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-09 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431034#comment-16431034
 ] 

Thiruvel Thirumoolan commented on HBASE-20322:
--

[~mdrob] - I raised HBASE-20373 to confirm that and fix it. Will get to it 
sometime this week or next.

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-06 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429218#comment-16429218
 ] 

Mike Drob commented on HBASE-20322:
---

[~apurtell] [~thiruvel] - did we ever decide on an answer to this question -

bq. Do we need a followup for branch-2.0, branch-2, and master? Is this also an 
issue there?

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426308#comment-16426308
 ] 

Hudson commented on HBASE-20322:


Results for branch branch-1.3
[build #284 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/284/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/284//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/284//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.3/284//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426119#comment-16426119
 ] 

Hudson commented on HBASE-20322:


Results for branch branch-1.4
[build #276 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/276/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/276//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/276//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/276//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425349#comment-16425349
 ] 

Hudson commented on HBASE-20322:


Results for branch branch-1
[build #270 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/270/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/270//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/270//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/270//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424597#comment-16424597
 ] 

Hudson commented on HBASE-20322:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #388 (See 
[https://builds.apache.org/job/HBase-1.3-IT/388/])
Amend HBASE-20322 CME in StoreScanner causes region server crash (apurtell: rev 
0db4bd3aa71ecd7770430c428cf01d238ec06fa7)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java


> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424579#comment-16424579
 ] 

Andrew Purtell commented on HBASE-20322:


Pushed the addendum

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424487#comment-16424487
 ] 

Andrew Purtell commented on HBASE-20322:


Addendum is fine. I ran TestAtomicOperation in a loop 100 times and it did not 
fail. Glad you could repro it. 

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-03 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424475#comment-16424475
 ] 

Thiruvel Thirumoolan commented on HBASE-20322:
--

[~apurtell] - There is a small bug in patch which causes TestAtomicOperation to 
fail. I see the bug at-least once when I run TestAtomicOperation 25 times. Pls 
let me know if you would like me to follow up on a separate Jira. I have 
attached the addendum along.

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002-addendum.patch, HBASE-20322.branch-1.3.002.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424449#comment-16424449
 ] 

Hudson commented on HBASE-20322:


SUCCESS: Integrated in Jenkins build HBase-1.3-IT #387 (See 
[https://builds.apache.org/job/HBase-1.3-IT/387/])
HBASE-20322 CME in StoreScanner causes region server crash (apurtell: rev 
9eaafe1ee91bc12d8933e17ad6cab7af8251c0f5)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java


> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002.patch, HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-02 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423115#comment-16423115
 ] 

Andrew Purtell commented on HBASE-20322:


+1

Will commit

Looking at this first though:
{quote}
|Failed junit tests|hadoop.hbase.regionserver.TestAtomicOperation|
{quote}

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002.patch, HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423038#comment-16423038
 ] 

Hadoop QA commented on HBASE-20322:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-1.3 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
45s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_163 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_171 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
21s{color} | {color:green} branch-1.3 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
49s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} branch-1.3 passed with JDK v1.8.0_163 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} branch-1.3 passed with JDK v1.7.0_171 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed with JDK v1.8.0_163 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK v1.7.0_171 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} hbase-server: The patch generated 0 new + 60 
unchanged - 2 fixed = 60 total (was 62) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
22s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 23s{color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 
2.5.2 2.6.5 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed with JDK v1.8.0_163 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.7.0_171 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 40s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}110m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestAtomicOperation |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:dca6535 |
| JIRA Issue | HBASE-20322 |
| JIRA Patch URL | 

[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-04-02 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422869#comment-16422869
 ] 

Thiruvel Thirumoolan commented on HBASE-20322:
--

Thanks [~apurtell] [~yuzhih...@gmail.com] for reviews.

[~apurtell] - I responded to your question, pls let me know if it looks ok.

 

I have re-uploaded same patch for precommit to run, last failure wasn't related 
to the patch.

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.3.002.patch, HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-03-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421029#comment-16421029
 ] 

Hadoop QA commented on HBASE-20322:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-1.4 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
51s{color} | {color:green} branch-1.4 passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
19s{color} | {color:red} hbase-server in branch-1.4 failed with JDK v1.8.0_163. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
16s{color} | {color:red} hbase-server in branch-1.4 failed with JDK v1.7.0_171. 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
21s{color} | {color:green} branch-1.4 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
59s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} branch-1.4 passed with JDK v1.8.0_163 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} branch-1.4 passed with JDK v1.7.0_171 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
13s{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_163. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 13s{color} 
| {color:red} hbase-server in the patch failed with JDK v1.8.0_163. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
16s{color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_171. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 16s{color} 
| {color:red} hbase-server in the patch failed with JDK v1.7.0_171. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} hbase-server: The patch generated 0 new + 39 
unchanged - 2 fixed = 39 total (was 41) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
33s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 37s{color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 
2.5.2 2.6.5 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed with JDK v1.8.0_163 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed with JDK v1.7.0_171 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}101m  8s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestAtomicOperation |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:74e3133 |
| JIRA Issue | HBASE-20322 

[jira] [Commented] (HBASE-20322) CME in StoreScanner causes region server crash

2018-03-30 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420930#comment-16420930
 ] 

Andrew Purtell commented on HBASE-20322:


changes basically lgtm, but i left one question on the review for 1.3. i don't 
think we need a separate review for 1.4

> CME in StoreScanner causes region server crash
> --
>
> Key: HBASE-20322
> URL: https://issues.apache.org/jira/browse/HBASE-20322
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.3.3, 1.4.4
>
> Attachments: HBASE-20322.branch-1.3.001.patch, 
> HBASE-20322.branch-1.4.001.patch
>
>
> RS crashed with ConcurrentModificationException on our 1.3 cluster, stack 
> trace below. [~toffer] and I checked and there is a race condition between 
> flush and scanner close. When StoreScanner.updateReaders() is updating the 
> scanners after a newly flushed file (in this trace below a region close 
> during a split), the client's scanner could be closing thus causing CME.
> Its rare, but since it crashes the region server, needs to be fixed.
> FATAL regionserver.HRegionServer [regionserver/] : ABORTING region server 
> : Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: 
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2579)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2255)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2217)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2207)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1501)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1420)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:398)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:278)
> at 
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:566)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
> at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
> at java.util.ArrayList$Itr.next(ArrayList.java:851)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.clearAndClose(StoreScanner.java:797)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:825)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1155)
> PS: ignore the line no in the above stack trace, method calls should help 
> understand whats happening.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)