Accumulo-Pull-Requests - Build # 399 - Aborted

2016-08-15 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #399)

Status: Aborted

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/399/ to view the results.

[jira] [Resolved] (ACCUMULO-4406) .out and .err files should have same naming convention as log files

2016-08-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved ACCUMULO-4406.
--
Resolution: Fixed

Thanks for working with me on this, Dave. I think it's much better now than it 
was originally. Multiple tservers on a host worked as I expected, even with 
multiple hosts.

> .out and .err files should have same naming convention as log files
> ---
>
> Key: ACCUMULO-4406
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4406
> Project: Accumulo
> Issue Type: Improvement
> Components: scripts
> Reporter: Dave Marion
> Assignee: Dave Marion
> Fix For: 1.8.0
>
> Time Spent: 4h 50m
> Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-4405:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Tested with {{verify -Psunny}} on 1.7 and 1.8.

Replaced the (intended) upper bound of 3s with a configuration property which 
has a default of 5s. I'm not attached to these values if they bother anyone.
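
For readers following along, here is a minimal sketch of what "a configuration property which has a default of 5s" could look like. The property key, class name, and method name are hypothetical: this thread does not name the actual property, and this is not the code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

class FlushWaitConfig {
  // Hypothetical key: the real property name is not given in this thread.
  static final String MAX_WAIT_KEY = "example.master.flush.wait.max.ms";
  // Default of 5s, matching the default described in the comment.
  static final long DEFAULT_MAX_WAIT_MS = 5_000L;

  /** Returns the configured maximum wait in milliseconds, or the 5s default when unset. */
  static long maxWaitMillis(Map<String,String> conf) {
    String raw = conf.get(MAX_WAIT_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_MAX_WAIT_MS;
    }
    return Long.parseLong(raw.trim());
  }

  public static void main(String[] args) {
    Map<String,String> conf = new HashMap<>();
    // Nothing configured: falls back to the default.
    System.out.println(maxWaitMillis(conf));
    // An operator who prefers the old 3s value can set it explicitly.
    conf.put(MAX_WAIT_KEY, "3000");
    System.out.println(maxWaitMillis(conf));
  }
}
```

Keeping the bound in a property rather than a constant means the value can be changed later without a code change, which is presumably why the exact default matters less.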

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
> Issue Type: Bug
> Components: gc, master
> Affects Versions: 1.7.1, 1.7.2
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.0
>
> Attachments: ACCUMULO-4405.001-1.7.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> While testing out 1.8.0rc1, all of the TabletServers had crashed due to an 
> OOME, I believe, because I temporarily ran out of HDFS space because HDFS 
> trash was enabled (trash could not be cleaned up fast enough for Accumulo 
> generating more trash). I came back to the system after restarting the 
> TabletServers and found that the GC had not run a new cycle after restarting 
> the TabletServers. In a jstack of the GC, I saw:
> {noformat}
> "gc" #13 prio=5 os_prio=0 tid=0x021f3800 nid=0x4dd5 runnable [0x7f6f1ebc]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         at java.net.SocketInputStream.read(SocketInputStream.java:170)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         - locked <0xf5b4b750> (a java.io.BufferedInputStream)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>         at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>         at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>         at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
>         at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
>         at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:501)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>         at org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForFlush(MasterClientService.java:209)
>         at org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForFlush(MasterClientService.java:190)
>         at org.apache.accumulo.core.client.impl.TableOperationsImpl._flush(TableOperationsImpl.java:820)
>         at org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:758)
>         at org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:727)
>         at org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:721)
>         at org.apache.accumulo.gc.SimpleGarbageCollector.run(SimpleGarbageCollector.java:592)
>         at org.apache.accumulo.gc.SimpleGarbageCollector.main(SimpleGarbageCollector.java:160)
>         at org.apache.accumulo.gc.GCExecutable.execute(GCExecutable.java:34)
>         at org.apache.accumulo.start.Main$1.run(Main.java:120)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> The Master was also stuck with an active thread/RPC:
> {noformat}
> "ClientPool 23257" #45412 daemon prio=5 os_prio=0 tid=0x049cc000 nid=0x6401 waiting on condition [0x7f8462d94000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.accumulo.core.client.impl.ThriftScanner.pause(ThriftScanner.java:211)
>         at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:259)
>         at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:79)
>         at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:150)
> at 
> 

[jira] [Updated] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-4405:
-
Fix Version/s: 1.8.0  (was: 1.8.1)

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
> Issue Type: Bug
> Components: gc, master
> Affects Versions: 1.7.1, 1.7.2
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.0
>
> Attachments: ACCUMULO-4405.001-1.7.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>

[jira] [Commented] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421415#comment-15421415
 ] 

Josh Elser commented on ACCUMULO-4405:
--

Welp, I guess this got busted with the Java changes by infra. I'll have to 
revisit.

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
> Issue Type: Bug
> Components: gc, master
> Affects Versions: 1.7.1, 1.7.2
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.1
>
> Attachments: ACCUMULO-4405.001-1.7.patch
>
>

[jira] [Commented] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421409#comment-15421409
 ] 

Hadoop QA commented on ACCUMULO-4405:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} pre-patch {color} | {color:red} 0m 0s {color} | {color:red} JAVA_HOME is not defined. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823725/ACCUMULO-4405.001-1.7.patch |
| JIRA Issue | ACCUMULO-4405 |
| Optional Tests | asflicense javac javadoc unit findbugs checkstyle compile |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-ACCUMULO-Build/test_framework/yetus-0.3.0/lib/precommit/personality/accumulo.sh |
| git revision | 1.7 / 4885470 |
| Console output | https://builds.apache.org/job/PreCommit-ACCUMULO-Build/32/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |


This message was automatically generated.



> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
> Issue Type: Bug
> Components: gc, master
> Affects Versions: 1.7.1, 1.7.2
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.1
>
> Attachments: ACCUMULO-4405.001-1.7.patch
>
>

[jira] [Updated] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-4405:
-
Attachment: ACCUMULO-4405.001-1.7.patch

.001 against 1.7. Ran {{verify -Psunny}} against 1.8, need to do the same for 
1.7.

Removed the constant 3s pause in favor of a Property (which defaults to 5s -- 
don't think there's any reason for the 3s chosen previously). Added some unit 
tests.

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
> Issue Type: Bug
> Components: gc, master
> Affects Versions: 1.7.1, 1.7.2
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.1
>
> Attachments: ACCUMULO-4405.001-1.7.patch
>
>

[jira] [Updated] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-4405:
-
Status: Patch Available  (was: Open)

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
> Issue Type: Bug
> Components: gc, master
> Affects Versions: 1.7.2, 1.7.1
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.1
>
> Attachments: ACCUMULO-4405.001-1.7.patch
>
>
> While testing out 1.8.0rc1, all of the TabletServers had crashed due to an 
> OOME, I believe, because I temporarily ran out of HDFS space because HDFS 
> trash was enabled (trash could not be cleaned up fast enough for Accumulo 
> generating more trash). I came back to the system after restarting the 
> TabletServers and found that the GC had not run a new cycle after restarting 
> the TabletServers. In a jstack of the GC, I saw:
> {noformat}
> "gc" #13 prio=5 os_prio=0 tid=0x021f3800 nid=0x4dd5 runnable [0x7f6f1ebc]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         at java.net.SocketInputStream.read(SocketInputStream.java:170)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         - locked <0xf5b4b750> (a java.io.BufferedInputStream)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>         at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>         at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>         at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
>         at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
>         at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:501)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>         at org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForFlush(MasterClientService.java:209)
>         at org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForFlush(MasterClientService.java:190)
>         at org.apache.accumulo.core.client.impl.TableOperationsImpl._flush(TableOperationsImpl.java:820)
>         at org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:758)
>         at org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:727)
>         at org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:721)
>         at org.apache.accumulo.gc.SimpleGarbageCollector.run(SimpleGarbageCollector.java:592)
>         at org.apache.accumulo.gc.SimpleGarbageCollector.main(SimpleGarbageCollector.java:160)
>         at org.apache.accumulo.gc.GCExecutable.execute(GCExecutable.java:34)
>         at org.apache.accumulo.start.Main$1.run(Main.java:120)
>         at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>         - None
> {noformat}
> The Master was also stuck with an active thread/RPC:
> {noformat}
> "ClientPool 23257" #45412 daemon prio=5 os_prio=0 tid=0x049cc000 nid=0x6401 waiting on condition [0x7f8462d94000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.accumulo.core.client.impl.ThriftScanner.pause(ThriftScanner.java:211)
>         at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:259)
>         at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:79)
>         at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:150)
>         at org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.readRow(IsolatedScanner.java:70)
>         at org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.<init>(IsolatedScanner.java:149)
>         at org.apache.accumulo.core.client.IsolatedScanner.iterator(IsolatedScanner.java:238)
> 

[jira] [Commented] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421349#comment-15421349
 ] 

Josh Elser commented on ACCUMULO-4405:
--

Yep. I added a property to control this in the future, plus some tests. I'm 
running through {{verify -Psunny}} locally and will then open a PR.

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
> Issue Type: Bug
> Components: gc, master
> Affects Versions: 1.7.1, 1.7.2
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.1
>
>
> While testing out 1.8.0rc1, all of the TabletServers had crashed due to an 
> OOME, I believe, because I temporarily ran out of HDFS space because HDFS 
> trash was enabled (trash could not be cleaned up fast enough for Accumulo 
> generating more trash). I came back to the system after restarting the 
> TabletServers and found that the GC had not run a new cycle after restarting 
> the TabletServers. In a jstack of the GC, I saw:
> {noformat}
> "gc" #13 prio=5 os_prio=0 tid=0x021f3800 nid=0x4dd5 runnable 
> [0x7f6f1ebc]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:170)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> - locked <0xf5b4b750> (a java.io.BufferedInputStream)
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at 
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at 
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at 
> org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
> at 
> org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
> at 
> org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:501)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> at 
> org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForFlush(MasterClientService.java:209)
> at 
> org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForFlush(MasterClientService.java:190)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl._flush(TableOperationsImpl.java:820)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:758)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:727)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:721)
> at 
> org.apache.accumulo.gc.SimpleGarbageCollector.run(SimpleGarbageCollector.java:592)
> at 
> org.apache.accumulo.gc.SimpleGarbageCollector.main(SimpleGarbageCollector.java:160)
> at org.apache.accumulo.gc.GCExecutable.execute(GCExecutable.java:34)
> at org.apache.accumulo.start.Main$1.run(Main.java:120)
> at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
> - None
> {noformat}
> The Master was also stuck with an active thread/RPC:
> {noformat}
> "ClientPool 23257" #45412 daemon prio=5 os_prio=0 tid=0x049cc000 
> nid=0x6401 waiting on condition [0x7f8462d94000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.accumulo.core.client.impl.ThriftScanner.pause(ThriftScanner.java:211)
> at 
> org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:259)
> at 
> org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:79)
> at 
> org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:150)
> at 
> org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.readRow(IsolatedScanner.java:70)
> at 
> org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.<init>(IsolatedScanner.java:149)
> at 
> 

[jira] [Commented] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421278#comment-15421278
 ] 

Christopher Tubbs commented on ACCUMULO-4405:
-

That should be {{Math.min}}.

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
>  Issue Type: Bug
>  Components: gc, master
>Affects Versions: 1.7.1, 1.7.2
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.1
>
>
> While testing out 1.8.0rc1, all of the TabletServers had crashed due to an 
> OOME, I believe, because I temporarily ran out of HDFS space because HDFS 
> trash was enabled (trash could not be cleaned up fast enough for Accumulo 
> generating more trash). I came back to the system after restarting the 
> TabletServers and found that the GC had not run a new cycle after restarting 
> the TabletServers. In a jstack of the GC, I saw:
> {noformat}
> "gc" #13 prio=5 os_prio=0 tid=0x021f3800 nid=0x4dd5 runnable 
> [0x7f6f1ebc]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:170)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> - locked <0xf5b4b750> (a java.io.BufferedInputStream)
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at 
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at 
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at 
> org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
> at 
> org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
> at 
> org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:501)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> at 
> org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForFlush(MasterClientService.java:209)
> at 
> org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForFlush(MasterClientService.java:190)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl._flush(TableOperationsImpl.java:820)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:758)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:727)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:721)
> at 
> org.apache.accumulo.gc.SimpleGarbageCollector.run(SimpleGarbageCollector.java:592)
> at 
> org.apache.accumulo.gc.SimpleGarbageCollector.main(SimpleGarbageCollector.java:160)
> at org.apache.accumulo.gc.GCExecutable.execute(GCExecutable.java:34)
> at org.apache.accumulo.start.Main$1.run(Main.java:120)
> at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
> - None
> {noformat}
> The Master was also stuck with an active thread/RPC:
> {noformat}
> "ClientPool 23257" #45412 daemon prio=5 os_prio=0 tid=0x049cc000 
> nid=0x6401 waiting on condition [0x7f8462d94000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.accumulo.core.client.impl.ThriftScanner.pause(ThriftScanner.java:211)
> at 
> org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:259)
> at 
> org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:79)
> at 
> org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:150)
> at 
> org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.readRow(IsolatedScanner.java:70)
> at 
> org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.<init>(IsolatedScanner.java:149)
> at 
> org.apache.accumulo.core.client.IsolatedScanner.iterator(IsolatedScanner.java:238)
> at 
> 

[jira] [Updated] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-4405:
-
Affects Version/s: 1.7.1
   1.7.2

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
>  Issue Type: Bug
>  Components: gc, master
>Affects Versions: 1.7.1, 1.7.2
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.1
>
>

[jira] [Commented] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421258#comment-15421258
 ] 

Josh Elser commented on ACCUMULO-4405:
--

Looks like this came in via ACCUMULO-3683 in 1.7.0

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
>  Issue Type: Bug
>  Components: gc, master
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.1
>
>

[jira] [Updated] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-4405:
-
Fix Version/s: 1.7.3

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
>  Issue Type: Bug
>  Components: gc, master
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.7.3, 1.8.1
>
>
> While testing out 1.8.0rc1, all of the TabletServers had crashed due to an 
> OOME, I believe, because I temporarily ran out of HDFS space because HDFS 
> trash was enabled (trash could not be cleaned up fast enough for Accumulo 
> generating more trash). I came back to the system after restarting the 
> TabletServers and found that the GC had not run a new cycle after restarting 
> the TabletServers. In a jstack of the GC, I saw:
> {noformat}
> "gc" #13 prio=5 os_prio=0 tid=0x021f3800 nid=0x4dd5 runnable 
> [0x7f6f1ebc]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:170)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> - locked <0xf5b4b750> (a java.io.BufferedInputStream)
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at 
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at 
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at 
> org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
> at 
> org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
> at 
> org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:501)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> at 
> org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForFlush(MasterClientService.java:209)
> at 
> org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForFlush(MasterClientService.java:190)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl._flush(TableOperationsImpl.java:820)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:758)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:727)
> at 
> org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:721)
> at 
> org.apache.accumulo.gc.SimpleGarbageCollector.run(SimpleGarbageCollector.java:592)
> at 
> org.apache.accumulo.gc.SimpleGarbageCollector.main(SimpleGarbageCollector.java:160)
> at org.apache.accumulo.gc.GCExecutable.execute(GCExecutable.java:34)
> at org.apache.accumulo.start.Main$1.run(Main.java:120)
> at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
> - None
> {noformat}
> The Master was also stuck with an active thread/RPC:
> {noformat}
> "ClientPool 23257" #45412 daemon prio=5 os_prio=0 tid=0x049cc000 
> nid=0x6401 waiting on condition [0x7f8462d94000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.accumulo.core.client.impl.ThriftScanner.pause(ThriftScanner.java:211)
> at 
> org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:259)
> at 
> org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:79)
> at 
> org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:150)
> at 
> org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.readRow(IsolatedScanner.java:70)
> at 
> org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.<init>(IsolatedScanner.java:149)
> at 
> org.apache.accumulo.core.client.IsolatedScanner.iterator(IsolatedScanner.java:238)
> at 
> org.apache.accumulo.core.client.RowIterator.<init>(RowIterator.java:117)
> at 
> 

[jira] [Commented] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421251#comment-15421251
 ] 

Josh Elser commented on ACCUMULO-4405:
--

Oh dear, this is worrisome. The comment on how long ThriftScanner pauses is not 
accurate at all:

{code}
  private static long pause(long millis) throws InterruptedException {
    Thread.sleep(millis);
    // wait 2 * last time, with +-10% random jitter
    return (long) (Math.max(millis * 2, 3000) * (.9 + Math.random() / 5));
  }
{code}

Ok, this is really nasty. Instead of doing a very quick retry as the method 
would imply (100ms, then 200ms), we wait 100ms, then immediately jump up to 
3000ms +/-10%.
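
To make the arithmetic concrete, here is a minimal sketch comparing the quoted {{Math.max}} expression with the {{Math.min}} variant suggested elsewhere on this ticket. The class name, method names, and the {{MAX_SLEEP_MILLIS}} constant are hypothetical, and the jitter factor is passed in as a parameter (rather than drawn from {{Math.random()}}) so the output is repeatable. This is an illustration only, not the committed fix.

```java
public class PauseSketch {
  // Hypothetical name for the 3000ms bound that appears in the quoted code.
  static final long MAX_SLEEP_MILLIS = 3000;

  // Shape of the quoted code: Math.max turns 3000ms into a floor, not a ceiling.
  static long nextWithMax(long millis, double jitter) {
    return (long) (Math.max(millis * 2, MAX_SLEEP_MILLIS) * jitter);
  }

  // Suggested shape: Math.min caps the doubled pause at 3000ms.
  static long nextWithMin(long millis, double jitter) {
    return (long) (Math.min(millis * 2, MAX_SLEEP_MILLIS) * jitter);
  }

  public static void main(String[] args) {
    long withMax = 100;
    long withMin = 100;
    // A jitter of 1.0 stands in for the (.9 + Math.random() / 5) factor.
    for (int i = 0; i < 7; i++) {
      System.out.println("max: " + withMax + "ms  min: " + withMin + "ms");
      withMax = nextWithMax(withMax, 1.0);
      withMin = nextWithMin(withMin, 1.0);
    }
  }
}
```

With the jitter pinned at 1.0, the {{Math.max}} sequence starts 100, 3000, 6000 and keeps doubling, while the {{Math.min}} sequence runs 100, 200, 400, 800, 1600 and then holds at 3000.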

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
>  Issue Type: Bug
>  Components: gc, master
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.8.1
>
>

[jira] [Created] (ACCUMULO-4406) .out and .err files should have same naming convention as log files

2016-08-15 Thread Dave Marion (JIRA)
Dave Marion created ACCUMULO-4406:
-

 Summary: .out and .err files should have same naming convention as 
log files
 Key: ACCUMULO-4406
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4406
 Project: Accumulo
  Issue Type: Improvement
  Components: scripts
Reporter: Dave Marion
Assignee: Dave Marion
 Fix For: 1.8.0








[jira] [Commented] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421226#comment-15421226
 ] 

Josh Elser commented on ACCUMULO-4405:
--

bq. I also have to question why the RPC itself from the GC did not time out. I 
would have suspected that it would be using the TTimeoutTransport (but perhaps 
it is also configured to have an extremely long timeout).

Apparently, RPCs to the Master intentionally do not use the TTimeoutTransport:

{code}
try {
  // Master requests can take a long time: don't ever time out
  MasterClientService.Client client = ThriftUtil.getClientNoTimeout(new MasterClientService.Client.Factory(), master, context);
  return client;
} catch (TTransportException tte) {
  Throwable cause = tte.getCause();
  if (null != cause && cause instanceof UnknownHostException) {
    // do not expect to recover from this
    throw new RuntimeException(tte);
  }
  log.debug("Failed to connect to master=" + master + ", will retry... ", tte);
  return null;
}
{code}
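
As a rough illustration of why a client without a read timeout can sit in {{socketRead0}} indefinitely, the sketch below uses only plain {{java.net}} sockets, not Thrift or Accumulo classes. The class and method names are hypothetical. A loopback peer accepts the connection and never writes; the read returns control only because {{setSoTimeout}} was given a non-zero value, whereas the JDK treats a value of 0 as an infinite timeout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {
  // Returns true if a read against a silent peer gave up after timeoutMillis.
  static boolean readTimesOut(int timeoutMillis) {
    try (ServerSocket server = new ServerSocket(0);
         Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
         Socket accepted = server.accept()) {
      // A non-zero SO_TIMEOUT bounds each blocking read; 0 means wait forever.
      client.setSoTimeout(timeoutMillis);
      try {
        client.getInputStream().read(); // the accepted side never writes
        return false;
      } catch (SocketTimeoutException e) {
        return true;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("read timed out: " + readTimesOut(200));
  }
}
```

Passing 0 instead of 200 would leave the read blocked for as long as the peer stays silent, which is the situation the GC thread's stack trace shows.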

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
>  Issue Type: Bug
>  Components: gc, master
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.8.1
>
>

[jira] [Commented] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421179#comment-15421179
 ] 

Josh Elser commented on ACCUMULO-4405:
--

I also have to question why the RPC itself from the GC did not time out. I 
would have suspected that it would be using the TTimeoutTransport (but perhaps 
it is also configured to have an extremely long timeout).

> GC collection cycle stuck on waitForFlush RPC to Master
> ---
>
> Key: ACCUMULO-4405
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
> Project: Accumulo
>  Issue Type: Bug
>  Components: gc, master
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.8.1
>
>

[jira] [Created] (ACCUMULO-4405) GC collection cycle stuck on waitForFlush RPC to Master

2016-08-15 Thread Josh Elser (JIRA)
Josh Elser created ACCUMULO-4405:


             Summary: GC collection cycle stuck on waitForFlush RPC to Master
                 Key: ACCUMULO-4405
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4405
             Project: Accumulo
          Issue Type: Bug
          Components: gc, master
            Reporter: Josh Elser
            Assignee: Josh Elser
             Fix For: 1.8.1


While testing 1.8.0rc1, all of the TabletServers crashed with an OOME. I 
believe this happened because HDFS temporarily ran out of space: HDFS trash 
was enabled, and Accumulo generated trash faster than it could be cleaned up. 
I restarted the TabletServers, and when I later came back to the system I 
found that the GC had not run a new cycle since the restart. In a jstack of 
the GC, I saw:

{noformat}
"gc" #13 prio=5 os_prio=0 tid=0x021f3800 nid=0x4dd5 runnable [0x7f6f1ebc]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        - locked <0xf5b4b750> (a java.io.BufferedInputStream)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
        at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
        at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:501)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
        at org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForFlush(MasterClientService.java:209)
        at org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForFlush(MasterClientService.java:190)
        at org.apache.accumulo.core.client.impl.TableOperationsImpl._flush(TableOperationsImpl.java:820)
        at org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:758)
        at org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:727)
        at org.apache.accumulo.core.client.impl.TableOperationsImpl.compact(TableOperationsImpl.java:721)
        at org.apache.accumulo.gc.SimpleGarbageCollector.run(SimpleGarbageCollector.java:592)
        at org.apache.accumulo.gc.SimpleGarbageCollector.main(SimpleGarbageCollector.java:160)
        at org.apache.accumulo.gc.GCExecutable.execute(GCExecutable.java:34)
        at org.apache.accumulo.start.Main$1.run(Main.java:120)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None
{noformat}
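
The GC thread above is RUNNABLE inside socketRead0, which is what a blocking socket read looks like in a thread dump while the peer sends nothing. As a minimal illustration only, using plain java.net sockets rather than Accumulo's Thrift transport, the sketch below shows that a read with a socket timeout set gives up with SocketTimeoutException, whereas the default timeout of 0 would wait indefinitely. The class and method names (ReadTimeoutSketch, readTimesOut) are hypothetical, and this is not the change that was applied for this issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of Accumulo or Thrift.
final class ReadTimeoutSketch {

    /**
     * Connects to a local server that accepts the connection but never replies,
     * then attempts one blocking read bounded by timeoutMillis.
     * Returns true if the read gave up with SocketTimeoutException.
     */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // Port 0 asks the OS for any free port. The server side never writes a byte,
        // standing in for a peer that is busy and does not answer the RPC.
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) {
            // The default SO_TIMEOUT is 0, meaning read() blocks until data arrives.
            // A positive value bounds each blocking read.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks in the native socket read until the timeout expires
                return false; // only reached if the silent peer sent data or closed
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

A caller that catches the timeout can retry, reconnect, or report the stall instead of waiting on the peer forever.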

The Master was also stuck with an active thread/RPC:

{noformat}
"ClientPool 23257" #45412 daemon prio=5 os_prio=0 tid=0x049cc000 nid=0x6401 waiting on condition [0x7f8462d94000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.accumulo.core.client.impl.ThriftScanner.pause(ThriftScanner.java:211)
        at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:259)
        at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:79)
        at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:150)
        at org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.readRow(IsolatedScanner.java:70)
        at org.apache.accumulo.core.client.IsolatedScanner$RowBufferingIterator.<init>(IsolatedScanner.java:149)
        at org.apache.accumulo.core.client.IsolatedScanner.iterator(IsolatedScanner.java:238)
        at org.apache.accumulo.core.client.RowIterator.<init>(RowIterator.java:117)
        at org.apache.accumulo.master.MasterClientServiceHandler.waitForFlush(MasterClientServiceHandler.java:188)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
at