[GitHub] milleruntime commented on a change in pull request #352: ACCUMULO-4771 Implement DataTables in Monitor

2018-01-08 Thread GitBox
milleruntime commented on a change in pull request #352: ACCUMULO-4771 
Implement DataTables in Monitor
URL: https://github.com/apache/accumulo/pull/352#discussion_r160254342
 
 

 ##
 File path: 
server/monitor/src/main/resources/org/apache/accumulo/monitor/resources/external/datatables/css/jquery.dataTables.css
 ##
 @@ -0,0 +1,448 @@
+/*
+ * Table styles
 
 Review comment:
   It is vanilla, downloaded as-is from the site. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ctubbsii commented on a change in pull request #352: ACCUMULO-4771 Implement DataTables in Monitor

2018-01-08 Thread GitBox
ctubbsii commented on a change in pull request #352: ACCUMULO-4771 Implement 
DataTables in Monitor
URL: https://github.com/apache/accumulo/pull/352#discussion_r160242664
 
 

 ##
 File path: 
server/monitor/src/main/resources/org/apache/accumulo/monitor/resources/external/datatables/css/jquery.dataTables.css
 ##
 @@ -0,0 +1,448 @@
+/*
+ * Table styles
 
 Review comment:
   Is this vanilla, or did you have to tweak it? Things in the external folder 
should correspond to an upstream, so I'm just checking. If we have to tweak it, 
we can tweak it in our own CSS file or inline.




[GitHub] ctubbsii commented on a change in pull request #352: ACCUMULO-4771 Implement DataTables in Monitor

2018-01-08 Thread GitBox
ctubbsii commented on a change in pull request #352: ACCUMULO-4771 Implement 
DataTables in Monitor
URL: https://github.com/apache/accumulo/pull/352#discussion_r160203928
 
 

 ##
 File path: 
server/monitor/src/main/java/org/apache/accumulo/monitor/rest/tables/TableInformationList.java
 ##
 @@ -21,15 +21,15 @@
 
 /**
  *
- * Generates a list with table information
+ * Generates a list with table information. This is mainly used to populate 
DataTables
 
 Review comment:
   Not sure this comment will make sense in the future, because somebody 
editing this might not understand what is meant by "populate DataTables". Also, 
it's probably not needed to explain prescriptively how an API is used, as that 
can change over time. It's better to focus comments on what it provides, rather 
than what calls it.  




[GitHub] jkrdev commented on issue #11: ACCUMULO-4749 WIP Bulk Loading Test

2018-01-08 Thread GitBox
jkrdev commented on issue #11: ACCUMULO-4749 WIP Bulk Loading Test
URL: https://github.com/apache/accumulo-testing/pull/11#issuecomment-356047348
 
 
  My bad, I deleted it because Ben told me where to find it. I also realized 
the Travis build only failed because it couldn't connect, not because of 
something on my end. It should be noted that it builds cleanly but is not 
actually done. I would love a review, though, if possible. At the moment I am 
working on the generator class: I need to figure out how to have multiple 
generators that create the data (numbers) and the linked list and put them in a 
directory; the loader class can then use that directory to make rfiles and do 
the rest of the steps.




[jira] [Updated] (ACCUMULO-4770) Accumulo monitor overview page is not listing all Zookeeper nodes

2018-01-08 Thread Christopher Tubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs updated ACCUMULO-4770:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Accumulo monitor overview page is not listing all Zookeeper nodes
> -
>
> Key: ACCUMULO-4770
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4770
> Project: Accumulo
>  Issue Type: Bug
>  Components: monitor
>Affects Versions: 2.0.0
> Environment: Ran Accumulo 2.0.0-SNAPSHOT on a cluster with 3 
> Zookeeper nodes.  Only one is being listed in the Accumulo monitor overview 
> page.
>Reporter: Mike Walch
>Assignee: Christopher Tubbs
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] milleruntime commented on issue #353: ACCUMULO-4770 Show all ZooKeeper nodes on monitor

2018-01-08 Thread GitBox
milleruntime commented on issue #353: ACCUMULO-4770 Show all ZooKeeper nodes on 
monitor
URL: https://github.com/apache/accumulo/pull/353#issuecomment-356046535
 
 
  Agreed. Sounds good.




[GitHub] asfgit closed pull request #353: ACCUMULO-4770 Show all ZooKeeper nodes on monitor

2018-01-08 Thread GitBox
asfgit closed pull request #353: ACCUMULO-4770 Show all ZooKeeper nodes on 
monitor
URL: https://github.com/apache/accumulo/pull/353
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git 
a/server/monitor/src/main/resources/org/apache/accumulo/monitor/resources/js/overview.js
 
b/server/monitor/src/main/resources/org/apache/accumulo/monitor/resources/js/overview.js
index c9796c9d8e..cebecf8a16 100644
--- 
a/server/monitor/src/main/resources/org/apache/accumulo/monitor/resources/js/overview.js
+++ 
b/server/monitor/src/main/resources/org/apache/accumulo/monitor/resources/js/overview.js
@@ -96,21 +96,18 @@ function refreshZKTable() {
   if (data.length === 0 || data.zkServers.length === 0) {
     $('#zookeeper tr td:first').show();
   } else {
-    var items = [];
     $.each(data.zkServers, function(key, val) {
+      var cells = '<td>' + val.server + '</td>';
       if (val.clients >= 0) {
-        items.push('<td>' + val.server + '</td>');
-        items.push('<td>' + val.mode + '</td>');
-        items.push('<td>' + val.clients + '</td>');
+        cells += '<td>' + val.mode + '</td>';
+        cells += '<td>' + val.clients + '</td>';
       } else {
-        items.push('<td>' + val.server + '</td>');
-        items.push('<td>Down</td>');
-        items.push('<td></td>');
+        cells += '<td>Down</td>';
+        cells += '<td></td>';
       }
+      // create a <tr> element with html containing the cell data; append it
+      // to the table
+      $('<tr/>', { html: cells }).appendTo('#zookeeper table');
     });
-    $('<tr/>', {
-      html: items.join('')
-    }).appendTo('#zookeeper table');
   }
 }
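
The per-row pattern in the merged diff can be sketched as a plain function (no jQuery, no DOM) to show why building one row string per server fixes the bug of a single combined row; the function name `buildZKRows` and the bare `<td>` markup here are illustrative assumptions, not the exact monitor code:

```javascript
// Build one <tr> string per ZooKeeper server, mirroring the refactored loop.
// Input objects follow the shape used in the diff ({server, mode, clients});
// clients < 0 marks a server that is down.
function buildZKRows(zkServers) {
  var rows = [];
  zkServers.forEach(function(val) {
    var cells = '<td>' + val.server + '</td>';
    if (val.clients >= 0) {
      cells += '<td>' + val.mode + '</td>';
      cells += '<td>' + val.clients + '</td>';
    } else {
      cells += '<td>Down</td>';
      cells += '<td></td>';
    }
    // One row per server, rather than accumulating all cells into one row.
    rows.push('<tr>' + cells + '</tr>');
  });
  return rows;
}
```

The pre-fix code accumulated every server's cells into a single array and appended one row after the loop, which is why only one ZooKeeper node ever appeared.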
 


 




[GitHub] ctubbsii commented on issue #353: ACCUMULO-4770 Show all ZooKeeper nodes on monitor

2018-01-08 Thread GitBox
ctubbsii commented on issue #353: ACCUMULO-4770 Show all ZooKeeper nodes on 
monitor
URL: https://github.com/apache/accumulo/pull/353#issuecomment-356046153
 
 
   @milleruntime I'll merge for now. It might make sense to turn this into a 
DataTables table, or do some other simplification in the template. I'll leave 
that to future work. For now, this will wrap up this bug.




[jira] [Updated] (ACCUMULO-4777) Root tablet got spammed with 1.8 million log entries

2018-01-08 Thread Christopher Tubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs updated ACCUMULO-4777:

Fix Version/s: 2.0.0
   1.8.2

> Root tablet got spammed with 1.8 million log entries
> 
>
> Key: ACCUMULO-4777
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4777
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.8.1
>Reporter: Ivan Bella
>Priority: Critical
> Fix For: 1.8.2, 2.0.0
>
>
> We had a tserver that was handling accumulo.metadata tablets that somehow got 
> into a loop where it created over 22K empty WAL logs.  There were around 70 
> metadata tablets, and this resulted in around 1.8 million log entries being 
> added to the accumulo.root table.  The only reason it stopped creating WAL 
> logs is that it ran out of open file handles.  This took us many hours and 
> cups of coffee to clean up.
> The log contained the following messages in a tight loop:
> log.TabletServerLogger INFO : Using next log hdfs://...
> tserver.TabletServer INFO : Writing log marker for hdfs://...
> tserver.TabletServer INFO : Marking hdfs://... closed
> log.DfsLogger INFO : Slow sync cost ...
> ...
> Unfortunately, we did not have DEBUG turned on, so we have no debug messages.
> Tracking through the code, there are three places where the 
> TabletServerLogger.close method is called:
> 1) via resetLoggers in the TabletServerLogger, but nothing calls this method, 
> so this is ruled out
> 2) when the log gets too large or too old, but neither of those checks should 
> have been hitting here
> 3) in a loop that is executed (while (!success)) in the 
> TabletServerLogger.write method.  In this case, when we unsuccessfully write 
> something to the WAL, that one is closed and a new one is created.  This 
> loop will go on forever until we successfully write out the entry.  A 
> DfsLogger.LogClosedException seems the most logical reason.  This is most 
> likely because a ClosedChannelException was thrown from the DfsLogger.write 
> methods (around line 609 in DfsLogger).
> So the root cause was most likely Hadoop-related.  However, in Accumulo we 
> probably should not be doing a tight retry loop around a Hadoop failure.  I 
> recommend at a minimum doing some sort of exponential back-off and perhaps 
> setting a limit on the number of retries, resulting in a critical tserver 
> failure.
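
The mitigation recommended above (capped exponential back-off instead of an unbounded tight retry loop) can be sketched generically. The affected code is Java (TabletServerLogger), so this JavaScript sketch only illustrates the delay schedule; `maxRetries`, `baseDelayMs`, and `maxDelayMs` are assumed parameters, not Accumulo configuration:

```javascript
// Delay schedule for capped exponential back-off: double the wait after each
// failed attempt, never exceed maxDelayMs, and give up after maxRetries
// attempts instead of retrying forever.
function backoffDelays(maxRetries, baseDelayMs, maxDelayMs) {
  var delays = [];
  for (var attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs));
  }
  return delays;
}
```

A writer would sleep for each successive delay between retries and, once the schedule is exhausted, escalate to a fatal tserver error rather than spinning and exhausting file handles.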





[GitHub] ctubbsii commented on issue #11: ACCUMULO-4749 WIP Bulk Loading Test

2018-01-08 Thread GitBox
ctubbsii commented on issue #11: ACCUMULO-4749 WIP Bulk Loading Test
URL: https://github.com/apache/accumulo-testing/pull/11#issuecomment-356044890
 
 
   @jkrdev Saw you asked, but can't find it on GitHub now; maybe you already 
figured it out and deleted your question, but in case you're still wondering, 
the command Travis CI uses to build the Maven project can be found in the 
`.travis.yml` file. Currently, it is `mvn clean verify`.
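
For reference, a minimal `.travis.yml` that runs that command looks like the sketch below; the actual file in the repository is authoritative, and the `jdk` value here is an assumption for illustration:

```yaml
# Minimal Travis CI config sketch for a Maven project:
# build and run verification with `mvn clean verify`.
language: java
jdk: openjdk8
script: mvn clean verify
```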




[jira] [Resolved] (ACCUMULO-4760) NPE on server side getting replication information for monitor

2018-01-08 Thread Christopher Tubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs resolved ACCUMULO-4760.
-
Resolution: Fixed

> NPE on server side getting replication information for monitor
> --
>
> Key: ACCUMULO-4760
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4760
> Project: Accumulo
>  Issue Type: Bug
>  Components: monitor
>Reporter: Christopher Tubbs
>Assignee: Michael Miller
>  Labels: newbie
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Without actually setting up replication, I tried looking at 
> http://localhost:9995/replication in the monitor with the replication table 
> both online and offline. Both seemed to have the same problem, resulting in a 
> server-side HTTP 500 error on http://localhost:9995/rest/replication
> {code:java}
> 2017-12-08 14:34:21,661 [servlet.ServletHandler] WARN : 
> javax.servlet.ServletException: java.lang.NullPointerException
>   at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
>   at 
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
>   at 
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:52)
>   at 
> org.apache.accumulo.core.client.impl.Tables.getAllTables(Tables.java:238)
>   at 
> org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:279)
>   at 
> org.apache.accumulo.monitor.rest.replication.ReplicationResource.getReplicationInformation(ReplicationResource.java:114)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>   at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
>   at 
> 

[jira] [Commented] (ACCUMULO-4760) NPE on server side getting replication information for monitor

2018-01-08 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316676#comment-16316676
 ] 

Christopher Tubbs commented on ACCUMULO-4760:
-

Ah. Gotcha. Then, closing this.


[jira] [Commented] (ACCUMULO-4760) NPE on server side getting replication information for monitor

2018-01-08 Thread Michael Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316621#comment-16316621
 ] 

Michael Miller commented on ACCUMULO-4760:
--

No PR, since it was pretty clear once I looked into it. 
https://github.com/apache/accumulo/commit/208cfeaf57e910683cd0215ea2f5487a98dab22b


[jira] [Commented] (ACCUMULO-4760) NPE on server side getting replication information for monitor

2018-01-08 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316614#comment-16316614
 ] 

Christopher Tubbs commented on ACCUMULO-4760:
-

[~milleruntime], to answer your question, I used the shell to bring the table 
online. You reference a "fix", but I don't see a PR or patch associated with 
this issue. Did you do a PR referencing the wrong issue number?


[jira] [Commented] (ACCUMULO-4760) NPE on server side getting replication information for monitor

2018-01-08 Thread Michael Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316543#comment-16316543
 ] 

Michael Miller commented on ACCUMULO-4760:
--

Well, I was unable to test replication across 2 instances, but the NPE is gone 
since my fix.  I think this issue could be closed.

>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:52)
>   at 
> org.apache.accumulo.core.client.impl.Tables.getAllTables(Tables.java:238)
>   at 
> org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:279)
>   at 
> org.apache.accumulo.monitor.rest.replication.ReplicationResource.getReplicationInformation(ReplicationResource.java:114)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>   at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
>   
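
The trace above shows the NPE arising in Tables.getZooCache before the
replication information is assembled. As a rough sketch only (hypothetical
class and field names, not the actual Accumulo fix), the defensive pattern is
to return an empty map instead of dereferencing a cache that was never
initialized:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch: guard a cache-backed lookup so callers get an
// empty result instead of an NPE when the cache was never initialized.
// Names (TableNameCache, zooCache) are illustrative, not Accumulo's.
class TableNameCache {
    private Map<String, String> zooCache; // null until initialized

    Map<String, String> getNameToIdMap() {
        if (zooCache == null) {
            // Dereferencing zooCache directly here is what produced
            // the NullPointerException in the stack trace above.
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(zooCache);
    }
}
```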

[GitHub] jkrdev commented on issue #11: ACCUMULO-4749 WIP Bulk Loading Test

2018-01-08 Thread GitBox
jkrdev commented on issue #11: ACCUMULO-4749 WIP Bulk Loading Test
URL: https://github.com/apache/accumulo-testing/pull/11#issuecomment-356002979
 
 
   Does anyone know what Maven command Travis runs for this? I ran `mvn clean 
verify -DskipITs` and it passed locally, but since it's failing here, maybe that 
isn't the command being run? I want to be able to test locally without having 
to push to the repo every time.
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[jira] [Comment Edited] (ACCUMULO-4777) Root tablet got spammed with 1.8 million log entries

2018-01-08 Thread Ivan Bella (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316356#comment-16316356
 ] 

Ivan Bella edited comment on ACCUMULO-4777 at 1/8/18 2:14 PM:
--

Both times there were stack overflow errors.  The stack overflow is basically as 
follows (accumulo 1.8.1):

All in TabletServerLogger:

defineTablet line 465
write line 382
write line 356
defineTablet line 465
write line 382
write line 356
...
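
The listing above is mutual recursion between write() and defineTablet(). As a
toy illustration only (not Accumulo code), a persistent failure in such a cycle
keeps adding stack frames until the JVM throws StackOverflowError:

```java
// Toy illustration (not Accumulo code) of the mutually recursive retry
// pattern in the trace above: write() fails, closes the log, and calls
// defineTablet(), which calls write() again. While the failure persists,
// each retry adds frames until the JVM throws StackOverflowError.
class RetryRecursionDemo {
    static void write() {
        boolean success = false; // simulate a WAL write that always fails
        if (!success) {
            defineTablet(); // re-enter via the retry path
        }
    }

    static void defineTablet() {
        write(); // marks the tablet in the new log, then retries the write
    }

    static boolean overflows() {
        try {
            write();
            return false; // unreachable while the failure persists
        } catch (StackOverflowError e) {
            return true;
        }
    }
}
```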



was (Author: ivan.bella):
The stack overflow is basically as follows (accumulo 1.8.1):

All in TabletServerLogger:

defineTablet line 465
write line 382
write line 356
defineTablet line 465
write line 382
write line 356
...


> Root tablet got spammed with 1.8 million log entries
> 
>
> Key: ACCUMULO-4777
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4777
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.8.1
>Reporter: Ivan Bella
>Priority: Critical
>
> We had a tserver that was handling accumulo.metadata tablets that somehow got 
> into a loop where it created over 22K empty wal logs.  There were around 70 
> metadata tablets and this resulted in around 1.8 million log entries added 
> to the accumulo.root table.  The only reason it stopped creating wal logs is 
> because it ran out of open file handles.  This took us many hours and cups of 
> coffee to clean up.
> The log contained the following messages in a tight loop:
> log.TabletServerLogger INFO : Using next log hdfs://...
> tserver.TabletServer INFO : Writing log marker for hdfs://...
> tserver.TabletServer INFO : Marking hdfs://... closed
> log.DfsLogger INFO : Slow sync cost ...
> ...
> Unfortunately we did not have DEBUG turned on so we have no debug messages.
> Tracking through the code there are three places where the 
> TabletServerLogger.close method is called:
> 1) via resetLoggers in the TabletServerLogger, but nothing calls this method 
> so this is ruled out
> 2) when the log gets too large or too old, but neither of those checks should 
> have been hitting here.
> 3) In a loop that is executed (while (!success)) in the 
> TabletServerLogger.write method.  In this case when we unsuccessfully write 
> something to the wal, then that one is closed and a new one is created.  This 
> loop will go forever until we successfully write out the entry.  A 
> DfsLogger.LogClosedException seems the most logical reason.  This is most 
> likely because a ClosedChannelException was thrown from the DfsLogger.write 
> methods (around line 609 in DfsLogger).
> So the root cause was most likely hadoop related.  However in accumulo we 
> probably should not be doing a tight retry loop around a hadoop failure.  I 
> recommend at a minimum doing some sort of exponential back off and perhaps 
> setting a limit on the number of retries resulting in a critical tserver 
> failure.
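
The back-off recommendation above can be sketched as a small helper with a hard
retry cap (hypothetical code, not the eventual fix; assumes maxRetries > 0):

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of the recommendation above: retry a failing
// operation with exponential backoff and a hard cap, instead of a
// tight while (!success) loop that can spin forever.
class BackoffRetry {
    static <T> T withBackoff(Callable<T> op, int maxRetries, long baseDelayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                // Exponential backoff: sleep base * 2^attempt ms.
                Thread.sleep(baseDelayMs << attempt);
            }
        }
        // Give up after maxRetries instead of looping forever; a real
        // server would escalate this to a critical failure.
        throw last;
    }
}
```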



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (ACCUMULO-4777) Root tablet got spammed with 1.8 million log entries

2018-01-08 Thread Ivan Bella (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316337#comment-16316337
 ] 

Ivan Bella edited comment on ACCUMULO-4777 at 1/8/18 2:13 PM:
--

This happened to us again; however, this time everything appeared to recover.  
This time we had debug on, so we are analyzing the logs to try to determine how 
it gets into this state in the first place.


was (Author: ivan.bella):
This happened to us again, however this time everything appeared to recover .  
This time the loop appeared to terminate with a stack overflow error instead of 
running out of file descriptors first which may have allowed the tserver to 
remedy the situation earlier.  Also we had debug on so we are analyzing the 
logs to try and determine how it gets into this state in the first place.



[jira] [Commented] (ACCUMULO-4777) Root tablet got spammed with 1.8 million log entries

2018-01-08 Thread Ivan Bella (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316356#comment-16316356
 ] 

Ivan Bella commented on ACCUMULO-4777:
--

The stack overflow is basically as follows (accumulo 1.8.1):

All in TabletServerLogger:

defineTablet line 465
write line 382
write line 356
defineTablet line 465
write line 382
write line 356
...




[jira] [Commented] (ACCUMULO-4777) Root tablet got spammed with 1.8 million log entries

2018-01-08 Thread Ivan Bella (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316337#comment-16316337
 ] 

Ivan Bella commented on ACCUMULO-4777:
--

This happened to us again; however, this time everything appeared to recover.  
This time the loop appeared to terminate with a stack overflow error instead of 
running out of file descriptors first, which may have allowed the tserver to 
remedy the situation earlier.  Also, we had debug on, so we are analyzing the 
logs to try to determine how it gets into this state in the first place.



[jira] [Resolved] (ACCUMULO-4528) importtable and exporttable deserve descriptions in the user manual

2018-01-08 Thread Mark Owens (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Owens resolved ACCUMULO-4528.
--
Resolution: Fixed

> importtable and exporttable deserve descriptions in the user manual
> ---
>
> Key: ACCUMULO-4528
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4528
> Project: Accumulo
>  Issue Type: Task
>  Components: docs
>Reporter: Josh Elser
>Assignee: Mark Owens
>  Labels: newbie, pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The user manual has a section on exporttable in 
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_exporting_tables 
> but this is just a pointer out to a readme file.
> We should really make this a proper chapter to avoid making users have two 
> hops to get to the documentation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)