[jira] [Commented] (ACCUMULO-1972) Range constructors call overridable method
[ https://issues.apache.org/jira/browse/ACCUMULO-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284450#comment-16284450 ]

Christopher Tubbs commented on ACCUMULO-1972:
---------------------------------------------

[~coffeethulhu], thanks, I have a few comments before applying your updated patch.

* The constructor should be updated to reference the new {{beforeStartKeyImpl}} instead of {{beforeStartKey}}. That would fix the original problem, because the private method can't be overridden.
* I don't think we need the new private method for {{afterEndKeyImpl}}, because that was never called in the constructor.
* I think the original javadocs on the original public methods do not need to be changed. We want the javadocs on the public methods to reflect the user experience, not the internal implementation. The user does not need to know that it calls the private method.
* If you strip trailing whitespace off of the lines in your patch, and run {{mvn clean package -DskipTests}}, your code will be formatted using our built-in code formatter during the build. I can also do this before applying the patch, but I wanted to let you know about it.

Also, a hint for creating your patch: if you create a "git-formatted" patch (for example: {{git format-patch HEAD~1}}) to create a patch file, instead of {{git diff}}, the patch will include your authorship information, so you will get credit when we apply the patch. If it's more convenient, you can also submit a pull request against the 1.7 branch at https://github.com/apache/accumulo ; I can still give you credit if I create the git commit, but only by mentioning your name in the log message. Whatever you prefer is fine with me, but thought I'd mention it, so you had the opportunity to get credit in the git commit history of Accumulo. :)

> Range constructors call overridable method
> ------------------------------------------
>
> Key: ACCUMULO-1972
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1972
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 1.4.4, 1.5.0
> Reporter: Bill Havanki
> Assignee: Matthew Dinep
> Priority: Minor
> Labels: newbie
> Fix For: 1.7.4, 1.8.2, 2.0.0
>
> Attachments: accumulo-1972.patch, accumulo-1972_2.patch
>
>
> Several {{Range}} constructors call {{Range.beforeStartKey()}}, which is not
> final. This is dangerous:
> bq. The superclass constructor runs before the subclass constructor, so the
> overriding method in the subclass will get invoked before the subclass
> constructor has run. If the overriding method depends on any initialization
> performed by the subclass constructor, the method will not behave as
> expected. ??Item 17, Effective Java Vol. 2, Bloch??
> If {{beforeStartKey()}} cannot be made final, the code should be refactored
> to make the constructors safe.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
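The hazard quoted in the issue, and the fix suggested in the comment (have the constructor call a private method that cannot be overridden), can be sketched in plain Java. This is only an illustration under stated assumptions: the class, field, and method names below are hypothetical stand-ins and are not Accumulo's {{Range}} code.

```java
// Illustrative sketch only: hypothetical classes, not Accumulo's Range.
class ConstructorOverrideDemo {

  /** Unsafe: the constructor calls a public method that a subclass may override. */
  static class UnsafeBase {
    final String label;

    UnsafeBase(String start) {
      this.label = describe(start); // dynamically dispatches to the subclass override
    }

    public String describe(String start) {
      return "start=" + start;
    }
  }

  static class UnsafeSub extends UnsafeBase {
    private final String suffix;

    UnsafeSub(String start) {
      super(start); // describe() runs in here, before suffix is assigned
      this.suffix = "!";
    }

    @Override
    public String describe(String start) {
      return "start=" + start + suffix; // suffix is still null during super()
    }
  }

  /** Safe: the constructor calls a private method; the public method delegates to it. */
  static class SafeBase {
    final String label;

    SafeBase(String start) {
      this.label = describeImpl(start); // private, so it cannot be overridden
    }

    public String describe(String start) {
      return describeImpl(start);
    }

    private String describeImpl(String start) {
      return "start=" + start;
    }
  }

  static class SafeSub extends SafeBase {
    private final String suffix;

    SafeSub(String start) {
      super(start);
      this.suffix = "!";
    }

    @Override
    public String describe(String start) {
      return super.describe(start) + suffix; // only used after construction completes
    }
  }

  public static void main(String[] args) {
    System.out.println(new UnsafeSub("a").label); // override ran before suffix was set
    System.out.println(new SafeSub("a").label);   // constructor used the private method
    System.out.println(new SafeSub("a").describe("a"));
  }
}
```

In the unsafe variant the override observes an uninitialized subclass field; in the safe variant the public method stays overridable while the constructor's behavior no longer depends on subclasses.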
[GitHub] mikewalch closed pull request #46: ACCUMULO-4752 Create documentation on improving performance
mikewalch closed pull request #46: ACCUMULO-4752 Create documentation on improving performance
URL: https://github.com/apache/accumulo-website/pull/46

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/_docs-2-0/getting-started/design.md b/_docs-2-0/getting-started/design.md
index 01c015eb..26e90484 100644
--- a/_docs-2-0/getting-started/design.md
+++ b/_docs-2-0/getting-started/design.md
@@ -107,10 +107,9 @@ ingest and query load is balanced across the cluster.
 When a write arrives at a TabletServer it is written to a Write-Ahead Log and
 then inserted into a sorted data structure in memory called a MemTable. When the
 MemTable reaches a certain size, the TabletServer writes out the sorted
-key-value pairs to a file in HDFS called a Relative Key File (RFile), which is a
-kind of Indexed Sequential Access Method (ISAM) file. This process is called a
-minor compaction. A new MemTable is then created and the fact of the compaction
-is recorded in the Write-Ahead Log.
+key-value pairs to a file in HDFS called an [RFile](#rfile)). This process is
+called a minor compaction. A new MemTable is then created and the fact of the
+compaction is recorded in the Write-Ahead Log.
 
 When a request to read data arrives at a TabletServer, the TabletServer does a
 binary search across the MemTable as well as the in-memory indexes associated
@@ -118,6 +117,18 @@ with each RFile to find the relevant values. If clients are performing a scan,
 several key-value pairs are returned to the client in order from the MemTable
 and the set of RFiles by performing a merge-sort as they are read.
 
+## RFile
+
+RFile (short for Relative Key File) is a file that contains Accumulo's sorted key-value
+pairs. The file is written to HDFS by Tablet Servers during a minor compaction. RFiles are
+organized using the Index Sequential Access Method (ISAM). RFiles consist of data (key/value) block,
+index blocks (which are used to find data block), and meta blocks (which contain
+metadata for bloom filters and summary statistics). Data in an RFile is seperated by
+locality group. The diagram below shows the logical view and HDFS file view of an RFile.
+
+![rfile diagram]({{ site.url }}/images/docs/rfile_diagram.png)
+
+
 ## Compactions
 
 In order to manage the number of files per tablet, periodically the TabletServer
@@ -167,4 +178,3 @@ TabletServer failures are noted on the Master's monitor page, accessible via
 [clients]: {{page.docs_baseurl}}/getting-started/clients
 [merging]: {{page.docs_baseurl}}/getting-started/table_configuration#merging-tablets
 [compaction]: {{page.docs_baseurl}}/getting-started/table_configuration#compaction
-
diff --git a/_docs-2-0/troubleshooting/performance.md b/_docs-2-0/troubleshooting/performance.md
new file mode 100644
index ..f6dd705e
--- /dev/null
+++ b/_docs-2-0/troubleshooting/performance.md
@@ -0,0 +1,52 @@
+---
+title: Performance
+category: troubleshooting
+order: 5
+---
+
+Accumulo can be tuned to improve read and write performance.
+
+## Read performance
+
+1. Enable [caching] on tables to reduce reads to disk.
+
+1. Enable [bloom filters][bloom-filters] on tables to limit the number of disk lookups.
+
+1. Decrease the [major compaction ratio][compaction] of a table to decrease the number of
+   files per tablet. Less files reduces the latency of reads.
+
+1. Decrease the size of [data blocks in RFiles][rfile] by lowering [table.file.compress.blocksize] which can result
+   in better random seek performance. However, this can increase the size of indexes in the RFile. If the indexes
+   are too large to fit in cache, this can hinder performance. Also, as the index size increases the depth of the
+   index tree in each file may increase. Increasing [table.file.compress.blocksize.index] can reduce the depth of
+   the tree.
+
+## Write performance
+
+1. Enable [native maps][native-maps] on tablet servers to prevent Java garbage collection pauses
+   which can slow ingest.
+
+1. [Pre-split new tables][split] to distribute writes across multiple tablet servers.
+
+1. Ingest data using [multiple clients][multi-client] or [bulk ingest][bulk] to increase ingest throughput.
+
+1. Increase the [major compaction ratio][compaction] of a table to limit the number of major compactions
+   which improves ingest performance.
+
+1. On large Accumulo clusters, use [multiple HDFS volumes][multivolume] to increase write performance.
+
+1. Change the compression format used by [blocks in RFiles][rfile] by setting [table.file.compress.type] to
+   `snappy`. This increases write speed at the expense of using more disk space.
+
+[caching]: {{ page.docs_baseurl }}/administration/caching
+[bloom-filters]: {{ page.docs_baseurl
[jira] [Commented] (ACCUMULO-4702) Use of Beta or deprecated Guava methods
[ https://issues.apache.org/jira/browse/ACCUMULO-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284244#comment-16284244 ]

Michael Miller commented on ACCUMULO-4702:
------------------------------------------

I built the [library-detectors|https://github.com/overstock/library-detectors] findbugs plugin locally (they updated to 1.2 but haven't released it to Maven Central yet) and configured it to skip all the Beta classes. This was basically a test to create a list of all the Beta classes remaining in Accumulo 2.0. Here is what I came up with:

{code:xml}
-Dcom.overstock.findbugs.ignore=
com.google.common.util.concurrent.RateLimiter,
com.google.common.hash.Hasher,
com.google.common.hash.HashCode,
com.google.common.hash.HashFunction,
com.google.common.hash.Hashing,
com.google.common.cache.Cache,
com.google.common.io.CountingOutputStream,
com.google.common.io.ByteStreams,
com.google.common.cache.LoadingCache,
com.google.common.base.Stopwatch,
com.google.common.cache.RemovalNotification,
com.google.common.util.concurrent.Uninterruptibles,
com.google.common.reflect.ClassPath,
com.google.common.reflect.ClassPath$ClassInfo,
com.google.common.base.Throwables
{code}

> Use of Beta or deprecated Guava methods
> ---------------------------------------
>
> Key: ACCUMULO-4702
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4702
> Project: Accumulo
> Issue Type: Bug
> Reporter: Michael Miller
> Assignee: Michael Miller
> Labels: pull-request-available
> Fix For: 1.7.4, 1.8.2, 2.0.0
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> Google releases Guava with beta and deprecated code that should not be used.
> From Guava README:
> bq. If your code is a library itself (i.e. it is used on the CLASSPATH of
> users outside your own control), you should not use beta API
[jira] [Created] (ACCUMULO-4760) NPE on server side getting replication information for monitor
Christopher Tubbs created ACCUMULO-4760:
-------------------------------------------

Summary: NPE on server side getting replication information for monitor
Key: ACCUMULO-4760
URL: https://issues.apache.org/jira/browse/ACCUMULO-4760
Project: Accumulo
Issue Type: Bug
Components: monitor
Reporter: Christopher Tubbs
Fix For: 2.0.0

Without actually setting up replication, I tried looking at http://localhost:9995/replication in the monitor with the replication table both online and offline. Both seemed to have the same problem, resulting in a server-side HTTP 500 error on http://localhost:9995/rest/replication

{code:java}
2017-12-08 14:34:21,661 [servlet.ServletHandler] WARN :
javax.servlet.ServletException: java.lang.NullPointerException
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:534)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
	at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:52)
	at org.apache.accumulo.core.client.impl.Tables.getAllTables(Tables.java:238)
	at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:279)
	at org.apache.accumulo.monitor.rest.replication.ReplicationResource.getReplicationInformation(ReplicationResource.java:114)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
	at
[GitHub] mikewalch commented on issue #46: ACCUMULO-4752 Create documentation on improving performance
mikewalch commented on issue #46: ACCUMULO-4752 Create documentation on improving performance URL: https://github.com/apache/accumulo-website/pull/46#issuecomment-350343600 @keith-turner, I updated the picture. Let me know if that works. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] keith-turner commented on issue #46: ACCUMULO-4752 Create documentation on improving performance
keith-turner commented on issue #46: ACCUMULO-4752 Create documentation on improving performance
URL: https://github.com/apache/accumulo-website/pull/46#issuecomment-350333161

@mikewalch the picture looks great. A metablock contains all of the root nodes for each locality group. So this is slightly off in the picture.
Accumulo-Pull-Requests - Build # 885 - Fixed
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #885) Status: Fixed Check console output at https://builds.apache.org/job/Accumulo-Pull-Requests/885/ to view the results.
[GitHub] milleruntime commented on issue #334: ACCUMULO-4758 throw correct exception when meta block absent
milleruntime commented on issue #334: ACCUMULO-4758 throw correct exception when meta block absent
URL: https://github.com/apache/accumulo/pull/334#issuecomment-350325583

My approval was merely a thank you for the information so thanks!
[GitHub] asfgit commented on issue #334: ACCUMULO-4758 throw correct exception when meta block absent
asfgit commented on issue #334: ACCUMULO-4758 throw correct exception when meta block absent
URL: https://github.com/apache/accumulo/pull/334#issuecomment-350324100

Can one of the admins verify this patch?
[jira] [Updated] (ACCUMULO-4758) Summaries command fails on ShellServerIT
[ https://issues.apache.org/jira/browse/ACCUMULO-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-4758:
-----------------------------------
Affects Version/s: (was: 2.0.0)

> Summaries command fails on ShellServerIT
> ----------------------------------------
>
> Key: ACCUMULO-4758
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4758
> Project: Accumulo
> Issue Type: Bug
> Reporter: Michael Miller
> Assignee: Keith Turner
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The ShellServerIT.testSummarySelection has been failing recently. I believe
> it's due to a recent change from ACCUMULO-4641. It appears to be a thread
> concurrency issue in the CachableBlockFile. Here is the stack trace:
> {code:java}
> 2017-12-07 18:30:47,617 [thrift.ProcessFunction] ERROR: Internal error processing startGetSummariesFromFiles
> org.apache.thrift.TException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.UncheckedIOException: org.apache.accumulo.core.file.rfile.bcfile.MetaBlockDoesNotExist: name=accumulo.summaries.index
> 	at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:63)
> 	at com.sun.proxy.$Proxy21.startGetSummariesFromFiles(Unknown Source)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startGetSummariesFromFiles.getResult(TabletClientService.java:3335)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startGetSummariesFromFiles.getResult(TabletClientService.java:3319)
> 	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> 	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> 	at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
> 	at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
> 	at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:106)
> 	at org.apache.thrift.server.Invocation.run(Invocation.java:18)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.UncheckedIOException: org.apache.accumulo.core.file.rfile.bcfile.MetaBlockDoesNotExist: name=accumulo.summaries.index
> 	at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.getSummaries(TabletServer.java:1822)
> 	at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startSummaryOperation(TabletServer.java:1834)
> 	at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startGetSummariesFromFiles(TabletServer.java:1898)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
> 	at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:60)
> {code}
[jira] [Assigned] (ACCUMULO-4758) Summaries command fails on ShellServerIT
[ https://issues.apache.org/jira/browse/ACCUMULO-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner reassigned ACCUMULO-4758:
--------------------------------------
Assignee: Keith Turner

> Summaries command fails on ShellServerIT
> ----------------------------------------
>
> Key: ACCUMULO-4758
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4758
> Project: Accumulo
> Issue Type: Bug
> Reporter: Michael Miller
> Assignee: Keith Turner
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The ShellServerIT.testSummarySelection has been failing recently. I believe
> it's due to a recent change from ACCUMULO-4641. It appears to be a thread
> concurrency issue in the CachableBlockFile.
[GitHub] keith-turner opened a new pull request #334: ACCUMULO-4758 throw correct exception when meta block absent
keith-turner opened a new pull request #334: ACCUMULO-4758 throw correct exception when meta block absent
URL: https://github.com/apache/accumulo/pull/334
[jira] [Updated] (ACCUMULO-4758) Summaries command fails on ShellServerIT
[ https://issues.apache.org/jira/browse/ACCUMULO-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ACCUMULO-4758:
-------------------------------------
Labels: pull-request-available (was: )

> Summaries command fails on ShellServerIT
> ----------------------------------------
>
> Key: ACCUMULO-4758
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4758
> Project: Accumulo
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Michael Miller
> Labels: pull-request-available
>
> The ShellServerIT.testSummarySelection has been failing recently. I believe
> it's due to a recent change from ACCUMULO-4641. It appears to be a thread
> concurrency issue in the CachableBlockFile.
[GitHub] milleruntime commented on a change in pull request #48: Added conditional writer to tour
milleruntime commented on a change in pull request #48: Added conditional writer to tour
URL: https://github.com/apache/accumulo-website/pull/48#discussion_r155801360

## File path: tour/conditional-writer.md ##
@@ -1,3 +1,119 @@
 ---
 title: Conditional Writer
 ---
+
+When read-modify-write operations run concurrently, its possible changes made

Review comment:
> its possible changes made

Should be "it's" or "it is".
[GitHub] milleruntime commented on a change in pull request #48: Added conditional writer to tour
milleruntime commented on a change in pull request #48: Added conditional writer to tour URL: https://github.com/apache/accumulo-website/pull/48#discussion_r155819409 ## File path: tour/conditional-writer.md ## @@ -1,3 +1,119 @@ --- title: Conditional Writer --- + +When read-modify-write operations run concurrently, its possible changes made +by some operations will be overwritten by others. The following sequence of +events shows an example of this. + + 1. Thread 0 sets the key `id0001:location:home` to `1007 Mountain Dr, Gotham, New York` + 2. Thread 1 reads `id0001:location:home` + 3. Thread 2 reads `id0001:location:home` + 4. Thread 1 replaces `Dr` with `Drive` + 5. Thread 2 replaces `New York` with `NY` + 6. Thread 1 sets key `id0001:location:home` to `1007 Mountain Drive, Gotham, New York` + 7. Thread 2 sets key `id0001:location:home` to `1007 Mountain Dr, Gotham, NY` + +In this situation the changes made by Thread 1 are lost, ending up with `1007 +Mountain Dr, Gotham, NY` instead of `1007 Mountain Drive, Gotham, NY`. To +correctly handle this, Accumulo offers the [ConditionalWriter]. The +ConditionalWriter atomically checks conditions on a row and only applies a +mutation when all are satisfied. + +## Exercise + +The following code simulates the concurrency situation above. Because it uses +a BatchWriter it will lose modifications. 
+
+```java
+  static String getAddress(Connector conn, String id) {
+    // The IsolatedScanner ensures partial changes to a row are not seen
+    try (Scanner scanner = new IsolatedScanner(conn.createScanner("GothamPD", Authorizations.EMPTY))) {
+      scanner.setRange(Range.exact(id, "location", "home"));
+      for (Entry<Key,Value> entry : scanner) {
+        return entry.getValue().toString();
+      }
+      return null;
+    } catch (TableNotFoundException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  static boolean setAddress(Connector conn, String id, String expectedAddr, String newAddr) {
+    try (BatchWriter writer = conn.createBatchWriter("GothamPD", new BatchWriterConfig())) {
+      Mutation mutation = new Mutation(id);
+      mutation.put("location", "home", newAddr);
+      writer.addMutation(mutation);
+      return true;
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  public static Future<Void> modifyAddress(Connector conn, String id, Function<String,String> modifier) {
+    return CompletableFuture.runAsync(() -> {
+      String currAddr, newAddr;
+      do {
+        currAddr = getAddress(conn, id);
+        newAddr = modifier.apply(currAddr);
+        System.out.printf("Thread %3d attempting change %20s -> %-20s\n",
+            Thread.currentThread().getId(), "'"+currAddr+"'", "'"+newAddr+"'");
+      } while (!setAddress(conn, id, currAddr, newAddr));
+    });
+  }
+
+  static void exercise(MiniAccumuloCluster mac) throws Exception {
+    Connector conn = mac.getConnector("root", "tourguide");
+    conn.tableOperations().create("GothamPD");
+
+    String id = "id0001";
+
+    setAddress(conn, id, null, " 1007 Mountain Dr, Gotham, New York ");
+
+    // create async operation to trim whitespace
+    Future<Void> future1 = modifyAddress(conn, id, String::trim);
+
+    // create async operation to replace Dr with Drive
+    Future<Void> future2 = modifyAddress(conn, id, addr -> addr.replace("Dr", "Drive"));
+
+    // create async operation to replace New York with NY
+    Future<Void> future3 = modifyAddress(conn, id, addr -> addr.replace("New York", "NY"));
+
+    // wait for async operations to complete
+    future1.get();
+    future2.get();
+    future3.get();
+
+    // print the address stored in Accumulo
+    System.out.println("Final address : '"+getAddress(conn, id)+"'");
+  }
+```
+
+The following is one of a few possible outputs. Notice that only the
+modification of `New York` to `NY` shows up in the final output. The other
+modifications were lost.
+
+```
+Thread 38 attempting change ' 1007 Mountain Dr, Gotham, New York ' -> ' 1007 Mountain Drive, Gotham, New York '
+Thread 39 attempting change ' 1007 Mountain Dr, Gotham, New York ' -> ' 1007 Mountain Dr, Gotham, NY '
+Thread 37 attempting change ' 1007 Mountain Dr, Gotham, New York ' -> '1007 Mountain Dr, Gotham, New York'
+Final address : ' 1007 Mountain Dr, Gotham, NY '
+```
+
+To fix this example, make the following changes in `setAddress()` to use a
+ConditionalWriter.
+
+ * Call [createConditionalWriter] instead of creating a batch writer
+ * Create a [Condition] for the column 'location:home'. If `expectedAddr` is not null, then pass it to [setValue]. A condition with no value set means that column is expected to absent.

Review comment:
An example on how to create a Condition would be helpful. I found this tricky never having used it. I kept looking for a method to set the logic vs explicitly performing the checks in java.
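The check-then-write behavior described in the quoted bullets can be sketched without a cluster. What follows is an analogy only, under stated assumptions: a `ConcurrentHashMap` stands in for the `GothamPD` table, `putIfAbsent` plays the role of a condition with no value set (column expected to be absent), and `replace(key, expected, new)` plays the role of a condition with a value set. The class name `CasAddressDemo` and its methods are hypothetical, and none of this is Accumulo API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Stand-in for the tour exercise: a ConcurrentHashMap plays the role of the
// "GothamPD" table. This is an illustration, not Accumulo code.
class CasAddressDemo {
  static final ConcurrentHashMap<String,String> table = new ConcurrentHashMap<>();

  static String getAddress(String id) {
    return table.get(id);
  }

  // Same contract as the exercise's setAddress(): the write is applied only
  // if the stored value still equals expectedAddr. A null expectedAddr means
  // the entry is expected to be absent.
  static boolean setAddress(String id, String expectedAddr, String newAddr) {
    if (expectedAddr == null) {
      return table.putIfAbsent(id, newAddr) == null;
    }
    return table.replace(id, expectedAddr, newAddr);
  }

  // Read-modify-write loop: retry until the conditional write is accepted.
  static CompletableFuture<Void> modifyAddress(String id, Function<String,String> modifier) {
    return CompletableFuture.runAsync(() -> {
      String currAddr, newAddr;
      do {
        currAddr = getAddress(id);
        newAddr = modifier.apply(currAddr);
      } while (!setAddress(id, currAddr, newAddr));
    });
  }

  static String run() {
    table.clear();
    String id = "id0001";
    setAddress(id, null, " 1007 Mountain Dr, Gotham, New York ");
    CompletableFuture.allOf(
        modifyAddress(id, String::trim),
        modifyAddress(id, addr -> addr.replace("Dr", "Drive")),
        modifyAddress(id, addr -> addr.replace("New York", "NY"))).join();
    return getAddress(id);
  }

  public static void main(String[] args) {
    System.out.println("Final address : '" + run() + "'");
  }
}
```

Because a write is accepted only when the stored value still matches what the thread read, a thread that loses the race re-reads and reapplies its modifier. All three modifications therefore survive, and the run ends with `1007 Mountain Drive, Gotham, NY` whatever the interleaving.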
[GitHub] milleruntime commented on a change in pull request #48: Added conditional writer to tour
milleruntime commented on a change in pull request #48: Added conditional writer to tour
URL: https://github.com/apache/accumulo-website/pull/48#discussion_r155821948

## File path: tour/conditional-writer.md ##
@@ -1,3 +1,119 @@
 ---
 title: Conditional Writer
 ---
+
+When read-modify-write operations run concurrently, its possible changes made
+by some operations will be overwritten by others. The following sequence of
+events shows an example of this.
+
+ 1. Thread 0 sets the key `id0001:location:home` to `1007 Mountain Dr, Gotham, New York`
+ 2. Thread 1 reads `id0001:location:home`
+ 3. Thread 2 reads `id0001:location:home`
+ 4. Thread 1 replaces `Dr` with `Drive`
+ 5. Thread 2 replaces `New York` with `NY`
+ 6. Thread 1 sets key `id0001:location:home` to `1007 Mountain Drive, Gotham, New York`
+ 7. Thread 2 sets key `id0001:location:home` to `1007 Mountain Dr, Gotham, NY`
+
+In this situation the changes made by Thread 1 are lost, ending up with `1007
+Mountain Dr, Gotham, NY` instead of `1007 Mountain Drive, Gotham, NY`. To
+correctly handle this, Accumulo offers the [ConditionalWriter]. The
+ConditionalWriter atomically checks conditions on a row and only applies a
+mutation when all are satisfied.
+
+## Exercise
+
+The following code simulates the concurrency situation above. Because it uses
+a BatchWriter it will lose modifications.
+
+```java
+  static String getAddress(Connector conn, String id) {
+    // The IsolatedScanner ensures partial changes to a row are not seen
+    try (Scanner scanner = new IsolatedScanner(conn.createScanner("GothamPD", Authorizations.EMPTY))) {
+      scanner.setRange(Range.exact(id, "location", "home"));
+      for (Entry<Key,Value> entry : scanner) {
+        return entry.getValue().toString();
+      }
+      return null;
+    } catch (TableNotFoundException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  static boolean setAddress(Connector conn, String id, String expectedAddr, String newAddr) {
+    try (BatchWriter writer = conn.createBatchWriter("GothamPD", new BatchWriterConfig())) {
+      Mutation mutation = new Mutation(id);
+      mutation.put("location", "home", newAddr);
+      writer.addMutation(mutation);
+      return true;
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  public static Future<Void> modifyAddress(Connector conn, String id, Function<String,String> modifier) {
+    return CompletableFuture.runAsync(() -> {
+      String currAddr, newAddr;
+      do {
+        currAddr = getAddress(conn, id);
+        newAddr = modifier.apply(currAddr);
+        System.out.printf("Thread %3d attempting change %20s -> %-20s\n",
+            Thread.currentThread().getId(), "'"+currAddr+"'", "'"+newAddr+"'");
+      } while (!setAddress(conn, id, currAddr, newAddr));
+    });
+  }
+
+  static void exercise(MiniAccumuloCluster mac) throws Exception {
+    Connector conn = mac.getConnector("root", "tourguide");
+    conn.tableOperations().create("GothamPD");
+
+    String id = "id0001";
+
+    setAddress(conn, id, null, " 1007 Mountain Dr, Gotham, New York ");
+
+    // create async operation to trim whitespace
+    Future<Void> future1 = modifyAddress(conn, id, String::trim);
+
+    // create async operation to replace Dr with Drive
+    Future<Void> future2 = modifyAddress(conn, id, addr -> addr.replace("Dr", "Drive"));
+
+    // create async operation to replace New York with NY
+    Future<Void> future3 = modifyAddress(conn, id, addr -> addr.replace("New York", "NY"));
+
+    // wait for async operations to complete
+    future1.get();
+    future2.get();
+    future3.get();
+
+    // print the address stored in Accumulo
+    System.out.println("Final address : '"+getAddress(conn, id)+"'");
+  }
+```
+
+The following is one of a few possible outputs. Notice that only the
+modification of `New York` to `NY` shows up in the final output. The other
+modifications were lost.
+
+```
+Thread 38 attempting change ' 1007 Mountain Dr, Gotham, New York ' -> ' 1007 Mountain Drive, Gotham, New York '
+Thread 39 attempting change ' 1007 Mountain Dr, Gotham, New York ' -> ' 1007 Mountain Dr, Gotham, NY '
+Thread 37 attempting change ' 1007 Mountain Dr, Gotham, New York ' -> '1007 Mountain Dr, Gotham, New York'
+Final address : ' 1007 Mountain Dr, Gotham, NY '
+```
+
+To fix this example, make the following changes in `setAddress()` to use a
+ConditionalWriter.
+
+ * Call [createConditionalWriter] instead of creating a batch writer
+ * Create a [Condition] for the column 'location:home'. If `expectedAddr` is not null, then pass it to [setValue]. A condition with no value set means that column is expected to absent.
+ * Replace Mutation with a [ConditionalMutation] and set the condition.
+ * Call [write] passing it the conditional mutation.
+ * Return `true` if [getStatus] from the [Result] returned by [write] is `ACCEPTED`.

Review comment:
Referring to the enum here would be
[GitHub] milleruntime commented on a change in pull request #48: Added conditional writer to tour
milleruntime commented on a change in pull request #48: Added conditional writer to tour
URL: https://github.com/apache/accumulo-website/pull/48#discussion_r155803153

## File path: tour/conditional-writer.md ##
@@ -1,3 +1,119 @@
 ---
 title: Conditional Writer
 ---
+
+When read-modify-write operations run concurrently, its possible changes made
+by some operations will be overwritten by others. The following sequence of
+events shows an example of this.
+
+ 1. Thread 0 sets the key `id0001:location:home` to `1007 Mountain Dr, Gotham, New York`
+ 2. Thread 1 reads `id0001:location:home`
+ 3. Thread 2 reads `id0001:location:home`
+ 4. Thread 1 replaces `Dr` with `Drive`
+ 5. Thread 2 replaces `New York` with `NY`
+ 6. Thread 1 sets key `id0001:location:home` to `1007 Mountain Drive, Gotham, New York`
+ 7. Thread 2 sets key `id0001:location:home` to `1007 Mountain Dr, Gotham, NY`
+
+In this situation the changes made by Thread 1 are lost, ending up with `1007
+Mountain Dr, Gotham, NY` instead of `1007 Mountain Drive, Gotham, NY`. To
+correctly handle this, Accumulo offers the [ConditionalWriter]. The
+ConditionalWriter atomically checks conditions on a row and only applies a
+mutation when all are satisfied.
+
+## Exercise
+
+The following code simulates the concurrency situation above. Because it uses
+a BatchWriter it will lose modifications.

Review comment:
I think the last sentence could be more clear. Maybe something like: "The BatchWriter is used to write to the same row across 3 separate threads and there is nothing preventing concurrent modifications." A bit wordy but explains what is going on.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
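The interleaving in steps 1 to 7 of the quoted diff can be replayed deterministically in a single thread, which makes the lost update easy to see. This is a minimal sketch, assuming a plain `HashMap` as a stand-in for the table. `LostUpdateReplay` is a hypothetical name and is not part of the tour or of Accumulo.

```java
import java.util.HashMap;
import java.util.Map;

// Single-threaded replay of the seven-step interleaving from the quoted diff.
// A HashMap stands in for the table; every write is unconditional, which is
// the behavior the exercise attributes to the BatchWriter version.
class LostUpdateReplay {
  static String replay() {
    Map<String,String> table = new HashMap<>();
    String key = "id0001:location:home";

    table.put(key, "1007 Mountain Dr, Gotham, New York"); // step 1: Thread 0 writes
    String read1 = table.get(key);                         // step 2: Thread 1 reads
    String read2 = table.get(key);                         // step 3: Thread 2 reads
    String new1 = read1.replace("Dr", "Drive");            // step 4: Thread 1 modifies its copy
    String new2 = read2.replace("New York", "NY");         // step 5: Thread 2 modifies its copy
    table.put(key, new1);                                  // step 6: Thread 1 writes
    table.put(key, new2);                                  // step 7: Thread 2 overwrites Thread 1

    return table.get(key);
  }

  public static void main(String[] args) {
    System.out.println(replay());
  }
}
```

The replay returns `1007 Mountain Dr, Gotham, NY`: Thread 2 computed its value from a stale read, so its unconditional write discards the `Drive` change made in step 6.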
[GitHub] mikewalch commented on a change in pull request #48: Added conditional writer to tour
mikewalch commented on a change in pull request #48: Added conditional writer to tour
URL: https://github.com/apache/accumulo-website/pull/48#discussion_r155815828

## File path: tour/conditional-writer.md ##
@@ -1,3 +1,119 @@
 ---
 title: Conditional Writer
 ---
+
+When read-modify-write operations run concurrently, its possible changes made
+by some operations will be overwritten by others. The following sequence of
+events shows an example of this.
+
+ 1. Thread 0 sets the key `id0001:location:home` to `1007 Mountain Dr, Gotham, New York`

Review comment:
To add to this, I just think more information needs to be given to set up the exercise so the focus is understanding conditional mutations and not trying to figure out goal of application.
[GitHub] mikewalch commented on a change in pull request #48: Added conditional writer to tour
mikewalch commented on a change in pull request #48: Added conditional writer to tour
URL: https://github.com/apache/accumulo-website/pull/48#discussion_r155803132

## File path: tour/conditional-writer.md ##
@@ -1,3 +1,119 @@
 ---
 title: Conditional Writer
 ---
+
+When read-modify-write operations run concurrently, its possible changes made
+by some operations will be overwritten by others. The following sequence of
+events shows an example of this.
+
+ 1. Thread 0 sets the key `id0001:location:home` to `1007 Mountain Dr, Gotham, New York`

Review comment:
I think it would be easier to understand this example if the original address was `1007 Mountain Drive, Gotham, New York` and the threads are standardizing the address.

 * Thread 1 replaces `Drive` with `Dr`
 * Thread 2 replaces `New York` with `NY`
[GitHub] mikewalch commented on a change in pull request #48: Added conditional writer to tour
mikewalch commented on a change in pull request #48: Added conditional writer to tour
URL: https://github.com/apache/accumulo-website/pull/48#discussion_r155815228

## File path: tour/conditional-writer.md ##
@@ -1,3 +1,119 @@
 ---
 title: Conditional Writer
 ---
+
+When read-modify-write operations run concurrently, its possible changes made
+by some operations will be overwritten by others. The following sequence of
+events shows an example of this.
+
+ 1. Thread 0 sets the key `id0001:location:home` to `1007 Mountain Dr, Gotham, New York`
+ 2. Thread 1 reads `id0001:location:home`
+ 3. Thread 2 reads `id0001:location:home`
+ 4. Thread 1 replaces `Dr` with `Drive`
+ 5. Thread 2 replaces `New York` with `NY`
+ 6. Thread 1 sets key `id0001:location:home` to `1007 Mountain Drive, Gotham, New York`
+ 7. Thread 2 sets key `id0001:location:home` to `1007 Mountain Dr, Gotham, NY`
+
+In this situation the changes made by Thread 1 are lost, ending up with `1007
+Mountain Dr, Gotham, NY` instead of `1007 Mountain Drive, Gotham, NY`. To
+correctly handle this, Accumulo offers the [ConditionalWriter]. The
+ConditionalWriter atomically checks conditions on a row and only applies a
+mutation when all are satisfied.
+
+## Exercise
+
+The following code simulates the concurrency situation above. Because it uses
+a BatchWriter it will lose modifications.
+
+```java
+  static String getAddress(Connector conn, String id) {

Review comment:
More imports are needed at top of file. We could move to wild card imports. Below worked for me. However, this should be tested with other exercises.

```java
// Classes you will use along the tour
import java.util.*;
import java.nio.file.*;
import java.util.concurrent.*;
import java.util.function.*;
import org.apache.accumulo.core.client.*;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.*;
import org.apache.accumulo.core.security.*;
import org.apache.accumulo.minicluster.MiniAccumuloCluster;
import org.apache.hadoop.io.Text;
```